Abstract. The rapid development of machine learning has reached the point where researchers and developers are starting to be limited by the resources of a single machine and want to make use of entire clusters. To address this issue, we have created a distributed computing package for the PyTorch framework. Our package is designed for extensibility and attempts to strike a balance between performance and ease of use. To accomplish this, it can operate in two modes. The first adopts an interface identical to that of the PyTorch CUDA package, offering a gentle learning curve at the cost of scalability. The second is based on MPI and is intended for experienced engineers who wish to develop custom solutions and achieve peak performance. The core logic is implemented as a C++ library and exposed to Python through a thin wrapper around its C interface. The library has no external dependencies, which makes it easy to build in a variety of environments, and its modular architecture allows users to swap and adjust functionality to suit their needs.