- class returnn.torch.distributed.DistributedContext(options: Dict[str, Any])
This class sets up some helper functions for torch distributed training; a sketch of what the accessors correspond to follows the list.
- local_rank() → int
- local_size() → int
- rank() → int
- size() → int
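The four accessors map onto standard torch.distributed concepts: rank()/size() are global across all nodes, while local_rank()/local_size() refer to the processes on one node. The following is a minimal illustrative sketch of that correspondence in plain torch.distributed, not the RETURNN implementation; it assumes a torchrun-style launcher that populates the usual environment variables:

```python
import os

import torch.distributed as dist

# Assumes a torchrun-style launcher, which sets MASTER_ADDR/MASTER_PORT,
# RANK/WORLD_SIZE, and LOCAL_RANK/LOCAL_WORLD_SIZE in the environment.
dist.init_process_group(backend="gloo")  # "nccl" for GPU training

rank = dist.get_rank()  # global rank across all nodes, cf. rank()
size = dist.get_world_size()  # total number of processes, cf. size()
local_rank = int(os.environ["LOCAL_RANK"])  # rank within this node, cf. local_rank()
local_size = int(os.environ["LOCAL_WORLD_SIZE"])  # processes on this node, cf. local_size()

print(f"global {rank}/{size}, local {local_rank}/{local_size}")
dist.destroy_process_group()
```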
- returnn.torch.distributed.get_device_ids()
What to return here depends on the specific setup, e.g. how CUDA_VISIBLE_DEVICES is set up. This is currently a reasonable assumption, but the logic might be extended later, or made configurable.
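A common convention consistent with this note is one GPU per process, indexed by the local rank, e.g. when passing device_ids to DistributedDataParallel. A short sketch under that assumption (the LOCAL_RANK lookup and the one-GPU-per-process mapping are assumptions about the setup, not RETURNN API):

```python
import os

import torch
from torch.nn.parallel import DistributedDataParallel

# Assumes the process group is already initialized (see the sketch above)
# and that the launcher sets LOCAL_RANK.
local_rank = int(os.environ["LOCAL_RANK"])
device = torch.device("cuda", local_rank)

model = torch.nn.Linear(8, 8).to(device)
# One GPU per process, indexed by local rank.
ddp_model = DistributedDataParallel(model, device_ids=[local_rank])
```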
- returnn.torch.distributed.get_local_rank()
torch.distributed does not seem to provide a function for this. Via mpirun (OpenMPI), the corresponding env variable would be set. It should fail with an error otherwise.
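For illustration, a hedged sketch of how such a lookup could work across the two common launchers: torchrun sets LOCAL_RANK, while OpenMPI's mpirun sets OMPI_COMM_WORLD_LOCAL_RANK. Which variable RETURNN actually reads is not shown in this excerpt, so the helper below is hypothetical:

```python
import os


def local_rank_from_env() -> int:
    """Hypothetical helper, not part of RETURNN: resolve the local rank
    from launcher env vars (torchrun sets LOCAL_RANK, OpenMPI's mpirun
    sets OMPI_COMM_WORLD_LOCAL_RANK), failing with an error otherwise."""
    for var in ("LOCAL_RANK", "OMPI_COMM_WORLD_LOCAL_RANK"):
        if var in os.environ:
            return int(os.environ[var])
    raise RuntimeError("cannot determine local rank: no known env variable set")
```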