returnn.tf.horovod
Here we encapsulate some common Horovod functions.
Note that you are supposed to be able to import this module even if Horovod is not installed.
The usage of this module / global context is also considered optional at this point.
Horovod is enabled <==> use_horovod is enabled in the config.
For relevant further config options, see the code of HorovodContext below.
Most importantly:
- horovod_dataset_distribution, recommended value "random_seed_offset", default value "shard"
- horovod_reduce_type, recommended value "param", default value "grad"
- horovod_param_sync_step, recommended value 100, default value 1
- horovod_param_sync_time_diff, alternative to horovod_param_sync_step, e.g. 100. (secs), default None
Also see multi_gpu.
Also see TFDistributed.
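As an illustration, the options listed above might appear in a RETURNN config file (which is plain Python) roughly as follows. This is only a sketch using the recommended values stated above. Whether these values suit a given setup is not covered here, and the rest of the config is omitted.

```python
# Sketch of a RETURNN config fragment (other required config options omitted).

# Horovod is enabled if and only if this option is enabled.
use_horovod = True

# Recommended value per the notes above; the default would be "shard".
horovod_dataset_distribution = "random_seed_offset"

# Recommended value per the notes above; the default would be "grad".
horovod_reduce_type = "param"

# Recommended value per the notes above; the default would be 1.
horovod_param_sync_step = 100

# Alternative to horovod_param_sync_step, given in seconds (default None).
# horovod_param_sync_time_diff = 100.
```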
- class returnn.tf.horovod.HorovodContext(config)
This sets up some helper functions.
- Parameters:
config (Config)
- returnn.tf.horovod.get_ctx(config=None)
- Parameters:
config (Config|None)
- Returns:
The global context if Horovod is enabled, or None otherwise. If the context has not been set up yet, it is created automatically.
- Return type:
HorovodContext|None
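The behaviour described for get_ctx (return a global context, create it on first use, return None when Horovod is disabled) follows a common lazy-singleton pattern. The following is a minimal, self-contained sketch of that pattern only. It is not RETURNN's actual implementation: SketchContext and get_ctx_sketch are hypothetical names, and a plain dict stands in for the Config object.

```python
from typing import Optional


class SketchContext:
    """Hypothetical stand-in for HorovodContext; it only stores the config."""

    def __init__(self, config: dict):
        self.config = config


# Module-level state for the lazily created global context.
_ctx: Optional[SketchContext] = None
_is_set_up = False


def get_ctx_sketch(config: Optional[dict] = None) -> Optional[SketchContext]:
    """Return the global context, creating it on the first call.

    Returns None if use_horovod is not enabled in the config.
    """
    global _ctx, _is_set_up
    if _is_set_up:
        # Already decided earlier; reuse the stored result.
        return _ctx
    if config is None:
        # Assumption: without any config we cannot decide yet, so report "disabled".
        return None
    _is_set_up = True
    if config.get("use_horovod", False):
        _ctx = SketchContext(config)
    return _ctx


# Usage: the first call creates the context, later calls return the same object.
ctx = get_ctx_sketch({"use_horovod": True})
same_ctx = get_ctx_sketch()
```

Code that may run with or without Horovod would then typically check the result for None before using it.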