returnn.tf.horovod
Here we encapsulate some common Horovod functions.
Note that you are supposed to be able to import this module even if Horovod is not installed.
The usage of this module / global context is also considered optional at this point.
Horovod is enabled <==> use_horovod is enabled in the config.
For further relevant config options, see the code of HorovodContext below.
Most importantly:
- horovod_dataset_distribution, recommended value "random_seed_offset", default value "shard"
- horovod_reduce_type, recommended value "param", default value "grad"
- horovod_param_sync_step, recommended value 100, default value 1
- horovod_param_sync_time_diff, alternative to horovod_param_sync_step, e.g. 100. (secs), default None
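As an illustration, the recommended values listed above could be written into a RETURNN config (a Python file) roughly as follows. This is a minimal sketch that uses only the option names and values given here; everything else a real config needs is omitted.

```python
# Sketch of a config fragment with the recommended Horovod settings.
use_horovod = True  # Horovod is enabled <==> this is enabled

horovod_dataset_distribution = "random_seed_offset"  # default: "shard"
horovod_reduce_type = "param"  # default: "grad"
horovod_param_sync_step = 100  # default: 1

# Alternative to horovod_param_sync_step, given in seconds (default: None):
# horovod_param_sync_time_diff = 100.
```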
Also see multi_gpu.
Also see TFDistributed.
- class returnn.tf.horovod.HorovodContext(config)
This sets up some helper functions.
- Parameters:
config (Config)
- returnn.tf.horovod.get_ctx(config=None)
- Parameters:
config (Config|None)
- Returns:
the global context if Horovod is enabled, or None otherwise. If the context has not been set up yet, it is created automatically.
- Return type:
HorovodContext|None
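The "create on first use, then reuse" behaviour described for get_ctx can be pictured with a small self-contained sketch. It does not import RETURNN or Horovod, and it is not the module's actual code: DemoConfig, DemoContext and get_demo_ctx are hypothetical stand-ins, and the handling of a missing config is an assumption of the sketch.

```python
from typing import Optional


class DemoConfig:
    """Hypothetical stand-in for a config object (not RETURNN's Config)."""

    def __init__(self, **options):
        self._options = dict(options)

    def is_true(self, key: str) -> bool:
        return bool(self._options.get(key, False))


class DemoContext:
    """Hypothetical stand-in for HorovodContext; it only stores the config."""

    def __init__(self, config: DemoConfig):
        self.config = config


_is_set_up = False
_ctx: Optional[DemoContext] = None


def get_demo_ctx(config: Optional[DemoConfig] = None) -> Optional[DemoContext]:
    """Return the global context if use_horovod is enabled, else None."""
    global _is_set_up, _ctx
    if _is_set_up:
        return _ctx  # reuse whatever the first set-up decided
    if config is None:
        return None  # sketch assumption: without a config there is nothing to set up
    _ctx = DemoContext(config) if config.is_true("use_horovod") else None
    _is_set_up = True
    return _ctx


# The first call creates the context; later calls return the same object.
ctx = get_demo_ctx(DemoConfig(use_horovod=True))
same_ctx = get_demo_ctx()
```

Callers of such a function would typically check the result against None before using it, since None signals that Horovod is not enabled.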