returnn.torch.updater
This module covers the optimizer (SGD, Adam, etc.) logic, and model parameter update logic in general.
- returnn.torch.updater.get_optimizer_class(class_name: str | Type[Optimizer] | Callable[[], Type[Optimizer]]) Type[Optimizer] [source]¶
- Parameters:
class_name – Optimizer class, given either as a str (e.g. “adam”), as a type (e.g. torch.optim.Adam), or as a callable returning the type. If given as a str, all torch.optim optimizers are supported, matched case-insensitively (e.g. “adam”), as well as class names with a full module path (e.g. “returnn.torch.optim.lion.Lion”). See the usage sketch below.
- Returns:
Optimizer class, e.g. torch.optim.Adam
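Based on the resolution rules above, a small usage sketch (assuming RETURNN and PyTorch are importable; the behavior for the type case is inferred from the description):

```python
import torch

from returnn.torch.updater import get_optimizer_class

# By lowercase name: matched against torch.optim, ignoring case.
assert get_optimizer_class("adam") is torch.optim.Adam
assert get_optimizer_class("sgd") is torch.optim.SGD

# By type: the given class should be returned as-is.
assert get_optimizer_class(torch.optim.AdamW) is torch.optim.AdamW

# By full module path, e.g. for optimizers shipped with RETURNN.
lion_cls = get_optimizer_class("returnn.torch.optim.lion.Lion")
```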
- class returnn.torch.updater.Updater(*, config, network, device, initial_learning_rate=1.0)[source]¶
Wraps a torch.optim.Optimizer and extends it with further functionality (see the construction sketch after the parameter list below).
- Parameters:
config (returnn.config.Config) – config defining the training conditions.
network (torch.nn.Module) – PyTorch Module defining the network.
device (torch.device|str)
initial_learning_rate (float)
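For orientation, a rough construction sketch; the bare Config() and the toy network are placeholder assumptions, since RETURNN's engine normally creates the Updater itself with the full training config:

```python
import torch

from returnn.config import Config
from returnn.torch.updater import Updater

config = Config()  # training config; optimizer settings etc. would normally be defined here
network = torch.nn.Linear(10, 2)  # stand-in for the actual model

updater = Updater(
    config=config,
    network=network,
    device="cpu",
    initial_learning_rate=1e-3,
)
```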
- set_learning_rate(value)[source]¶
Updates the learning rate of the optimizer at each (sub)epoch.
- Parameters:
value (float) – New learning rate.
- set_current_train_step(*, global_train_step: int, epoch: int, epoch_continuous: float | None = None)[source]¶
Sets the current training step, from which an updated learning rate for the current step inside the (sub)epoch is obtained.
- Parameters:
global_train_step – Current global training step over the whole training process. In the first epoch, this starts at 0.
epoch – Current epoch. (First epoch is 1 by RETURNN convention.)
epoch_continuous – How much of the epoch is finished. In the first step of the first epoch, this starts at 0.0; when the first epoch is finished, it reaches 1.0, and the values in between are the fraction of the epoch that is finished. The second epoch (epoch=2) starts at 1.0 and reaches 2.0 when it is finished, and so on. We usually calculate this as epoch - 1 + (last_seq_idx + 1) / num_seqs, if the dataset can provide num_seqs (see the sketch after this parameter list). Other schemes based on the step index might be used to calculate this as well, if the number of steps per epoch is known in advance.
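The formula above as a small self-contained sketch (the helper name and its arguments are hypothetical):

```python
def epoch_continuous_from_seqs(epoch: int, last_seq_idx: int, num_seqs: int) -> float:
    """Continuous epoch progress; epoch is 1-based by RETURNN convention."""
    return epoch - 1 + (last_seq_idx + 1) / num_seqs

# Halfway through the second epoch (50 of 100 sequences done) -> 1.5.
assert epoch_continuous_from_seqs(epoch=2, last_seq_idx=49, num_seqs=100) == 1.5
```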
- step(*, grad_scaler: GradScaler | None = None)[source]¶
Perform one update step, i.e. update the parameters via the optimizer, using the currently computed gradients.
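To show how these calls typically interact, a hedged fragment of a per-(sub)epoch driver loop; learning_rate, data_loader, steps_per_epoch, compute_loss, epoch and global_train_step are placeholder names, and the underlying optimizer is assumed to have been created by the engine beforehand:

```python
updater.set_learning_rate(learning_rate)  # learning rate chosen for this (sub)epoch

for step_idx, batch in enumerate(data_loader):  # placeholder data loop
    updater.set_current_train_step(
        global_train_step=global_train_step,
        epoch=epoch,
        epoch_continuous=epoch - 1 + (step_idx + 1) / steps_per_epoch,  # if steps per epoch are known
    )
    network.zero_grad(set_to_none=True)  # clear gradients on the wrapped torch.nn.Module
    loss = compute_loss(network, batch)  # placeholder loss computation
    loss.backward()
    updater.step()  # apply the optimizer update with the computed gradients
    global_train_step += 1
```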
- load_optimizer(filename)[source]¶
Loads a torch.optim.Optimizer from disk and stores it in self.optimizer.
- Parameters:
filename (str) – File from which to load the optimizer state.
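For example, resuming training with a previously saved optimizer state (placeholder path; updater as constructed above):

```python
updater.load_optimizer("/path/to/checkpoint.opt.pt")  # hypothetical optimizer state file
```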