returnn.torch.updater

This module covers the optimizer (SGD, Adam, etc.) logic, and the model parameter update logic in general.

returnn.torch.updater.get_optimizer_class(class_name: str | Type[Optimizer] | Callable[[], Type[Optimizer]]) → Type[Optimizer][source]
Parameters:

class_name – Optimizer class, given either as str (e.g. “adam”), as type (e.g. torch.optim.Adam), or as a callable returning the type. If given as str, all torch.optim optimizers are supported (matched case-insensitively), as well as class names with a full module path (e.g. “returnn.torch.optim.lion.Lion”).

Returns:

Optimizer class, e.g. torch.optim.Adam
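
For illustration, a small sketch based on the behavior described above (the Lion line assumes the module returnn.torch.optim.lion is available in your RETURNN version):

    import torch
    from returnn.torch.updater import get_optimizer_class

    opt_cls = get_optimizer_class("adam")  # case-insensitive lookup in torch.optim -> torch.optim.Adam
    opt_cls = get_optimizer_class(torch.optim.SGD)  # a class is passed through as-is
    opt_cls = get_optimizer_class("returnn.torch.optim.lion.Lion")  # resolved via the full module path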

class returnn.torch.updater.Updater(*, config, network, device, initial_learning_rate=1.0)[source]

Wraps a torch.optim.Optimizer and extends it with some further functionality.

Parameters:
  • config (returnn.config.Config) – config defining the training conditions.

  • network (torch.nn.Module) – PyTorch Module defining the network.

  • device (torch.device|str)

  • initial_learning_rate (float)
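
The Updater is normally created and driven by the RETURNN PyTorch engine. As a minimal sketch of manual usage, assuming the optimizer is specified via an “optimizer” dict in the config (as in a typical RETURNN setup) and using a toy network:

    import torch
    from returnn.config import Config
    from returnn.torch.updater import Updater

    config = Config()
    config.update({"optimizer": {"class": "adam"}})  # hypothetical optimizer settings
    network = torch.nn.Linear(10, 10)  # stand-in for the real model

    updater = Updater(config=config, network=network, device="cpu", initial_learning_rate=1e-3)
    updater.create_optimizer()  # sets up self.optimizer according to the config

The later examples below continue from this sketch.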

set_learning_rate(value)[source]

Updates the learning rate of the optimizer; called at each (sub)epoch.

Parameters:

value (float) – New learning rate.

get_effective_learning_rate() → float[source]
Returns:

The effective (actual) learning rate currently in use.
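
Continuing the sketch above, the learning rate would typically be set once per (sub)epoch from whatever schedule the training uses, and the effective value can be read back:

    updater.set_learning_rate(5e-4)  # hypothetical value from a learning rate schedule
    print(updater.get_effective_learning_rate())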

set_current_train_step(*, global_train_step: int, epoch: int, epoch_continuous: float | None = None)[source]

Sets the current train step and updates the effective learning rate for this training step inside a (sub)epoch.

Parameters:
  • global_train_step – Current global training step over the whole training process. In the first epoch, this starts at 0.

  • epoch – Current epoch. (First epoch is 1 by RETURNN convention.)

  • epoch_continuous – How much of the training is finished, measured in (fractional) epochs. In the first step of the first epoch, this starts at 0.0; when the first epoch is finished, it reaches 1.0, and values in between are the fraction of the first epoch that is finished. The second epoch (epoch=2) accordingly starts at 1.0 and reaches 2.0 when it is finished, and so on. We usually calculate this as epoch - 1 + (last_seq_idx + 1) / num_seqs, if the dataset can provide num_seqs, as illustrated below. Other schemes based on the step_idx can be used as well, if the number of steps per epoch is known in advance.
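
A small worked example of the epoch_continuous calculation, continuing the sketch above (all numbers are hypothetical):

    epoch = 2
    num_seqs = 1000     # provided by the dataset
    last_seq_idx = 499  # i.e. 500 sequences of this epoch are finished
    epoch_continuous = epoch - 1 + (last_seq_idx + 1) / num_seqs  # -> 1.5

    updater.set_current_train_step(
        global_train_step=1500, epoch=epoch, epoch_continuous=epoch_continuous)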

step(*, grad_scaler: GradScaler | None = None)[source]

Performs one step, i.e. updates the parameters using the optimizer, given the currently computed gradients.
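
A rough sketch of one training step with mixed precision, continuing the sketch above (the GradScaler is optional; without it, call updater.step() after loss.backward()):

    scaler = torch.cuda.amp.GradScaler()

    inputs = torch.randn(8, 10)   # hypothetical batch matching the toy network
    loss = network(inputs).sum()  # stand-in for the real loss computation
    scaler.scale(loss).backward()
    updater.step(grad_scaler=scaler)  # parameter update via the wrapped optimizer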

create_optimizer()[source]

Creates an optimizer and stores it in self.optimizer.

load_optimizer(filename)[source]

Loads the optimizer state from disk and applies it to self.optimizer.

Parameters:

filename (str) – File from which to load the optimizer state.

save_optimizer(filename)[source]

Saves the state of self.optimizer to a file.

Parameters:

filename (str) – File in which to save the optimizer state.
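
For checkpointing, the optimizer state can be written and restored alongside the model checkpoint, continuing the sketch above (the filename is hypothetical):

    updater.save_optimizer("model.opt.pt")
    # ... later, e.g. when resuming training:
    updater.load_optimizer("model.opt.pt")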

get_optimizer()[source]
Returns:

Wrapped optimizer object.

Return type:

torch.optim.Optimizer
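
For direct access to the underlying torch.optim.Optimizer, e.g. to inspect its parameter groups, continuing the sketch above:

    opt = updater.get_optimizer()
    for group in opt.param_groups:
        print(group["lr"], len(group["params"]))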

returnn.torch.updater.gradient_noise_(params: Iterable[Parameter], std: float)[source]

Adds gradient noise to the given parameters, using a truncated normal distribution.
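
A minimal sketch of calling this directly on a toy model after the backward pass and before the optimizer step (the std value is hypothetical):

    import torch
    from returnn.torch.updater import gradient_noise_

    model = torch.nn.Linear(10, 10)
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()

    gradient_noise_(model.parameters(), 0.01)  # add noise to the gradients, in place, before the optimizer step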