returnn.frontend.parameter

Parameter / variable

class returnn.frontend.parameter.Parameter(dims: Sequence[Dim], dtype: str | None = None, *, sparse_dim: Dim | None = None, trainable: bool | None = None, auxiliary: bool = False, non_critical_for_restore: bool = False, weight_decay: float | None = 0.0, initial: Tensor | int | float | complex | number | ndarray | bool | str | ParamInit | None = None, raw_tensor: T | None = None, device: str | None = None)[source]

This represents a (potentially trainable) parameter, akin to tf.Variable in TensorFlow, wrapping VariableLayer in RETURNN.

Parameters:
  • dims

  • dtype

  • sparse_dim

  • trainable – if True, the optimizer will update this parameter in training mode

  • auxiliary – if True, this indicates that this parameter should not be transformed by transformations such as weight normalization. One example is running statistics, as used for batch normalization. This usually implies that the parameter is not trainable, i.e. not updated by the optimizer, but it usually has some custom update. This flag is not passed on to RETURNN but is only used here for returnn-common logic.

  • non_critical_for_restore – if True, this parameter is not critical for restoring a model.

  • weight_decay

  • initial

  • raw_tensor

  • device
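As a toy illustration of how the trainable and auxiliary flags above interact with an optimizer (this is NOT the RETURNN API; ToyParam and sgd_step are hypothetical names invented for this sketch):

```python
from dataclasses import dataclass

@dataclass
class ToyParam:
    name: str
    value: float
    trainable: bool = True
    auxiliary: bool = False  # e.g. batch-norm running statistics

def sgd_step(params, grads, lr):
    # The optimizer only updates trainable, non-auxiliary parameters.
    for p in params:
        if p.trainable and not p.auxiliary:
            p.value -= lr * grads[p.name]

weight = ToyParam("weight", 1.0)
running_mean = ToyParam("running_mean", 0.0, trainable=False, auxiliary=True)

sgd_step([weight, running_mean], {"weight": 1.0, "running_mean": 99.0}, lr=0.5)
print(weight.value)        # 0.5
print(running_mean.value)  # 0.0 -- skipped by the optimizer

# Auxiliary parameters instead receive a custom update, e.g. a running average:
batch_mean = 2.0
running_mean.value = 0.5 * running_mean.value + 0.5 * batch_mean
print(running_mean.value)  # 1.0
```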

property initial: Tensor | int | float | complex | number | ndarray | bool | str | ParamInit | None[source]

initial value of the parameter

assign(value: Tensor | int | float | complex | number | ndarray | bool | str)[source]

Assign a new value to this parameter. This also updates the allocated raw tensor in place.

For graph-based backends, handling the control flow is up to the backend, e.g. making sure the assignment is executed at all, in the right order, and in the right control flow context. No op or anything like that is returned here which the user would need to take care of, so the user can think of this as imperative eager-style code.

assign_add(value: Tensor | int | float | complex | number | ndarray | bool | str)[source]

Add value to this parameter. This also updates the raw tensor in place. See assign().
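The in-place semantics of assign() and assign_add() can be illustrated with a numpy analogy (this is not the RETURNN API, just a sketch of the aliasing behavior): the existing buffer is updated, so other references to it keep seeing the current value, as opposed to rebinding a name to a new array.

```python
import numpy as np

param = np.zeros(3)
alias = param  # e.g. a reference held elsewhere, such as by an optimizer

param[...] = [1.0, 2.0, 3.0]   # analogous to param.assign([1, 2, 3])
print(alias)                    # [1. 2. 3.] -- the alias sees the update

param[...] += 0.5               # analogous to param.assign_add(0.5)
print(alias)                    # [1.5 2.5 3.5]
```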

assign_key(axis: Dim | Sequence[Dim], key: int | float | complex | number | ndarray | bool | str | Tensor | slice | Sequence[int | float | complex | number | ndarray | bool | str | Tensor | slice], key_dim: Dim | Sequence[None | Dim] | None, value: Tensor | int | float | complex | number | ndarray | bool | str)[source]

Basically var[key] = value when axis is the first axis, or the equivalent for another axis. Note that the __setitem__ API is not supported because it would depend on the order of axes; this function is the order-independent equivalent. See assign().
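Since a Parameter's axes are identified by Dim objects rather than by position, assign_key takes the axis explicitly. In positional terms, the behavior corresponds to ordinary numpy index assignment (a numpy analogy, not the RETURNN API):

```python
import numpy as np

var = np.zeros((2, 3))

# Analogous to assign_key on the first axis with key 0:
var[0] = [1.0, 2.0, 3.0]

# Analogous to assign_key on the second axis with key 1; with numpy the
# axis is encoded positionally in the index expression:
var[:, 1] = 9.0

print(var)
# [[1. 9. 3.]
#  [0. 9. 0.]]
```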

to(*, device: str | None = None, dtype: str | None = None)[source]

Move the parameter to the specified device, and/or the specified dtype.

Note: This is an in-place operation. raw_tensor might point to a new raw tensor (or parameter) afterward.

name: str[source]
dtype: str[source]
sparse_dim: Dim | None[source]
version: int[source]
property weight_decay: float[source]

Weight decay, which for plain SGD is equivalent to an L2 loss on the parameters. On the RETURNN side, whether this is handled separately or as part of the main loss can be controlled via the decouple_constraints config option. See https://github.com/rwth-i6/returnn_common/issues/59#issuecomment-1073913421

property trainable: bool | None[source]
property auxiliary: bool[source]
property non_critical_for_restore: bool[source]