returnn.frontend.parameter
¶
Parameter / variable
- class returnn.frontend.parameter.Parameter(dims_or_tensor: None | Sequence[Dim] | Tensor = None, dtype: str | None = None, *, dims: Sequence[Dim] | None = None, sparse_dim: Dim | None = None, trainable: bool | None = None, auxiliary: bool = False, non_critical_for_restore: bool = False, weight_decay: float | None = 0.0, initial: Tensor | int | float | complex | number | ndarray | bool | str | ParamInit | None = None, raw_tensor: T | None = None, device: str | None = None)[source]¶
This represents a (potentially trainable) parameter, aka
tf.Variable
in TensorFlow, wrapping VariableLayer
in RETURNN.- Parameters:
dims_or_tensor
dims
dtype
sparse_dim
trainable – if True, the optimizer will update this parameter in training mode
auxiliary – if True, this indicates that this parameter should not be transformed by transformations such as weight normalization. One example are running statistics, as used for batch normalization. This usually implies that the parameter is not trainable, i.e. not to be updated by the optimizer, but usually has some custom update. This flag is not passed on to RETURNN but just used here for returnn-common logic.
non_critical_for_restore – if True, this parameter is not critical for restoring a model.
weight_decay
initial
raw_tensor
device
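As a minimal sketch of the constructor semantics described above (a hypothetical stand-in class, not the real `Parameter` API): the first positional argument accepts either a sequence of dims or an existing tensor, and, per the note on `auxiliary`, an auxiliary parameter usually defaults to non-trainable.

```python
# Hypothetical stand-in illustrating the overloaded first argument and the
# trainable/auxiliary interaction; the real Parameter uses Dim/Tensor objects.
class _ParamSketch:
    def __init__(self, dims_or_tensor=None, dtype=None, *, dims=None,
                 trainable=None, auxiliary=False):
        if dims is None and isinstance(dims_or_tensor, (list, tuple)):
            dims = list(dims_or_tensor)  # first arg given as a dims sequence
        self.dims = dims
        self.dtype = dtype or "float32"
        # auxiliary params (e.g. batch-norm running statistics) are usually
        # not updated by the optimizer, so default trainable to False for them
        self.trainable = (not auxiliary) if trainable is None else trainable
        self.auxiliary = auxiliary

running_mean = _ParamSketch([("feature", 32)], auxiliary=True)
weight = _ParamSketch([("in", 3), ("out", 4)], trainable=True)
```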
- property initial: Tensor | int | float | complex | number | ndarray | bool | str | ParamInit | None[source]¶
initial value of the parameter
- assign(value: Tensor | int | float | complex | number | ndarray | bool | str)[source]¶
Assign a new value to this parameter. This also updates the allocated raw tensor in place.
For graph-based backends, handling the control flow is up to the backend, e.g. making sure the assignment is executed at all, in the right order, and in the right control flow context. No op or similar is returned here that the user would need to take care of, so the user can think of this as imperative eager-style code.
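The in-place aspect can be illustrated with a small stand-in class (hypothetical, not the real backend): because the allocated storage itself is overwritten, every existing reference to the raw tensor observes the new value.

```python
# Minimal eager-style sketch of assign() semantics: the parameter's raw
# storage is updated in place, so other references see the new value.
class _Param:
    def __init__(self, values):
        self.raw_tensor = list(values)  # stand-in for the backend's raw tensor

    def assign(self, value):
        # overwrite the allocated storage in place rather than rebinding it
        self.raw_tensor[:] = value

p = _Param([1.0, 2.0])
view = p.raw_tensor  # a second reference to the same storage
p.assign([3.0, 4.0])
```

After the assignment, `view` and `p.raw_tensor` are still the same object and both hold the new values.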
- assign_add(value: Tensor | int | float | complex | number | ndarray | bool | str)[source]¶
Add value to this parameter. This also updates the raw tensor in place. See
assign()
.
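A sketch of the accumulating variant, using the same kind of hypothetical stand-in: `assign_add` behaves like `assign` but adds element-wise to the existing storage.

```python
# Sketch of assign_add() semantics: element-wise in-place accumulation.
class _Param:
    def __init__(self, values):
        self.raw_tensor = list(values)  # stand-in for the backend's raw tensor

    def assign_add(self, value):
        # add into the existing storage instead of overwriting it
        for i, v in enumerate(value):
            self.raw_tensor[i] += v

p = _Param([1.0, 2.0])
p.assign_add([0.5, 0.5])
```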
- assign_key(axis: Dim | Sequence[Dim], key: int | float | complex | number | ndarray | bool | str | Tensor | slice | Sequence[int | float | complex | number | ndarray | bool | str | Tensor | slice], key_dim: Dim | Sequence[None | Dim] | None, value: Tensor | int | float | complex | number | ndarray | bool | str)[source]¶
Basically var[key] = value if axis is the first axis, or the equivalent along another axis. The __setitem__ API is not supported because it would depend on the order of axes; this function is the axis-explicit equivalent. See
assign()
.
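The axis-explicit idea can be sketched for a 2-D nested-list "tensor" (a hypothetical helper, not the real implementation, with the axis given as a plain index instead of a Dim): on the first axis it is literally `var[key] = value`, on another axis the indexing moves inward accordingly.

```python
# Sketch of assign_key(): __setitem__ along an explicitly chosen axis.
def assign_key(var, axis, key, value):
    if axis == 0:
        var[key] = value          # same as var[key] = value on the first axis
    else:
        for row, v in zip(var, value):
            row[key] = v          # index into the second axis instead

m = [[1, 2], [3, 4]]
assign_key(m, 0, 1, [7, 8])      # set row 1
assign_key(m, 1, 0, [9, 9])      # set column 0
```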
- to(*, device: str | None = None, dtype: str | None = None)[source]¶
Move the parameter to the specified device, and/or the specified dtype.
Note: This is an in-place operation. raw_tensor might point to a new raw tensor (or parameter) afterward.
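A sketch of the in-place contract (hypothetical stand-in): the `Parameter` object itself is mutated and nothing useful is returned, but after the call `raw_tensor` may be a freshly allocated tensor on the target device or dtype.

```python
# Sketch of to() semantics: mutates the parameter in place; raw_tensor may
# be rebound to a new backend tensor with the requested dtype.
class _Param:
    def __init__(self, values, dtype="float32"):
        self.raw_tensor = list(values)
        self.dtype = dtype

    def to(self, *, device=None, dtype=None):
        if dtype is not None and dtype != self.dtype:
            cast = float if dtype.startswith("float") else int
            self.raw_tensor = [cast(v) for v in self.raw_tensor]  # new raw tensor
            self.dtype = dtype
        # in-place operation: nothing is returned

p = _Param([1.0, 2.5])
old_raw = p.raw_tensor
p.to(dtype="int32")
```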
- property weight_decay: float[source]¶
Weight decay, which for SGD is equivalent to an L2 loss on the parameters. On the RETURNN side, whether this is handled separately or as part of the main loss can be controlled via the
decouple_constraints
config option. See https://github.com/rwth-i6/returnn_common/issues/59#issuecomment-1073913421
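The stated equivalence can be checked numerically (a sketch, with the usual factor convention that a decay rate wd corresponds to an L2 term (wd/2)·p² in the loss): under plain SGD, applying weight decay as a separate constraint gives the same update as differentiating the L2-augmented loss.

```python
# Both variants yield the update p -= lr * (grad + wd * p) under plain SGD.
lr, wd = 0.1, 0.01
p, grad = 2.0, 0.5

# variant 1: decay applied as a separate constraint term on the gradient
p1 = p - lr * (grad + wd * p)

# variant 2: gradient of loss + (wd/2) * p**2, differentiated by hand:
# d/dp [(wd/2) * p**2] = wd * p
l2_grad = wd * p
p2 = p - lr * (grad + l2_grad)
```

Note that this equivalence is specific to SGD; for adaptive optimizers, decoupled weight decay behaves differently from an L2 loss term.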