Normalization Layers¶
Generic Normalization Layer¶
- class returnn.tf.layers.basic.NormLayer(axis=<class 'returnn.util.basic.NotSpecified'>, axes=<class 'returnn.util.basic.NotSpecified'>, param_shape=<class 'returnn.util.basic.NotSpecified'>, scale=True, bias=True, epsilon=1e-06, **kwargs)[source]¶
Normalize over specified axes, e.g. time and/or feature axis.
Note: For calculating a norm, see MathNormLayer instead.
In case of just feature (axes="F"), this corresponds to layer normalization (see LayerNormLayer). In case of time and feature (axes="TF") for a 3D input, or more generally all axes except batch (axes="except_batch"), this corresponds to group normalization with G=1, or a non-standard layer normalization. (The definition of layer normalization is not clear on which axes should be normalized over. In many other frameworks, the default axis is just the last axis, which is usually the feature axis. However, in certain implementations and models, it is also common to normalize over all axes except batch.)
The statistics are calculated just on the input. There are no running statistics (in contrast to batch normalization, see BatchNormLayer). For a concrete reference of this computation, see the sketch after the parameter list below.
For some discussion on the definition of layer-norm vs group-norm, also see here and here.
- Parameters:
axis (Dim|str|list[Dim|str]) – axis or axes over which the mean and variance are computed, e.g. “F” or “TF”; axis and axes are synonyms, specify only one of them
axes (Dim|str|list[Dim|str]) – synonym for axis
param_shape (Dim|str|list[Dim|str]|tuple[Dim|str]) – shape of the scale and bias parameters. You can also refer to (static) axes of the input, such as the feature-dim. This is also the default, i.e. a param-shape of [F], independent of the axes to normalize over.
scale (bool) – add trainable scale parameters
bias (bool) – add trainable bias parameters
epsilon (float) – epsilon for numerical stability
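As a concrete reference for the computation, here is a minimal NumPy sketch of the axes="F" case on a [B, T, F] input with the default param_shape of [F]. The function name and shapes are illustrative only, not part of the RETURNN API:

    import numpy as np

    def norm_ref(x, scale, bias, epsilon=1e-6):
        # Reference for NormLayer with axes="F" on a [B, T, F] input:
        # mean/variance are computed over the feature axis only, separately
        # per batch entry and time step; there are no running statistics.
        mean = x.mean(axis=-1, keepdims=True)       # [B, T, 1]
        var = x.var(axis=-1, keepdims=True)         # [B, T, 1]
        x_norm = (x - mean) / np.sqrt(var + epsilon)
        return x_norm * scale + bias                # scale/bias of shape [F]

    x = np.random.randn(2, 5, 8).astype("float32")  # [B, T, F]
    y = norm_ref(x, np.ones(8, "float32"), np.zeros(8, "float32"))

Normalizing over axes="TF" would instead compute mean and var over axis=(-2, -1).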
- classmethod get_out_data_from_opts(sources, name, **kwargs)[source]¶
- Parameters:
sources (list[LayerBase])
name (str)
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]¶
- search_choices: Optional[SearchChoices][source]¶
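In a network config, this layer is addressed by its layer class name "norm". A minimal usage sketch (the layer names and the "encoder" source are made up for illustration):

    network = {
        "encoder": {"class": "linear", "from": "data", "n_out": 512, "activation": "relu"},
        # group-norm-like normalization with G=1 over time and feature:
        "encoder_norm": {"class": "norm", "from": "encoder", "axes": "TF"},
        "output": {"class": "softmax", "from": "encoder_norm", "loss": "ce"},
    }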
Batch-Normalization Layer¶
- class returnn.tf.layers.basic.BatchNormLayer(in_dim=None, use_shift=<class 'returnn.util.basic.NotSpecified'>, use_std=<class 'returnn.util.basic.NotSpecified'>, use_sample=<class 'returnn.util.basic.NotSpecified'>, force_sample=<class 'returnn.util.basic.NotSpecified'>, momentum=<class 'returnn.util.basic.NotSpecified'>, epsilon=<class 'returnn.util.basic.NotSpecified'>, update_sample_only_in_training=<class 'returnn.util.basic.NotSpecified'>, delay_sample_update=<class 'returnn.util.basic.NotSpecified'>, param_version=<class 'returnn.util.basic.NotSpecified'>, gamma_init=<class 'returnn.util.basic.NotSpecified'>, beta_init=<class 'returnn.util.basic.NotSpecified'>, masked_time=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]¶
Implements batch-normalization (https://arxiv.org/abs/1502.03167) as a separate layer.
Also see NormLayer.
- Parameters:
in_dim (returnn.tensor.Dim|None)
use_shift (bool)
use_std (bool)
use_sample (float) – defaults to 0.0, which is used in training
force_sample (bool) – even in eval, use the use_sample factor
momentum (float) – for the running average of sample_mean and sample_std
update_sample_only_in_training (bool)
delay_sample_update (bool)
param_version (int) – 0 or 1 or 2
epsilon (float)
gamma_init (str|float) – see returnn.tf.util.basic.get_initializer(), for the scale
beta_init (str|float) – see returnn.tf.util.basic.get_initializer(), for the mean
masked_time (bool) – flatten and mask input tensor
The default settings for these variables are set in the function batch_norm() of LayerBase. If you do not want to change them, you can leave them undefined here. With our default settings:
- In training: use_sample=0, i.e. not using the running average, using the current batch mean/var.
- Not in training (e.g. eval): use_sample=1, i.e. using the running average, not the current batch mean/var.
- The running average includes the statistics of the current batch.
- The running average is also updated when not training.
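To make the role of use_sample and momentum concrete, here is a simplified NumPy sketch of the documented semantics; the authoritative implementation is batch_norm() of LayerBase, and the defaults shown here are illustrative, not necessarily the layer's actual defaults:

    import numpy as np

    def batch_norm_step(x, running_mean, running_var, gamma, beta,
                        use_sample=0.0, momentum=0.1, epsilon=1e-6):
        # x: [B, F]; statistics are computed over the batch axis.
        batch_mean = x.mean(axis=0)
        batch_var = x.var(axis=0)
        # use_sample=0 (training default): use current batch statistics.
        # use_sample=1 (eval default): use the running averages.
        mean = (1.0 - use_sample) * batch_mean + use_sample * running_mean
        var = (1.0 - use_sample) * batch_var + use_sample * running_var
        # The running averages are updated to include the current batch.
        running_mean += momentum * (batch_mean - running_mean)
        running_var += momentum * (batch_var - running_var)
        return gamma * (x - mean) / np.sqrt(var + epsilon) + beta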
- output_before_activation: Optional[OutputWithActivation][source]¶
- search_choices: Optional[SearchChoices][source]¶
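In a network config, the layer is used via its layer class name "batch_norm"; a minimal sketch (layer names illustrative):

    network = {
        "conv": {"class": "conv", "from": "data", "filter_size": (3,), "padding": "same", "n_out": 64},
        "conv_bn": {"class": "batch_norm", "from": "conv", "momentum": 0.1},
        "output": {"class": "softmax", "from": "conv_bn", "loss": "ce"},
    }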
Layer-Normalization Layer¶
- class returnn.tf.layers.basic.LayerNormLayer(in_dim=None, out_dim=None, epsilon=1e-06, **kwargs)[source]¶
Applies layer-normalization.
Note that we just normalize over the feature-dim axis here. This is consistent with the default behavior of tf.keras.layers.LayerNormalization and also how it is commonly used in many models, including the Transformer.
However, there are cases where it would be common to normalize over all axes except the batch-dim, or all axes except batch and time. For a more generic variant, see NormLayer.
- Parameters:
in_dim (returnn.tensor.Dim|None)
out_dim (returnn.tensor.Dim|None)
epsilon (float)
- classmethod get_out_data_from_opts(sources, name, **kwargs)[source]¶
- Parameters:
sources (list[LayerBase])
name (str)
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]¶
- search_choices: Optional[SearchChoices][source]¶
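In a network config, this layer is available via its layer class name "layer_norm". A minimal sketch (the "ff" layer name is illustrative):

    network = {
        "ff": {"class": "linear", "from": "data", "n_out": 512, "activation": "relu"},
        "ff_ln": {"class": "layer_norm", "from": "ff"},  # normalizes over the feature axis
        "output": {"class": "softmax", "from": "ff_ln", "loss": "ce"},
    }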