Basic Layers#

Accumulate Mean Layer#

class returnn.tf.layers.basic.AccumulateMeanLayer(exp_average, axes='bt', initial_value=None, is_prob_distribution=None, **kwargs)[source]#

Accumulates the mean of the input in training, over the batch-dim and time-dim by default. It is similar to ReduceLayer.

Parameters:
  • exp_average (float) – momentum in exponential average calculation

  • axes (int|list[str]|str) – the axes to reduce; must contain batch and time.

  • initial_value (float) – how to initialize the variable which accumulates the mean

  • is_prob_distribution (bool) – if provided, a better default is used for initial_value
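
A minimal config sketch (the layer name "acc_mean" and source "encoder" are illustrative placeholders):

"acc_mean": {"class": "accumulate_mean", "from": "encoder",
             "exp_average": 0.001, "is_prob_distribution": True}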

layer_class: Optional[str] = 'accumulate_mean'[source]#
classmethod get_out_data_from_opts(axes='bt', **kwargs)[source]#
Parameters:

axes (str) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Activation Layer#

class returnn.tf.layers.basic.ActivationLayer(activation, opts=None, **kwargs)[source]#

This layer just applies an activation function. See returnn.tf.util.basic.get_activation_function() about supported functions. Also see EvalLayer and CombineLayer for similar layers.

Parameters:
  • activation (str) – e.g. “relu”, “tanh”, etc.

  • opts (dict[str]|None) – for activation function, e.g. eps for safe_log
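
For example, two config sketches (layer and source names are illustrative; the eps option for safe_log is mentioned above):

"act": {"class": "activation", "from": "linear1", "activation": "relu"}
"log_probs": {"class": "activation", "from": "probs", "activation": "safe_log", "opts": {"eps": 1e-10}}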

layer_class: Optional[str] = 'activation'[source]#
output_before_activation: Optional[OutputWithActivation][source]#
classmethod get_out_data_from_opts(activation, **kwargs)[source]#
Parameters:

activation (str) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Combine Layer#

class returnn.tf.layers.basic.CombineLayer(kind, sources, allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>, activation=None, with_bias=False, eval=None, eval_locals=None, eval_for_output_loss=False, **kwargs)[source]#

Applies a binary operation, such as addition, to all sources, accumulating the partial results. In the first step, the binary operation is performed on the first two sources; after that, the previous result is always the left-hand operand.

Its behavior is similar to the reduce function known from functional programming. Also see ActivationLayer or CompareLayer.

Parameters:
  • kind (str) – currently accepted values are average, add, sub, mul, truediv, floordiv, mod, pow, maximum, minimum, logical_and, logical_or, squared_difference, or eval, or any function in the tf.math or tf namespace.

  • sources (list[LayerBase]) –

  • allow_broadcast_all_sources (bool|NotSpecified) – allow broadcasting for all sources, e.g. shape [A] + [B] -> shape [A,B]. Disabled by default; then there must be some source with all dims.

  • activation (str|None) – if provided, activation function to apply, e.g. “tanh” or “relu”

  • with_bias (bool) – if True, will add a trainable bias tensor

  • eval (str|callable) – for kind=”eval”, the given string will be evaluated, or the given function called. See _op_kind_eval().

  • eval_locals (dict[str]|None) – locals for eval

  • eval_for_output_loss (bool) – will do the same eval on layer.output_loss
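
Two config sketches (layer names are illustrative; the eval string follows the usual source(i) convention of eval-based layers):

"sum": {"class": "combine", "kind": "add", "from": ["layer1", "layer2"]}
"weighted": {"class": "combine", "kind": "eval", "from": ["a", "b"],
             "eval": "source(0) * 0.5 + source(1)"}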

layer_class: Optional[str] = 'combine'[source]#
recurrent = True[source]#
output_loss: Optional[tf.Tensor][source]#
output_before_activation: Optional[OutputWithActivation][source]#
classmethod get_out_data_from_opts(network, sources, eval_locals=None, n_out=<class 'returnn.util.basic.NotSpecified'>, out_type=None, allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>, out_shape=None, **kwargs)[source]#
Parameters:
  • network (returnn.tf.network.TFNetwork) –

  • sources (list[LayerBase]) –

  • eval_locals (dict[str]|None) – locals for eval; will also be passed to out_type if out_type is a function

  • n_out (int|None|NotSpecified) –

  • allow_broadcast_all_sources (bool|NotSpecified) –

  • out_type (dict[str]|None|(()->Data)) –

  • out_shape (set[Dim|_MarkedDim]|tuple|list|None) – verifies the output shape (dim tags)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#

Compare Layer#

class returnn.tf.layers.basic.CompareLayer(kind='equal', value=None, allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]#

Compares the tokens of all input sequences element-wise among themselves and/or with a specified value. The comparisons are performed in a chain, in the order in which the sources are listed.

Example:

{"class": "compare", "from": ["i1", "i2"], "value": val, "kind": "less"}

computes i1 < i2 < val, which is true only if the whole chain of comparisons holds. The final result is the logical “and” of all comparisons. Note that value is always the last element in the chain.

A common usage is as the end layer in a rec subnetwork, to specify the stopping criterion, e.g. that the last generated token equals the end-of-sentence token:

"output": {"class": "rec", "from": [], "unit": {
    ...
    "end": {"class": "compare", "from": "output", "value": end_of_sentence_id}
}, "target": "classes0"}
Parameters:
  • kind (str) – which comparison operation to use, e.g. “equal”, “greater”, “less” or other supported TF comparison ops

  • value (float|int|None) – if specified, will also compare to this

  • allow_broadcast_all_sources (bool|NotSpecified) – allow broadcasting for all sources, e.g. shape [A] + [B] -> shape [A,B]. Disabled by default; then there must be some source with all dims.

layer_class: Optional[str] = 'compare'[source]#
classmethod get_out_data_from_opts(sources, allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>, n_out=<class 'returnn.util.basic.NotSpecified'>, out_type=None, out_shape=None, **kwargs)[source]#
Parameters:
  • sources (list[LayerBase]) –

  • allow_broadcast_all_sources (bool|NotSpecified) –

  • n_out (int|None|NotSpecified) –

  • out_type (dict[str]|None) –

  • out_shape (dict[str]|None) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#

Constant Layer#

class returnn.tf.layers.basic.ConstantLayer(sources, value=0.0, shape=None, dtype=None, with_batch_dim=False, sparse_dim=None, feature_dim=None, shape_deps=(), **kwargs)[source]#

Output is a constant value.

Parameters:
  • sources (list[LayerBase]) –

  • value (int|float|bool|numpy.ndarray) –

  • shape (tuple[Dim|int]|list[Dim|int]) – for verification, and defining dim tags

  • dtype (str|None) –

  • with_batch_dim (bool) –

  • sparse_dim (Dim|None) –

  • feature_dim (Dim|None) –

  • shape_deps (list[LayerBase]) – for dyn dim tags in shape
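
Two minimal config sketches (layer names are illustrative):

"pi": {"class": "constant", "value": 3.14159}
"one": {"class": "constant", "value": 1, "dtype": "int32"}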

get_dep_layers()[source]#
Return type:

list[LayerBase]

classmethod transform_config_dict(d, network, get_layer)[source]#
Parameters:
  • d (dict[str]) – will modify inplace

  • network (returnn.tf.network.TFNetwork) –

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

classmethod get_out_data_from_opts(name, value=0.0, shape=None, dtype=None, with_batch_dim=False, sparse_dim=None, feature_dim=<class 'returnn.util.basic.NotSpecified'>, shape_deps=(), **kwargs)[source]#
Parameters:
  • name (str) –

  • value (int|float|bool) –

  • shape (tuple[Dim|int]|list[Dim|int]) – for verification, and defining dim tags

  • dtype (str|None) –

  • with_batch_dim (bool) –

  • sparse_dim (Dim|None) –

  • feature_dim (Dim|None|NotSpecified) –

  • shape_deps (list[LayerBase]) – for dyn dim tags in shape

Return type:

Data

Convolution Layer#

class returnn.tf.layers.basic.ConvLayer(filter_size, padding, strides=1, dilation_rate=1, groups=1, input_expand_dims=0, input_add_feature_dim=False, input_split_feature_dim=None, in_dim=None, in_spatial_dims=None, n_out=None, out_dim=None, out_spatial_dims=None, auto_use_channel_first=<class 'returnn.util.basic.NotSpecified'>, with_bias=<class 'returnn.util.basic.NotSpecified'>, activation=None, forward_weights_init='glorot_uniform', bias_init=0.0, filter=None, filter_perm=None, bias=None, use_time_mask=False, pad_seq_len_to_power=None, **kwargs)[source]#

A generic convolution layer which supports 1D, 2D and 3D convolution. Pooling can be done in the separate “pool” layer.

Parameters:
  • filter_size (Sequence[Dim]|Sequence[int]) – (width,), (height,width) or (depth,height,width) for 1D/2D/3D conv. The input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. It will automatically swap the batch-dim to the first axis of the input data.

  • padding (str) – “same”, “valid” or “same_static”. “same_static” is calculated differently depending on whether an axis is static or dynamic. For static axes, “same_static” padding is the same as “same” padding, i.e. filter_size - 1 - (T + strides - 1) % strides. For dynamic axes, “same_static” calculates the total padding size as filter_size - 1, i.e. it is independent of the length T of the axis and the striding. For dynamic axes, to avoid skipping any frames on the right, we set left_padding = (filter_size - strides) // 2.

  • strides (int|Sequence[int]) – strides for the spatial dims, i.e. length of this tuple should be the same as filter_size, or a single int.

  • dilation_rate (int|Sequence[int]) – dilation for the spatial dims

  • groups (int) – grouped convolution

  • in_dim (Dim|None) –

  • in_spatial_dims (Sequence[Dim|str]|None) –

  • n_out (int|None) – number of outgoing features

  • out_dim (Dim|None) –

  • out_spatial_dims (Sequence[Dim]|None) –

  • input_expand_dims (int) – number of spatial dims to add to the input

  • input_add_feature_dim (bool) – will add a dim at the end and use input-feature-dim == 1, and use the original input feature-dim as a spatial dim.

  • input_split_feature_dim (None|int) – if set, like input_add_feature_dim, it will add a new feature dim of size input_split_feature_dim, and the original input feature dim will be divided by this value; the original feature dim must thus be a multiple of that value.

  • auto_use_channel_first (bool|NotSpecified) – convert the input to NCHW or not

  • with_bias (bool|NotSpecified) – if True, will add a bias to the output features. True by default since behavior version 10.

  • activation (None|str) – if set, will apply this function at the end

  • filter (LayerBase|None) – if given, will not create an own parameter, but use this as the filter

  • filter_perm (dict[str,str]|None) – transposes the filter (input filter as layer)

  • bias (LayerBase|None) – if given, will not create an own parameter, but use this as the bias

  • use_time_mask (bool) –

  • pad_seq_len_to_power (Optional[float]) – pad sequence length to power of given number to reduce number of different sequence lengths. See https://github.com/rwth-i6/returnn/issues/1450 and https://github.com/tensorflow/tensorflow/issues/62441.
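
A typical 2D convolution sketch (layer names and sizes are illustrative):

"conv1": {"class": "conv", "from": "data", "filter_size": (3, 3),
          "padding": "same", "n_out": 32, "activation": "relu"}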

layer_class: Optional[str] = 'conv'[source]#
recurrent = True[source]#
output_before_activation: Optional[OutputWithActivation][source]#
classmethod set_output_dim_tags(output, num_batch_dims, in_spatial_dims, out_spatial_dims, filter_size, strides, dilation_rate, padding)[source]#
Parameters:
  • output (Data) –

  • num_batch_dims (int) –

  • in_spatial_dims (Sequence[Dim]) –

  • out_spatial_dims (Sequence[Dim]|None) –

  • filter_size (Sequence[int|Dim]) –

  • strides (Sequence[int]) –

  • dilation_rate (Sequence[int]) –

  • padding (str) –

classmethod transform_input(input_data, network, in_dim=None, in_spatial_dims=None, input_expand_dims=0, input_split_feature_dim=None, input_add_feature_dim=False, use_time_mask=False)[source]#
Parameters:
  • input_data (Data) –

  • network (returnn.tf.network.TFNetwork) –

  • in_dim (Dim|None) –

  • in_spatial_dims (list[Dim|str]|None) –

  • input_expand_dims (int) – number of spatial dims to add to the input

  • input_split_feature_dim (None|int) – if set, like input_add_feature_dim, it will add a new feature dim of size input_split_feature_dim, and the original input feature dim will be divided by this value; the original feature dim must thus be a multiple of that value.

  • input_add_feature_dim (bool) – will add a dim at the end and use input-feature-dim == 1, and use the original input feature-dim as a spatial dim.

  • use_time_mask (bool) –

Returns:

(transformed input, num batch dims). all batch dims are at the front

Return type:

(Data, int)

classmethod get_input_placeholder_with_same_static_padding(input_data: Tensor, num_batch_dims: int, filter_size: Sequence[int], strides: Sequence[int], out_batch_feature_major: bool) -> Tensor[source]#

Returns the placeholder of input_data with same_static padding applied to it.

Parameters:
  • input_data

  • num_batch_dims

  • filter_size

  • strides

  • out_batch_feature_major

classmethod calc_out_dim(in_dim, filter_size, stride, padding, dilation_rate=1)[source]#
Parameters:
  • in_dim (T|int|tf.Tensor|Dim) – dimension in some axis

  • filter_size (int|Dim) – e.g. 2, for the corresponding axis

  • stride (int) – e.g. 1, for the corresponding axis

  • dilation_rate (int) – e.g. 1

  • padding (str) – “valid” or “same”

Returns:

the output dimension

Return type:

T
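
For example, assuming the usual TF conventions: with in_dim=10, filter_size=3, stride=2 and dilation_rate=1, “valid” padding yields ceil((10 - 3 + 1) / 2) = 4, while “same” padding yields ceil(10 / 2) = 5.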

classmethod get_out_data_from_opts(name, sources, network, filter_size, padding, strides=1, dilation_rate=1, input_expand_dims=0, input_add_feature_dim=False, input_split_feature_dim=None, in_dim=None, in_spatial_dims=None, n_out=None, out_dim=None, out_spatial_dims=None, auto_use_channel_first=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]#
Parameters:
  • name (str) –

  • sources (Sequence[LayerBase]) –

  • network (returnn.tf.network.TFNetwork) –

  • filter_size (Sequence[int|Dim]) –

  • padding (str) –

  • strides (int|Sequence[int]) –

  • dilation_rate (int|Sequence[int]) –

  • input_expand_dims (int) – number of spatial dims to add to the input

  • input_add_feature_dim (bool) –

  • input_split_feature_dim (None|int) –

  • in_dim (Dim|None) –

  • in_spatial_dims (Sequence[Dim|str]|None) –

  • n_out (int|None) – number of outgoing features

  • out_dim (Dim|None) –

  • out_spatial_dims (Sequence[Dim]|None) –

  • auto_use_channel_first (bool|NotSpecified) –

get_dep_layers()[source]#
Return type:

list[LayerBase]

classmethod transform_config_dict(d, network, get_layer)[source]#
Parameters:
  • d (dict[str]) – will modify inplace

  • network (returnn.tf.network.TFNetwork) –

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

kwargs: Optional[Dict[str]][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Copy Layer#

class returnn.tf.layers.basic.CopyLayer(in_dim=None, out_dim=None, extra_deps=(), **kwargs)[source]#

This layer does nothing but copy its input. It is not even a tf.identity: it refers to the same TF tensor. If multiple sources are provided, they are concatenated in the feature-dim.

Parameters:
  • in_dim (Dim|None) – just for checking. but also, if this is provided, it will set the feature_dim to this.

  • out_dim (Dim|None) – alternative to in_dim. see in_dim doc.

  • extra_deps (list[LayerBase]) – Just add as an additional dependency, without really using it. This can have an effect though on the search beam, via SelectSearchSourcesLayer. We only have this here for the CopyLayer because the get_out_data_from_opts() must know about it and define the right beam. Also see the option collocate_with, which is different in that it does not add a dependency. Note that this will not be real TF control dependencies, but it simply sets the dependency on the layer. If you want to have a real TF control dependency, use IdentityLayer.
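
For example, concatenating two sources in the feature-dim (layer names are illustrative):

"concat": {"class": "copy", "from": ["lstm_fwd", "lstm_bwd"]}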

layer_class: Optional[str] = 'copy'[source]#
output_loss: Optional[tf.Tensor][source]#
output_before_activation: Optional[OutputWithActivation][source]#
get_dep_layers()[source]#
Return type:

list[LayerBase]

classmethod get_out_data_from_opts(name, sources=(), extra_deps=(), out_type=None, in_dim=None, out_dim=None, n_out=<class 'returnn.util.basic.NotSpecified'>, out_shape=None, **kwargs)[source]#
Parameters:
  • name (str) –

  • sources (list[LayerBase]) –

  • extra_deps (list[LayerBase]) –

  • out_type (dict[str]|None) –

  • in_dim (Dim|None) –

  • out_dim (Dim|None) –

  • n_out (int|None|NotSpecified) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]#
Parameters:
  • d (dict[str]) – will modify inplace

  • network (returnn.tf.network.TFNetwork) –

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

kwargs: Optional[Dict[str]][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Cumulative Sum Layer#

class returnn.tf.layers.basic.CumsumLayer(axis='T', additional_left_summand_per_element=None, reverse=False, **kwargs)[source]#

This layer wraps tf.cumsum. It is also supported inside a RecLayer.

Parameters:
  • axis (str) – see Data.get_axis_from_description()

  • additional_left_summand_per_element (str|int|float|None) – the order matters for tf.string

  • reverse (bool) –
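
A config sketch (layer names are illustrative):

"cumsum": {"class": "cumsum", "from": "scores", "axis": "T"}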

layer_class: Optional[str] = 'cumsum'[source]#
recurrent = True[source]#
classmethod get_out_data_from_opts(name, sources, axis='T', **kwargs)[source]#
Parameters:
  • name (str) –

  • sources (list[LayerBase]) –

  • axis (str) –

Return type:

Data

classmethod get_rec_initial_extra_outputs(network, batch_dim, rec_layer, axis='T', sources=(), **kwargs)[source]#
Parameters:
  • network (returnn.tf.network.TFNetwork) –

  • batch_dim (tf.Tensor) –

  • rec_layer (returnn.tf.layers.rec.RecLayer) –

  • axis (str) –

  • sources (list[LayerBase]) –

Return type:

dict[str,tf.Tensor]

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Dot Layer#

class returnn.tf.layers.basic.DotLayer(reduce=<class 'returnn.util.basic.NotSpecified'>, red1=<class 'returnn.util.basic.NotSpecified'>, red2=<class 'returnn.util.basic.NotSpecified'>, var1=<class 'returnn.util.basic.NotSpecified'>, var2=<class 'returnn.util.basic.NotSpecified'>, add_var2_if_empty=<class 'returnn.util.basic.NotSpecified'>, use_mask: bool = True, debug=False, **kwargs)[source]#

This performs a dot-product of two sources. The underlying matmul expects shapes (shared…, I, J) * (shared…, J, K) -> (shared…, I, K). We say that J is the axis to be reduced, I is the var-dim of source 1, and K is the var-dim of source 2. I, J, K can also be multiple axes from the sources. The var-dims don’t need to exist. All other axes (shared…) are expected to match.

Avoid having the same dim in both sources when it is not reduced, as you would then end up with that dim occurring twice in the output, e.g. (shared…, I, I). This should be avoided because the dim order should never matter (https://github.com/rwth-i6/returnn/wiki/RETURNN-principles). If you need to perform such an operation, you can use ReinterpretDataLayer to introduce a new dim tag.

The reduce dim can also be the sparse dim of one of the sources. In this case, it behaves like GatherLayer.

Parameters:
  • reduce (str|Dim|tuple[str|Dim]|list[str|Dim]) – reduce axes of both sources

  • red1 (str|Dim|tuple[str|Dim]|list[str|Dim]) – reduce axes of first source

  • red2 (str|Dim|tuple[str|Dim]|list[str|Dim]) – reduce axes of second source

  • var1 (str|Dim|tuple[str|Dim]|list[str|Dim]|None) – var axes of first source

  • var2 (str|Dim|tuple[str|Dim]|list[str|Dim]|None) – var axes of second source

  • add_var2_if_empty (bool) – if var2=None, add dim=1 at the end

  • use_mask – If the reduction is over dynamic axes, masking must be applied to one of the inputs to get a correct sum reduction; this is done automatically. Setting this flag to False disables the masking.

  • debug (bool) – will print debug shapes, etc.

Earlier defaults: red1=-1, red2=-2, var1=-2, var2=-1, add_var2_if_empty=True.

However, these defaults are problematic for multiple reasons, e.g. the use of integer axis indices, but also in general. See https://github.com/rwth-i6/returnn/issues/627 for details.
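
For example, a dot product reducing the feature dim of both sources (a sketch; layer names are illustrative):

"scores": {"class": "dot", "from": ["query", "keys"], "reduce": "F"}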

layer_class: Optional[str] = 'dot'[source]#
classmethod transform_config_dict(d, network, get_layer)[source]#
Parameters:
  • d (dict[str]) – will modify inplace

  • network (returnn.tf.network.TFNetwork) –

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

classmethod get_out_data_from_opts(name, sources, reduce=<class 'returnn.util.basic.NotSpecified'>, red1=<class 'returnn.util.basic.NotSpecified'>, red2=<class 'returnn.util.basic.NotSpecified'>, var1=<class 'returnn.util.basic.NotSpecified'>, var2=<class 'returnn.util.basic.NotSpecified'>, add_var2_if_empty=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]#
Parameters:
  • name (str) –

  • sources (list[LayerBase]) –

  • reduce (str|Dim|tuple[str|Dim]|list[str|Dim]) – reduce axes of both sources

  • red1 (str|Dim|tuple[str|Dim]|list[str|Dim]) – reduce axes of first source

  • red2 (str|Dim|tuple[str|Dim]|list[str|Dim]) – reduce axes of second source

  • var1 (str|Dim|tuple[str|Dim]|list[str|Dim]|None) – var axes of first source

  • var2 (str|Dim|tuple[str|Dim]|list[str|Dim]|None) – var axes of second source

  • add_var2_if_empty (bool) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#

Elementwise Product Layer#

class returnn.tf.layers.basic.ElemwiseProdLayer(axes, size=None, **kwargs)[source]#

Element-wise product over some axes. Microsoft calls this “static attention”, in Deep Conv. NN with Layer-wise Context Expansion and Attention (LACE). The matrix/tensor used for the product is given as a trainable parameter. See also LinearLayer.

Parameters:
  • axes (str|list[str]) – e.g. “spatial”, but all those axes must be of fixed dimension

  • size (tuple[int]) – for double-checking, you can explicitly provide the size
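
A config sketch scaling each feature by a trainable weight (layer names are illustrative):

"scaled": {"class": "elemwise_prod", "from": "encoder", "axes": "F"}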

layer_class: Optional[str] = 'elemwise_prod'[source]#
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]#
Parameters:
  • name (str) –

  • sources (list[LayerBase]) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Gating Layer#

class returnn.tf.layers.basic.GatingLayer(activation, gate_activation='sigmoid', out_dim=None, **kwargs)[source]#

Splits its input into two equal parts, applies the gate_activation (sigmoid by default) to one part and some other activation (e.g. tanh) to the other part, and then multiplies them element-wise. Thus, the output dimension is the input dimension / 2.

Parameters:
  • activation (str) –

  • gate_activation (str) –

  • out_dim (Dim|None) –
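
A config sketch (layer names are illustrative; the input feature dim must be even):

"gated": {"class": "gating", "from": "linear1", "activation": "tanh"}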

layer_class: Optional[str] = 'gating'[source]#
classmethod get_out_data_from_opts(name, sources, n_out=<class 'returnn.util.basic.NotSpecified'>, out_dim=None, **kwargs)[source]#
Parameters:
  • name (str) –

  • sources (list[LayerBase]) –

  • n_out (int|None|NotSpecified) –

  • out_dim (Dim|None) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Linear Layer#

class returnn.tf.layers.basic.LinearLayer(activation=None, with_bias=True, grad_filter=None, forward_weights_init='glorot_uniform', bias_init=0.0, use_transposed_weights=False, **kwargs)[source]#

Linear/forward/fully-connected/1x1-conv layer. Does a linear transformation on the feature-dimension of the input with an optional bias term and an optional activation function. See also DotLayer, ElemwiseProdLayer, WeightedSumLayer.

Parameters:
  • activation (str|None) – e.g. “relu”, or None

  • with_bias (bool) –

  • grad_filter (float|None) – if grad norm is higher than this threshold (per batch), the grad is removed

  • forward_weights_init (str) – see returnn.tf.util.basic.get_initializer()

  • bias_init (str|float) – see returnn.tf.util.basic.get_initializer()

  • use_transposed_weights (bool) – if True, defines the weight matrix with transposed dimensions (n_out, n_in)
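
A config sketch (layer names are illustrative):

"ff1": {"class": "linear", "from": "data", "n_out": 512, "activation": "relu"}
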
layer_class: Optional[str] = 'linear'[source]#
output_before_activation: Optional[OutputWithActivation][source]#
kwargs: Optional[Dict[str]][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Pooling Layer#

class returnn.tf.layers.basic.PoolLayer(mode, pool_size, padding='VALID', dilation_rate=1, strides=None, in_dim=None, in_spatial_dims=None, out_dim=None, out_spatial_dims=None, use_channel_first=<class 'returnn.util.basic.NotSpecified'>, use_time_mask=False, **kwargs)[source]#

A generic N-D pooling layer. This would usually be done after a convolution for down-sampling.

Parameters:
  • mode (str) – “max” or “avg”

  • pool_size (tuple[int]) – shape of the window of each reduce

  • padding (str) – “same”, “valid” or “same_static”. “same_static” is calculated differently depending on whether an axis is static or dynamic. For static axes, “same_static” padding is the same as “same” padding, i.e. filter_size - 1 - (T + strides - 1) % strides. For dynamic axes, “same_static” calculates the total padding size as filter_size - 1, i.e. it is independent of the length T of the axis and the striding. For dynamic axes, to avoid skipping any frames on the right, we set left_padding = (filter_size - strides) // 2.

  • dilation_rate (tuple[int]|int) –

  • strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size

  • in_dim (Dim|None) –

  • in_spatial_dims (list[Dim|str]|None) –

  • out_dim (Dim|None) –

  • out_spatial_dims (list[Dim]|None) –

  • use_channel_first (bool|NotSpecified) – if set, will transform input to NCHW format

  • use_time_mask (bool) –
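
A config sketch of 2x2 max-pooling after a convolution (layer names are illustrative):

"pool1": {"class": "pool", "from": "conv1", "mode": "max",
          "pool_size": (2, 2), "padding": "valid"}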

layer_class: Optional[str] = 'pool'[source]#
recurrent = True[source]#
classmethod get_out_data_from_opts(name, sources, network, pool_size, strides=None, dilation_rate=1, padding='VALID', in_dim=None, in_spatial_dims=None, out_dim=None, out_spatial_dims=None, use_channel_first=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]#
Parameters:
  • name (str) –

  • sources (list[LayerBase]) –

  • network (returnn.tf.network.TFNetwork) –

  • pool_size (tuple[int]|list[int]) –

  • strides (tuple[int]|list[int]|int) –

  • dilation_rate (int|tuple[int]|list[int]) –

  • padding (str) –

  • in_dim (Dim|None) –

  • in_spatial_dims (list[Dim|str]|None) –

  • out_dim (Dim|None) –

  • out_spatial_dims (list[Dim]|None) –

  • use_channel_first (bool|NotSpecified) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Reduce Layer#

class returnn.tf.layers.basic.ReduceLayer(mode, axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, use_time_mask=None, **kwargs)[source]#

This reduces some axis by using e.g. “sum” or “max”. It’s basically a wrapper around tf.reduce_sum or tf.reduce_max.

Parameters:
  • mode (str) – “sum”, “max”, “argmin”, “min”, “argmax”, “mean”, or “logsumexp”

  • axes (Sequence[Dim|str]) – one axis or multiple axes to reduce. It accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”, and it is strongly recommended to use some of these symbolic names. See Data.get_axes_from_description().

  • axis (Dim|str) – for compatibility, can be used instead of axes

  • keep_dims (bool) – if dimensions should be kept (will be 1)

  • enforce_batch_dim_axis (int|None) – will swap the batch-dim-axis of the input with the given axis. e.g. 0: will convert the input into batch-major format if not already like that. Note that this is still not enough in some cases, e.g. when the other axes are also not as expected. The strong recommendation is to use a symbolic axis description.

  • use_time_mask (bool) – if we reduce over the time-dim axis, use the seq len info. By default, in that case, it will be True.
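
A config sketch averaging an encoder over the time axis (layer names are illustrative):

"enc_mean": {"class": "reduce", "from": "encoder", "mode": "mean", "axes": "T"}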

layer_class: Optional[str] = 'reduce'[source]#
classmethod reduce(input_data, mode, axes=None, keep_dims=False, enforce_batch_dim_axis=None, use_time_mask=None)[source]#
Parameters:
  • input_data (Data) –

  • mode (str) – “sum”, “max”, “argmin”, “min”, “argmax”, “mean”, or “logsumexp”

  • axes (int|list[int]|str) – one axis or multiple axes to reduce. It accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”, and it is strongly recommended to use some of these symbolic names. See Data.get_axes_from_description().

  • keep_dims (bool) – if dimensions should be kept (will be 1)

  • enforce_batch_dim_axis (int) – will swap the batch-dim-axis of the input with the given axis. e.g. 0: will convert the input into batch-major format if not already like that. Note that this is still not enough in some cases, e.g. when the other axes are also not as expected. The strong recommendation is to use a symbolic axis description.

  • use_time_mask (bool) – if we reduce over the time-dim axis, use the seq len info. By default, in that case, it will be True.

Return type:

tf.Tensor

classmethod need_enforce_batch_dim_axis(axes)[source]#
Parameters:

axes (int|list[int]|str|Dim) –

Returns:

whether any integer is in axes, in which case we need a fixed dimension layout

Return type:

bool

classmethod get_axes(axis, input_data)[source]#
Parameters:
  • axis – see self.__init__()

  • input_data (Data) –

Returns:

list of axes

Return type:

list[int]

classmethod get_out_data_from_opts(name, sources, mode='', axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, **kwargs)[source]#
Parameters:
  • name (str) –

  • sources (list[LayerBase]) –

  • mode (str) – (default here “” because other code uses this function)

  • axes (str|list[str]|None) –

  • axis (str|None) –

  • keep_dims (bool) –

  • enforce_batch_dim_axis (int|None) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Reduce-Out Layer#

class returnn.tf.layers.basic.ReduceOutLayer(mode, num_pieces, out_dim=None, **kwargs)[source]#

Combination of SplitDimsLayer applied to the feature dim and ReduceLayer applied to the resulting feature dim. This can e.g. be used to do maxout.

Parameters:
  • mode (str) – “sum” or “max” or “mean”

  • num_pieces (int) – how many elements to reduce. The output dimension will be input.dim // num_pieces.

  • out_dim (Dim|None) –
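
A maxout-style config sketch (layer names are illustrative; the output dim is the input dim divided by 2):

"maxout": {"class": "reduce_out", "from": "linear1", "mode": "max", "num_pieces": 2}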

layer_class: Optional[str] = 'reduce_out'[source]#
classmethod get_out_data_from_opts(num_pieces, sources, name, out_dim=None, **kwargs)[source]#
Parameters:
  • num_pieces (int) –

  • sources (list[LayerBase]) –

  • name (str) –

  • out_dim (Dim|None) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#

Switch Layer#

class returnn.tf.layers.basic.SwitchLayer(condition, true_from, false_from, **kwargs)[source]#

Wrapper around tf.where() (or more generically returnn.tf.util.basic.where_bc()), or statically chooses a single source if the condition is a callable (…)->bool. (tf.cond is not useful here, as the sources would already have been constructed and computed.)

This layer is also useful for applying any kind of generic masking to the frames. E.g. one could have a layer called “mask” computing a boolean mask for the values stored in another layer “input”. Then use this layer with condition=”mask”, true_from=”input”, false_from=mask_value to replace all frames where the mask is false with mask_value.

See also CondLayer. See also SeqLenMaskLayer if you just want to mask using the sequence lengths.

Parameters:
  • condition (LayerBase|bool) – if callable, expected to be (…)->bool, and called in transform_config_dict

  • true_from (LayerBase|float|int|None) –

  • false_from (LayerBase|float|int|None) –
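
Following the masking example above, a config sketch (layer names are illustrative):

"masked": {"class": "switch", "condition": "mask", "true_from": "input", "false_from": 0.0}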

layer_class: Optional[str] = 'switch'[source]#
classmethod transform_config_dict(d, network, get_layer)[source]#
Parameters:
  • d (dict[str]) – will modify inplace

  • network (returnn.tf.network.TFNetwork) –

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

classmethod get_out_data_from_opts(name, condition, true_from, false_from, **kwargs)[source]#
Parameters:
  • name (str) –

  • condition (LayerBase|bool) –

  • true_from (LayerBase|float|int|None) –

  • false_from (LayerBase|float|int|None) –

Return type:

Data

get_dep_layers()[source]#
Return type:

list[LayerBase]

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#

Variable Layer#

Weighted Sum Layer#

class returnn.tf.layers.basic.WeightedSumLayer(axes, padding=None, size=None, keep_dims=None, **kwargs)[source]#

Calculates a weighted sum, either over a complete axis of fixed dimension or over some window; it can also do that for multiple axes. The weights are a trainable parameter matrix. A similar effect could be achieved with ElemwiseProdLayer and ReduceLayer, or with a DotLayer combined with a VariableLayer. See also LinearLayer.

Parameters:
  • axes (str|list[str]) – the axes to do the weighted-sum over

  • padding (str) – “valid” or “same”, in case of keep_dims=True

  • size (None|tuple[int]) – the kernel size. If omitted, the axes must be of fixed dimension, and keep_dims=False and padding=”valid” are used by default. If given, you must also provide padding, and keep_dims=True is the default.

  • keep_dims (bool) – if False, the axes will be squeezed away. see also size.
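
A config sketch of a trainable smoothing window over time (layer names and the window size are illustrative):

"smoothed": {"class": "weighted_sum", "from": "encoder", "axes": "T",
             "size": (5,), "padding": "same", "keep_dims": True}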

layer_class: Optional[str] = 'weighted_sum'[source]#
classmethod get_out_data_from_opts(name, sources, axes, padding=None, size=None, keep_dims=None, **kwargs)[source]#
Parameters:
  • name (str) –

  • sources (list[LayerBase]) –

  • axes (str|list[str]) –

  • padding (str|None) –

  • size (None|tuple[int]) –

  • keep_dims (bool|None) –

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
input_data: Optional[Data][source]#