returnn.tf.layers.basic

Many canonical basic layers.

class returnn.tf.layers.basic.SourceLayer(network, data_key=None, sources=(), **kwargs)[source]

This gives access to some entry from network.extern_data (ExternData).

Parameters:
layer_class = 'source'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (returnn.tf.network.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer
classmethod get_out_data_from_opts(network, data_key=None, **kwargs)[source]
Parameters:
Return type:

Data

returnn.tf.layers.basic.concat_sources(src_layers)[source]
Parameters:src_layers (list[LayerBase]) –
Returns:data with placeholders set
Return type:Data
returnn.tf.layers.basic.get_concat_sources_data_template(src_layers, name=None)[source]

This just creates a template Data instance, without creating any real TF tensors. concat_sources() (and related) are the equivalent functions which would create a Data together with the tensor.

Parameters:
Returns:

data with no placeholders set. It is always a copy or a new instance, so it is safe to manipulate.

Return type:

Data

returnn.tf.layers.basic.concat_sources_with_opt_dropout(src_layers, dropout=0, dropout_noise_shape=None, dropout_on_forward=False)[source]
Parameters:
  • src_layers (list[LayerBase]) –
  • dropout (float) – dropout rate that will be applied if train_flag is set or dropout_on_forward is enabled
  • dropout_noise_shape (tuple|list|dict|None) – provide 1 for broadcasting or None otherwise for each axis. The default None will broadcast across all dynamic axes, including the batch axis. Use {"*": None} to disable broadcasting for all axes.
  • dropout_on_forward (bool) – apply dropout also during inference
Returns:data with placeholders set
Return type:Data

class returnn.tf.layers.basic.CopyLayer(extra_deps=(), **kwargs)[source]

This layer does nothing; it just copies its input. If multiple sources are provided, they are concatenated in the feature dim.

Parameters:extra_deps (list[LayerBase]) – Just add as an additional dependency, without really using it. This can have an effect though on the search beam, via SelectSearchSourcesLayer. We only have this here for the CopyLayer because the get_out_data_from_opts() must know about it and define the right beam. Also see the option collocate_with, which is different in that it does not add a dependency.
layer_class = 'copy'[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod get_out_data_from_opts(name, sources=(), extra_deps=(), out_type=None, n_out=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • extra_deps (list[LayerBase]) –
  • out_type (dict[str]|None) –
  • n_out (int|None|NotSpecified) –
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (returnn.tf.network.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer
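To illustrate the feature-dim concatenation described above, here is a minimal pure-Python sketch on nested lists of shape (B, T, F). The helper concat_feature_dim and the extern data keys "a" and "b" are hypothetical; the network dict only uses the standard "class" and "from" keys.

```python
# Hypothetical helper (not part of RETURNN) mirroring a "copy" layer with several sources.
def concat_feature_dim(*sources):
    """Concatenates sources of shape (B, T, F_i) into (B, T, sum of F_i).

    Per frame, the feature vectors are joined in the order of the sources.
    """
    return [
        [
            [value for src in sources for value in src[b][t]]
            for t in range(len(sources[0][b]))
        ]
        for b in range(len(sources[0]))
    ]


# Network dict fragment using the same idea (extern data keys are hypothetical).
network = {
    "concat": {"class": "copy", "from": ["data:a", "data:b"]},
}

x = [[[1, 2], [3, 4]]]  # (B=1, T=2, F=2)
y = [[[5], [6]]]        # (B=1, T=2, F=1)
print(concat_feature_dim(x, y))  # [[[1, 2, 5], [3, 4, 6]]]
```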
class returnn.tf.layers.basic.DropoutLayer(extra_deps=(), **kwargs)[source]

Just the same as CopyLayer, because that one already supports dropout.

Parameters:extra_deps (list[LayerBase]) – Just add as an additional dependency, without really using it. This can have an effect though on the search beam, via SelectSearchSourcesLayer. We only have this here for the CopyLayer because the get_out_data_from_opts() must know about it and define the right beam. Also see the option collocate_with, which is different in that it does not add a dependency.
layer_class = 'dropout'[source]
class returnn.tf.layers.basic.ScaledGradientLayer(scale, **kwargs)[source]

Just tf.identity in the forward pass. Scales the gradient by some factor in backprop. Can be used as a gradient reversal layer (with a negative factor). Uses TFUtil.scaled_gradient(), or tf.stop_gradient() if the scale is 0.

Parameters:scale (float) – if 0., will use tf.stop_gradient
layer_class = 'scaled_grad'[source]
class returnn.tf.layers.basic.SelectSearchSourcesLayer(search_choices_layer, sources, **kwargs)[source]

Selects the corresponding search beams from the source, given current search choices (determined by a layer). Like InternalLayer, only for internal purpose at the moment.

Parameters:
classmethod select_if_needed(layer, search_choices)[source]
Parameters:
  • layer (LayerBase) –
  • search_choices (SearchChoices|None) –
Return type:

LayerBase

get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, sources, search_choices, **kwargs)[source]
Parameters:
Return type:

Data

class returnn.tf.layers.basic.ActivationLayer(activation, **kwargs)[source]

This layer just applies an activation function. See TFUtil.get_activation_function() about supported functions. Also see EvalLayer and CombineLayer for similar layers.

Parameters:activation (str) – e.g. “relu”, “tanh”, etc.
layer_class = 'activation'[source]
classmethod get_out_data_from_opts(activation, **kwargs)[source]
Parameters:activation (str) –
Return type:Data
class returnn.tf.layers.basic.BatchNormLayer(use_shift=<class 'returnn.util.basic.NotSpecified'>, use_std=<class 'returnn.util.basic.NotSpecified'>, use_sample=<class 'returnn.util.basic.NotSpecified'>, force_sample=<class 'returnn.util.basic.NotSpecified'>, momentum=<class 'returnn.util.basic.NotSpecified'>, epsilon=<class 'returnn.util.basic.NotSpecified'>, update_sample_only_in_training=<class 'returnn.util.basic.NotSpecified'>, delay_sample_update=<class 'returnn.util.basic.NotSpecified'>, param_version=<class 'returnn.util.basic.NotSpecified'>, gamma_init=<class 'returnn.util.basic.NotSpecified'>, beta_init=<class 'returnn.util.basic.NotSpecified'>, masked_time=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]

Implements batch-normalization (http://arxiv.org/abs/1502.03167) as a separate layer.

Also see NormLayer.

Parameters:
  • use_shift (bool) –
  • use_std (bool) –
  • use_sample (float) – defaults to 0.0, which is used in training
  • force_sample (bool) – even in eval, use the use_sample factor
  • momentum (float) – for the running average of sample_mean and sample_std
  • update_sample_only_in_training (bool) –
  • delay_sample_update (bool) –
  • param_version (int) – 0 or 1
  • epsilon (float) –
  • gamma_init (str|float) – see TFUtil.get_initializer(), for the scale
  • beta_init (str|float) – see TFUtil.get_initializer(), for the mean
  • masked_time (bool) – flatten and mask input tensor

The default settings for these variables are set in the function “batch_norm” of the LayerBase. If you do not want to change them, you can leave them undefined here. With our default settings:

  • In training: use_sample=0, i.e. not using running average, using current batch mean/var.
  • Not in training (e.g. eval): use_sample=1, i.e. using running average, not using current batch mean/var.
  • The running average includes the statistics of the current batch.
  • The running average is also updated when not training.
layer_class = 'batch_norm'[source]
class returnn.tf.layers.basic.LayerNormLayer(epsilon=1e-06, **kwargs)[source]

Applies layer-normalization.

Note that we just normalize over the feature-dim axis here. This is consistent with the default behavior of tf.keras.layers.LayerNormalization and also with how it is commonly used in many models, including the Transformer.

However, there are cases where it would be common to normalize over all axes except batch-dim, or all axes except batch and time. For a more generic variant, see NormLayer.

Parameters:epsilon (float) –
layer_class = 'layer_norm'[source]
classmethod get_out_data_from_opts(sources, name, **kwargs)[source]
Parameters:
  • sources (list[LayerBase]) –
  • name (str) –
Return type:

Data

class returnn.tf.layers.basic.NormLayer(axes, param_shape='F', scale=True, bias=True, epsilon=1e-06, **kwargs)[source]

Normalize over specified axes, e.g. time and/or feature axis.

Note: For calculating a norm, see MathNormLayer instead.

In case of just the feature axis (axes="F"), this corresponds to layer normalization (see LayerNormLayer). In case of time and feature (axes="TF") for a 3D input, or, more generally, all axes except batch (axes="except_batch"), this corresponds to group normalization with G=1, or non-standard layer normalization. (The definition of layer normalization does not specify which axes should be normalized over. In many other frameworks, the default axis is just the last axis, which is usually the feature axis. However, in certain implementations and models, it is also common to normalize over all axes except batch.)

The statistics are calculated just on the input. There are no running statistics (in contrast to batch normalization, see BatchNormLayer).

For some discussion on the definition of layer-norm vs group-norm, also see here and here.

Parameters:
  • axes (str|list[str]) – axes over which the mean and variance are computed, e.g. “F” or “TF”
  • param_shape (str|list[str]|tuple[str]|int|list[int]|tuple[int]) – shape of the scale and bias parameters. You can also refer to (static) axes of the input, such as the feature-dim. This is also the default, i.e. a param-shape of [F], independent of the axes to normalize over.
  • scale (bool) – add trainable scale parameters
  • bias (bool) – add trainable bias parameters
  • epsilon (float) – epsilon for numerical stability
layer_class = 'norm'[source]
classmethod get_out_data_from_opts(sources, name, **kwargs)[source]
Parameters:
  • sources (list[LayerBase]) –
  • name (str) –
Return type:

Data
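A minimal pure-Python sketch of the axes="F" case (plain layer normalization) for a single frame. It assumes the population variance (as computed by tf.nn.moments) and that epsilon is added inside the square root; the helper name is hypothetical and the optional scale and bias stand in for the trainable parameters with the default param_shape.

```python
import math


# Hypothetical helper (not part of RETURNN): normalization over the feature axis only.
def norm_over_features(frame, scale=None, bias=None, epsilon=1e-6):
    """Normalizes one feature vector to zero mean and unit variance.

    Optional per-feature scale and bias mimic the trainable parameters
    of shape [F].
    """
    n = len(frame)
    mean = sum(frame) / n
    var = sum((v - mean) ** 2 for v in frame) / n  # population variance (assumption)
    out = [(v - mean) / math.sqrt(var + epsilon) for v in frame]
    if scale is not None:
        out = [v * s for v, s in zip(out, scale)]
    if bias is not None:
        out = [v + c for v, c in zip(out, bias)]
    return out


print(norm_over_features([1.0, 3.0]))  # close to [-1.0, 1.0]
```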

class returnn.tf.layers.basic.MathNormLayer(p, axes, keep_dims=False, **kwargs)[source]

Calculates sum(abs(x) ** p) ** (1./p).

Parameters:
  • p (int|float) –
  • axes (str|list[str]) –
  • keep_dims (bool) –
layer_class = 'math_norm'[source]
classmethod get_out_data_from_opts(name, sources, axes, keep_dims=False, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axes (str|list[str]) –
  • keep_dims (bool) –
Return type:

Data
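The formula can be checked on plain lists. This hypothetical helper reduces over all values it receives, which corresponds to reducing over the given axes; applying it per frame mimics axes="F" with keep_dims=False.

```python
# Hypothetical helper (not part of RETURNN): computes sum(abs(x) ** p) ** (1. / p).
def math_norm(values, p):
    """Reduces all given values to a single p-norm."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


# Per-frame L2 norm over the feature axis.
frames = [[3.0, -4.0], [0.0, 1.0]]
print([math_norm(f, 2) for f in frames])  # [5.0, 1.0]
```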

class returnn.tf.layers.basic.SliceLayer(axis, slice_start=None, slice_end=None, slice_step=None, **kwargs)[source]

Slicing on the input, i.e. x[start:end:step] in some axis. See also SliceNdLayer, for variable start. See also GatherLayer, for one single position.

Note that __getitem__ on a TF tensor (or also a NumPy ND array) is more generic, and supports slices in multiple axes, as well as adding new dimensions, etc. It even accepts boolean values, in which case it applies a boolean mask. See TF _slice_helper (== tf.Tensor.__getitem__) for a generic implementation, which calls tf.strided_slice. If we ever need such more generic support, we might consider adding a new layer, like GenericSliceLayer, which gets a slice_spec, just like _slice_helper (argument to __getitem__). But any such slice can already be constructed with multiple individual layers, which perform individual slices (per axis).

We just support slicing in a single axis here, with optional striding (slice_step).

Parameters:
  • axis (int|str) –
  • axis_kind (str|None) – “T” for time, “B” for batch, “F” for feature
  • slice_start (int|None) –
  • slice_end (int|None) –
  • slice_step (int|None) –
layer_class = 'slice'[source]
classmethod get_out_data_from_opts(name, axis, sources=(), slice_start=None, slice_end=None, slice_step=None, **kwargs)[source]
Parameters:
  • name (str) –
  • axis (str) –
  • sources (list[LayerBase]) –
  • slice_start (int|None) –
  • slice_end (int|None) –
  • slice_step (int|None) –
Return type:

Data
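A rough pure-Python sketch of slicing the time axis of a batch, together with a layer dict using the documented options. The helper is hypothetical and ignores sequence lengths and padding, which the real layer has to handle for dynamic axes.

```python
# Hypothetical helper (not part of RETURNN): x[:, start:end:step] on nested lists.
def slice_time(x, slice_start=None, slice_end=None, slice_step=None):
    """Applies the same slice to every sequence of a batch of shape (B, T, ...)."""
    return [seq[slice_start:slice_end:slice_step] for seq in x]


# Layer dict with the same slice (the source name "data" is an example).
slice_layer = {"class": "slice", "from": "data", "axis": "T", "slice_start": 1, "slice_step": 2}

print(slice_time([[0, 1, 2, 3, 4]], slice_start=1, slice_step=2))  # [[1, 3]]
```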

class returnn.tf.layers.basic.SliceNdLayer(start, size, min_size=None, **kwargs)[source]

This takes out a slice-range from the time axis, e.g. x[start:start + size]. If the input is of shape (B,T,F) and start is of shape (B,), then the output will be of shape (B,size,F). If the input is of shape (B,T,F) and start is of shape (B,T), then the output will be of shape (B,T,size,F). This layer allows a different start slice point for each batch, in contrast to SliceLayer, and the start is variable. See also GatherNdLayer. PrefixInTimeLayer can recover the original shape (by zero-padding).

Parameters:
  • start (LayerBase) – (B,…)
  • size (int|LayerBase|None) – if None, it uses the max possible size, and it becomes a dynamic axis.
  • min_size (int|None) – if size is None, but we want to have a min-size
layer_class = 'slice_nd'[source]
recurrent = True[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod get_out_data_from_opts(name, sources=(), start=None, size=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • start (LayerBase|None) –
  • size (int|LayerBase|None) –
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
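A minimal sketch of the (B,T,F) input with start of shape (B,) case, where each batch entry gets its own start frame. The helper is hypothetical and only covers slices that lie fully inside the sequence; how the real layer handles out-of-range windows is not shown here.

```python
# Hypothetical helper (not part of RETURNN): per-batch x[b][start[b]:start[b] + size].
def slice_nd(x, start, size):
    """x: (B, T, F), start: (B,) -> (B, size, F)."""
    out = []
    for seq, s in zip(x, start):
        window = seq[s:s + size]
        # This sketch only covers slices that lie fully inside the sequence.
        assert len(window) == size, "slice out of range"
        out.append(window)
    return out


x = [[[0], [1], [2], [3]], [[10], [11], [12], [13]]]  # (B=2, T=4, F=1)
print(slice_nd(x, start=[1, 2], size=2))  # [[[1], [2]], [[12], [13]]]
```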
class returnn.tf.layers.basic.GatherLayer(position, axis, **kwargs)[source]

Gathers slices on a specified axis from the input layer using indices from a position layer. If the input is a layer of the shape [B,D,F1], and position of shape [B,F2], this will yield output of the shape [B,F2,F1] where

output[b,f2,f1] = input[b,position[b,f2],f1]

(if D is the axis to gather from). In general, all shared axes of the input and the positions will be considered as batch-axes.

The position argument can also be an int. In this case, this simply gives input[position] on the specified axis.

It’s basically a wrapper around tf.gather. It provides the same functionality as the deprecated GatherNdLayer, but is more generic. See also GatherNdLayer.

Parameters:
  • position (LayerBase|int) – Layer containing the indices used to select the slices of the input from. If another layer, must be of type int32 or int64. Can also specify a constant int.
  • axis (str) – The axis into which we gather the indices
layer_class = 'gather'[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod get_out_data_from_opts(name, sources, position, axis, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • position (LayerBase|int) –
  • axis (str) –
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
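The gather rule output[b,f2,f1] = input[b,position[b,f2],f1] can be written directly on nested lists. This hypothetical sketch covers the [B,D,F1] input with [B,F2] positions case, gathering along D.

```python
# Hypothetical helper (not part of RETURNN): gather along axis D of a (B, D, F1) input.
def gather(inp, position):
    """inp: (B, D, F1), position: (B, F2) -> (B, F2, F1).

    Implements output[b][f2][f1] = inp[b][position[b][f2]][f1].
    """
    return [[list(inp[b][p]) for p in position[b]] for b in range(len(inp))]


inp = [[[1, 2], [3, 4], [5, 6]]]  # (B=1, D=3, F1=2)
position = [[2, 0]]               # (B=1, F2=2)
print(gather(inp, position))      # [[[5, 6], [1, 2]]]
```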
class returnn.tf.layers.basic.GatherNdLayer(position, **kwargs)[source]

Warning: This layer is deprecated, use the more general GatherLayer instead. GatherLayer should be equivalent, but is more general (supports multiple batch dimensions, can specify gather axis) and its name is less misleading.

This takes out a position from some axis, e.g. x[pos]. This layer allows a different position for each batch entry. It’s basically a wrapper around tf.gather (the name of this layer is misleading). See also GatherLayer instead, which will replace this layer in the future. See also SliceNdLayer. See also ScatterNdLayer, which is the inverse operation.

Parameters:position (LayerBase) – indices into first axis (excluding batch) of the input
layer_class = 'gather_nd'[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod get_out_data_from_opts(name, sources, position, **kwargs)[source]
Parameters:
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
class returnn.tf.layers.basic.ScatterNdLayer(position, position_axis, output_dim_via_time_from, filter_invalid_indices=False, **kwargs)[source]

The inverse of GatherNdLayer. Mostly a wrapper for tf.scatter_nd.

The input to the layer holds the updates; the indices are given via the position argument. The indices point into the newly constructed output dimension. The output shape is constructed from the common shape of the input and the position, where the unique common axis (if it is not unique, we would need to introduce an option to specify it) is replaced by the given output dimension (currently via output_dim_via_time_from).

Examples:

position (indices): (B,eTs)
input (updates): (eTs,D) or (B,eTs,D) -> expanded to (B,eTs,D)
output shape: (B,eT,D)

position (indices): (B,dT,eTs)
input (updates): (eTs,D) -> expanded to (B,dT,eTs,D)
output shape: (B,dT,eT,D)

position (indices): (dT,eTs)
input (updates): (eTs,D) -> expanded to (dT,eTs,D)
output shape: (dT,eT,D)

position (indices): (dT,eTs)
input (updates): (B,eTs,D) -> expanded to (dT,eTs,B,D)
output shape: (dT,eT,B,D)

In all these examples, output_dim_via_time_from is (B,eT,F), and eTs gets replaced by eT.

Parameters:
  • position (LayerBase) – indices into first axis (excluding batch) of the output
  • position_axis (str|int) – axis in position to replace by the output-dim
  • output_dim_via_time_from (LayerBase) – use the time-dim from this layer as the output-dim
  • filter_invalid_indices (bool) – allow for indices <0 or >= output_dim, which will be discarded in the output
layer_class = 'scatter_nd'[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod get_out_data_from_opts(name, sources, position, position_axis, output_dim_via_time_from, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • position (LayerBase) –
  • position_axis (str|int) – axis in position to replace by the output-dim
  • output_dim_via_time_from (LayerBase) –
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
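A pure-Python sketch of the first example above: position (B,eTs), updates (B,eTs,D), output (B,eT,D). It assumes, as tf.scatter_nd does, that updates hitting the same index are summed; raising an error on invalid indices when filter_invalid_indices is off is this sketch's own choice. The helper name is hypothetical.

```python
# Hypothetical helper (not part of RETURNN): scatter updates into a new output dim.
def scatter_nd(position, updates, out_len, filter_invalid_indices=False):
    """position: (B, eTs) ints, updates: (B, eTs, D) -> output (B, eT, D) with eT = out_len."""
    dim = len(updates[0][0])
    out = [[[0] * dim for _ in range(out_len)] for _ in position]
    for b, (pos_seq, upd_seq) in enumerate(zip(position, updates)):
        for idx, upd in zip(pos_seq, upd_seq):
            if not 0 <= idx < out_len:
                if filter_invalid_indices:
                    continue  # discard indices < 0 or >= out_len
                raise IndexError(f"index {idx} out of range [0, {out_len})")
            for d in range(dim):
                out[b][idx][d] += upd[d]  # duplicates accumulate, as in tf.scatter_nd
    return out


print(scatter_nd([[2, 0, -1]], [[[1], [2], [9]]], out_len=3, filter_invalid_indices=True))
# [[[2], [0], [1]]]
```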
class returnn.tf.layers.basic.LinearLayer(activation=None, with_bias=True, grad_filter=None, forward_weights_init='glorot_uniform', bias_init=0.0, use_transposed_weights=False, **kwargs)[source]

Linear/forward/fully-connected/1x1-conv layer. Does a linear transformation on the feature-dimension of the input with an optional bias term and an optional activation function. See also DotLayer, ElemwiseProdLayer, WeightedSumLayer.

Parameters:
layer_class = 'linear'[source]
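As a sketch of what the layer computes per frame, activation(x W + b) on the feature dim, here is a pure-Python version plus a layer dict. The helper and its small activation table are hypothetical; RETURNN resolves activation names via TFUtil.get_activation_function() and supports more of them.

```python
import math

# Hypothetical activation table for this sketch only.
ACTIVATIONS = {
    None: lambda v: v,
    "tanh": math.tanh,
    "relu": lambda v: max(0.0, v),
}


# Hypothetical helper (not part of RETURNN): one frame through a linear layer.
def linear(x, weights, bias=None, activation=None):
    """x: (n_in,), weights: (n_in, n_out) -> activation(x @ weights + bias), shape (n_out,)."""
    n_out = len(weights[0])
    y = [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(n_out)]
    if bias is not None:
        y = [v + b for v, b in zip(y, bias)]
    act = ACTIVATIONS[activation]
    return [act(v) for v in y]


# The corresponding layer dict (source name and n_out are just examples).
linear_layer = {"class": "linear", "from": "data", "activation": "tanh", "n_out": 512}

print(linear([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], bias=[0.5, -0.5]))  # [1.5, 1.5]
```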
class returnn.tf.layers.basic.SoftmaxLayer(activation='softmax', **kwargs)[source]

Just a LinearLayer with activation="softmax" by default.

layer_class = 'softmax'[source]
class returnn.tf.layers.basic.LengthLayer(axis='T', add_time_axis=False, dtype='int32', sparse=False, **kwargs)[source]

Returns the length of sources as (B,), via input size_placeholder.

Parameters:
  • axis (str|DimensionTag) –
  • add_time_axis (bool) –
  • dtype (str) –
  • sparse (bool) –
layer_class = 'length'[source]
classmethod get_out_data_from_opts(name, sources, axis='T', add_time_axis=False, dtype='int32', sparse=False, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axis (str|DimensionTag) –
  • add_time_axis (bool) –
  • dtype (str) –
  • sparse (bool) –
Return type:

Data

class returnn.tf.layers.basic.SoftmaxOverSpatialLayer(axis=None, energy_factor=None, start=None, window_start=None, window_size=None, use_time_mask=None, log_space=False, **kwargs)[source]

This applies a softmax over spatial axis/axes (currently only time axis supported). E.g. when the input is of shape (B,T,dim), the output will be (B,T,dim). It automatically masks the frames outside the seq defined by the seq-len. In contrast to SoftmaxLayer, this will not do a linear transformation. See SeqLenMaskLayer if you just want to apply a masking.

Parameters:
  • axis (str|None) – which axis to do the softmax over
  • energy_factor (float|None) – the energy will be scaled by this factor. This is like a temperature for the softmax. In Attention-is-all-you-need, this is set to 1/sqrt(base_ctx.dim).
  • start (LayerBase|None) – Tensor of shape (B,) indicating the start frame
  • window_start (LayerBase|int|None) – Layer with output of shape (B,) or (constant) int value indicating the window start.
  • window_size (LayerBase|int|None) – Layer with output of shape (B,) or (constant) int value indicating the window size.
  • use_time_mask (bool) – if True, assumes a dynamic seq len and uses it for masking. By default, the dynamic seq len is used if it exists.
  • log_space (bool) – if True, returns in log space (i.e. uses log_softmax)
layer_class = 'softmax_over_spatial'[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod get_out_data_from_opts(name, sources, axis=None, start=None, window_start=None, window_size=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axis (str|None) –
  • start (LayerBase|None) –
  • window_start (LayerBase|None) –
  • window_size (LayerBase|int|None) –
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
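A minimal sketch of a masked softmax over the time axis of one sequence, including the optional energy_factor scaling and log_space output. The helper is hypothetical; in this sketch masked frames get probability 0 (or -inf in log space), which may differ from the exact masking value RETURNN uses.

```python
import math


# Hypothetical helper (not part of RETURNN): softmax over time with seq-len masking.
def softmax_over_time(energies, seq_len, energy_factor=None, log_space=False):
    """energies: (T,) for one sequence; frames t >= seq_len are masked out."""
    if energy_factor is not None:
        energies = [e * energy_factor for e in energies]
    valid = energies[:seq_len]
    n_masked = len(energies) - seq_len
    m = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in valid]
    z = sum(exps)
    if log_space:
        log_z = m + math.log(z)
        return [e - log_z for e in valid] + [float("-inf")] * n_masked
    return [x / z for x in exps] + [0.0] * n_masked


print(softmax_over_time([0.0, 0.0, 5.0], seq_len=2))  # [0.5, 0.5, 0.0]
```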
class returnn.tf.layers.basic.SeqLenMaskLayer(mask_value, axis='T', seq_len_source=None, start=None, window_start=None, window_size=None, **kwargs)[source]

Masks some values away given the seq_len_source with mask_value. Also see SoftmaxOverSpatialLayer. Also see SwitchLayer, which can be used to apply a generic mask.

Parameters:
  • seq_len_source (LayerBase|None) – if not given, uses source
  • axis (str|int) –
  • mask_value (float) –
  • start (LayerBase|None) – Tensor of shape (B,) indicating the start frame
  • window_start (LayerBase|None) – Tensor of shape (B,) indicating the window start
  • window_size (LayerBase|int|None) –
layer_class = 'seq_len_mask'[source]
classmethod build_mask(x, axis='T', seq_len_source=None, start=None, window_start=None, window_size=None)[source]
Parameters:
  • x (Data) –
  • axis (str|int) –
  • seq_len_source (Data|None) –
  • start (Data|None) –
  • window_start (Data|None) –
  • window_size (Data|int|None) –
Returns:

mask which is broadcastable to energy_data, thus you can e.g. use TFUtil.where_bc()

Return type:

tf.Tensor

get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, sources, start=None, window_start=None, window_size=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • start (LayerBase|None) –
  • window_start (LayerBase|None) –
  • window_size (LayerBase|int|None) –
Return type:

Data

class returnn.tf.layers.basic.RandIntLayer(shape, maxval, minval=0, dtype='int32', seed=None, **kwargs)[source]

Generates random numbers using tf.random.uniform.

Parameters:
  • shape (tuple[DimensionTag|int]|list[DimensionTag|int]) – desired shape of output tensor
  • maxval (int) – upper bound (exclusive) on range of random values
  • minval (int) – lower bound (inclusive) on range of random values
  • dtype (str) – type of the output. For random ints, int32 and int64 make sense, but could also be floats
  • seed (int|None) – random seed
layer_class = 'rand_int'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, shape, maxval, minval=0, dtype='int32', **kwargs)[source]
Parameters:
  • name (str) –
  • shape (tuple[DimensionTag|int]|list[DimensionTag|int]) – desired shape of output tensor
  • maxval (int) – upper bound (exclusive) on range of random values
  • minval (int) – lower bound (inclusive) on range of random values
  • dtype (str) – type of the output. For random ints, int32 and int64 make sense, but could also be floats
Return type:

Data

class returnn.tf.layers.basic.RangeLayer(limit, start=0, delta=1, dtype=None, sparse=False, **kwargs)[source]

Generic wrapper around tf.range. See also RangeInAxisLayer.

Parameters:
  • limit (int|float) –
  • start (int|float) –
  • delta (int|float) –
  • dtype (str|None) –
  • sparse (bool) –
layer_class = 'range'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, limit, start=0, delta=1, dtype=None, sparse=False, **kwargs)[source]
Parameters:
  • name (str) –
  • limit (int|float) –
  • start (int|float) –
  • delta (int|float) –
  • dtype (str|None) –
  • sparse (bool) –
Return type:

Data

class returnn.tf.layers.basic.RangeInAxisLayer(axis, dtype='int32', unbroadcast=False, keepdims=False, sparse=False, **kwargs)[source]

Assume the input is e.g. (B,T,D) and you specify axis="T"; then you get (B=1,T,D=1), where the specified axis is filled with tf.range. See also RangeLayer.

Parameters:
  • axis (str) –
  • dtype (str) –
  • unbroadcast (bool) – DEPRECATED, unsupported, and not needed
  • keepdims (bool) – DEPRECATED, unsupported, and not needed
  • sparse (bool) –
layer_class = 'range_in_axis'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, sources, axis, dtype='int32', sparse=False, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axis (str) –
  • dtype (str) –
  • sparse (bool) –
class returnn.tf.layers.basic.RangeFromLengthLayer(dtype='int32', sparse=False, **kwargs)[source]

Given some dynamic sequence lengths as input, this creates a tf.range over the implied dimension. As a side effect, this can create a new dyn dim tag for the given sequence lengths. This side effect can be the main functionality in certain use cases. See also RangeInAxisLayer.

Consider the example:

y: {class: range_in_axis, from: x, axis: T}

This is basically equivalent to:

x_len: {class: length, from: x}
y: {class: range_from_length, from: x_len}
Parameters:
  • axis (str) –
  • dtype (str) –
  • sparse (bool) –
layer_class = 'range_from_length'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, sources, dtype='int32', sparse=False, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • dtype (str) –
  • sparse (bool) –
class returnn.tf.layers.basic.BatchSoftmaxLayer(**kwargs)[source]

Softmax over the spatial and feature axes.

layer_class = 'batch_softmax'[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data
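A rough sketch of a joint softmax over the spatial (time) and feature axes for one batch entry, i.e. all T * F values share one normalization. The helper is hypothetical and ignores sequence lengths and padding.

```python
import math


# Hypothetical helper (not part of RETURNN): one softmax over all T * F entries.
def batch_softmax(seq):
    """seq: (T, F) of one batch entry -> (T, F), summing to 1 over all entries."""
    flat = [v for frame in seq for v in frame]
    m = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in flat]
    z = sum(exps)
    n_feat = len(seq[0])
    return [[exps[t * n_feat + f] / z for f in range(n_feat)] for t in range(len(seq))]


print(batch_softmax([[0.0, 0.0], [0.0, 0.0]]))  # [[0.25, 0.25], [0.25, 0.25]]
```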

class returnn.tf.layers.basic.ConstantLayer(sources, value=0.0, dtype=None, with_batch_dim=False, **kwargs)[source]

Output is a constant value.

Parameters:
  • sources (list[LayerBase]) –
  • value (int|float|bool) –
  • dtype (str|None) –
  • with_batch_dim (bool) –
layer_class = 'constant'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (returnn.tf.network.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer
classmethod get_out_data_from_opts(name, value=0.0, dtype=None, with_batch_dim=False, **kwargs)[source]
Parameters:
  • name (str) –
  • value (int|float|bool) –
  • dtype (str|None) –
  • with_batch_dim (bool) –
Return type:

Data

class returnn.tf.layers.basic.GatingLayer(activation, gate_activation='sigmoid', **kwargs)[source]

Splits the input into two equal parts along the feature axis, applies the gate_activation (sigmoid by default) to one part and some other activation (e.g. tanh) to the other part, and then multiplies them element-wise. Thus, the output dimension is the input dimension / 2.

layer_class = 'gating'[source]
classmethod get_out_data_from_opts(name, sources, n_out=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • n_out (int|None|NotSpecified) –
Return type:

Data
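A minimal sketch of the gating computation for one frame. Which half receives the gate is an assumption here (first half: activation, second half: gate_activation); the helper names are hypothetical.

```python
import math


def sigmoid(v):
    """Logistic sigmoid, the default gate activation in this sketch."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical helper (not part of RETURNN); the assignment of halves is an assumption.
def gating(x, activation=math.tanh, gate_activation=sigmoid):
    """x: (2 * n,) -> (n,): activation(first half) * gate_activation(second half)."""
    assert len(x) % 2 == 0, "the input dimension must be even"
    half = len(x) // 2
    return [activation(a) * gate_activation(g) for a, g in zip(x[:half], x[half:])]


print(gating([2.0, 0.0], activation=lambda v: v))  # [1.0]
```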

class returnn.tf.layers.basic.WindowLayer(window_size, window_left=None, window_right=None, axis='T', padding='same', stride=1, **kwargs)[source]

Adds a window dimension. By default, uses the time axis and goes over it with a sliding window. The new axis for the window is created right after the time axis. The output is always batch-major. E.g. if the input is (batch, time, dim), the output is (batch, time, window_size, dim). If you want to merge the (window_size, dim) together to (window_size * dim,), you can use the MergeDimsLayer, e.g. {"class": "merge_dims", "axes": "except_time"}. Use stride==window_size and window_right=window_size - 1 in combination with a MergeDimsLayer to achieve feature stacking with right-hand zero padding.

This layer does not take out a window from the time dimension; for that, see SliceLayer or SliceNdLayer.

Parameters:
  • window_size (int) –
  • window_left (int|None) –
  • window_right (int|None) –
  • axis (str) – see Data.get_axis_from_description()
  • padding (str) – “same” or “valid”
  • stride (int) – return only each Nth window
  • kwargs
layer_class = 'window'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, window_size, axis='T', sources=(), **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • window_size (int) –
  • axis (str) –
Return type:

Data

classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, window_size, axis='T', sources=(), **kwargs)[source]
Parameters:
  • batch_dim (tf.Tensor) –
  • rec_layer (returnn.tf.layers.rec.RecLayer|LayerBase) –
  • window_size (int) –
  • axis (str) –
  • sources (list[LayerBase]) –
Return type:

dict[str,tf.Tensor]
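A pure-Python sketch of the sliding window with zero padding, including the feature-stacking recipe (stride == window_size, window_right == window_size - 1). The helper is hypothetical and takes window_left explicitly; RETURNN has its own default for centered windows, which this sketch does not reproduce.

```python
# Hypothetical helper (not part of RETURNN): sliding window over the time axis.
def window(seq, window_size, window_left, stride=1, pad_value=0):
    """seq: (T, F) -> (ceil(T / stride), window_size, F), padding missing frames.

    Frame t covers frames t - window_left ... t + window_right, where
    window_right = window_size - window_left - 1.
    """
    window_right = window_size - window_left - 1
    pad = [pad_value] * len(seq[0])
    padded = [pad] * window_left + list(seq) + [pad] * window_right
    return [padded[t:t + window_size] for t in range(0, len(seq), stride)]


# Feature stacking: stride == window_size and window_left == 0, then merge (window, F).
seq = [[1], [2], [3]]
stacked = [
    [v for frame in w for v in frame]
    for w in window(seq, window_size=2, window_left=0, stride=2)
]
print(stacked)  # [[1, 2], [3, 0]]
```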

class returnn.tf.layers.basic.CumsumLayer(axis='T', additional_left_summand_per_element=None, reverse=False, **kwargs)[source]

Basically wraps tf.cumsum. Also supports usage inside a RecLayer.

Parameters:
  • axis (str) – see Data.get_axis_from_description()
  • additional_left_summand_per_element (str|int|float|None) – the order matters for tf.string
  • reverse (bool) –
layer_class = 'cumsum'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, sources, axis='T', **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axis (str) –
Return type:

Data

classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, axis='T', sources=(), **kwargs)[source]
Parameters:
  • batch_dim (tf.Tensor) –
  • rec_layer (returnn.tf.layers.rec.RecLayer|LayerBase) –
  • axis (str) –
  • sources (list[LayerBase]) –
Return type:

dict[str,tf.Tensor]
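A small sketch contrasting the whole-sequence cumulative sum (as with tf.cumsum, including reverse) with the step-by-step view inside a RecLayer, where the previous output is kept as state. Both helpers are hypothetical, and starting the state at 0 is an assumption of this sketch.

```python
from itertools import accumulate


# Hypothetical helper (not part of RETURNN): like tf.cumsum over the time axis.
def cumsum_time(seq, reverse=False):
    """Cumulative sum over time of per-frame scalars."""
    if reverse:
        return list(accumulate(reversed(seq)))[::-1]
    return list(accumulate(seq))


# Hypothetical helper: recurrent view, one frame per step.
def cumsum_rec(seq):
    """Keeps the previous sum as state and adds the current frame each step."""
    out, prev = [], 0
    for x in seq:
        prev = prev + x
        out.append(prev)
    return out


print(cumsum_time([1, 2, 3]), cumsum_time([1, 2, 3], reverse=True))  # [1, 3, 6] [6, 5, 3]
```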

class returnn.tf.layers.basic.PadLayer(axes, padding, value=0, mode='constant', **kwargs)[source]

Adds (e.g. zero) padding in some axis or axes.

Parameters:
  • axes (str|list[str]) – e.g. “F” etc. See Data.get_axes_from_description().
  • padding (list[(int,int)]|(int,int)|int) – how much to pad left/right in each axis
  • value (int|float) – constant value to pad with, if mode=="constant"
  • mode (str) – “constant”, “reflect”, “symmetric” and “replication”
layer_class = 'pad'[source]
classmethod get_out_data_from_opts(name, axes, padding, sources=(), **kwargs)[source]
Parameters:
  • name (str) –
  • axes (str|list[str]) –
  • padding (list[(int,int)]|(int,int)|int) –
  • sources (list[LayerBase]) –
Return type:

Data
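A minimal sketch of constant padding on the time axis, with a layer dict that uses the documented options. The helper is hypothetical and only handles mode="constant".

```python
# Hypothetical helper (not part of RETURNN): constant padding on the time axis.
def pad_time(seq, padding, value=0):
    """seq: (T, F). Adds padding[0] frames on the left and padding[1] on the right."""
    left, right = padding
    n_feat = len(seq[0])
    return (
        [[value] * n_feat for _ in range(left)]
        + [list(frame) for frame in seq]
        + [[value] * n_feat for _ in range(right)]
    )


# Layer dict for the same padding (the source name "data" is an example).
pad_layer = {"class": "pad", "from": "data", "axes": "T", "padding": (1, 2), "value": 0, "mode": "constant"}

print(pad_time([[1, 2]], (1, 2)))  # [[0, 0], [1, 2], [0, 0], [0, 0]]
```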

class returnn.tf.layers.basic.MergeDimsLayer(axes, keep_order=False, n_out=None, **kwargs)[source]

Merges a list of axes into a single one. (Flatten the dims.) E.g. if the input is (batch, width, height, dim) and axes=(1,2), then we get (batch, width*height, dim). Or if the input is (batch, time, height, dim) and axes="except_time", then we get (batch, time, height*dim). See also CombineDimsLayer. When batch and time got merged, SplitBatchTimeLayer can undo this. When you want to merge batch and time, but remove the padding efficiently, i.e. flatten it, see FlattenBatchLayer.

Parameters:
  • axes (str|list[str]|list[int]) – see Data.get_axes_from_description(), e.g. “except_time”
  • keep_order (bool) – By default (for historical reasons), the axes are sorted, and then merged. Thus, the order of incoming axes will influence the result. E.g. inputs [B,S,F] and [B,F,S], with axes=["S","F"], will get different results, although the output shape is [B,S*F] in both cases. This is bad: In general, other layers in RETURNN might reorder the axes for various reasons, and all layers should behave in the same way, no matter the order. It is recommended to set keep_order=True, such that the order defined in axes defines the behavior, and not the incoming axis order.
  • n_out (int|None) –
layer_class = 'merge_dims'[source]
classmethod get_out_data_from_opts(name, axes, keep_order=False, sources=(), n_out=<class 'returnn.util.basic.NotSpecified'>, out_type=None, **kwargs)[source]
Parameters:
  • name (str) –
  • axes (str|list[str]) –
  • keep_order (bool) –
  • sources (list[LayerBase]) –
  • n_out (int|None|NotSpecified) –
  • out_type (None|dict[str]) –
Return type:

Data
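The following pure-Python sketch illustrates both the shape arithmetic and the keep_order issue described above. merged_shape and flatten_2d are hypothetical helpers; placing the merged axis at the position of the first merged axis is an assumption based on the examples in the description, and the layer dict uses example values.

```python
from math import prod


def merged_shape(shape, axes):
    """Shape after merging `axes` (sorted, as with keep_order=False) into one axis.

    Assumption: the merged axis is placed at the position of the first merged axis,
    e.g. (batch, width, height, dim) with axes=(1, 2) -> (batch, width*height, dim).
    """
    axes = sorted(axes)
    out = [d for i, d in enumerate(shape) if i not in axes]
    out.insert(axes[0], prod(shape[a] for a in axes))
    return tuple(out)


def flatten_2d(rows):
    """Row-major flattening of a 2-D nested list, as a reshape would do."""
    return [v for row in rows for v in row]


# The same data, once laid out as [S=2, F=3] and once as [F=3, S=2].
x_sf = [[1, 2, 3], [4, 5, 6]]
x_fs = [list(col) for col in zip(*x_sf)]

# Sketch of a layer dict that makes the merge order explicit.
merge_layer = {"class": "merge_dims", "from": "data", "axes": "except_time", "keep_order": True}

print(merged_shape((8, 10, 20, 3), (1, 2)))  # (8, 200, 3)
print(flatten_2d(x_sf), flatten_2d(x_fs))
```

Both layouts have S*F elements after merging, but the element order differs, which is why keep_order=True (order given by axes, not by the incoming layout) is recommended.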

class returnn.tf.layers.basic.SplitLayer(axis=None, num_splits=None, size_splits=None, **kwargs)[source]

Splits one axis into multiple parts, via tf.split. self.output is simply the input copied. Each part can be accessed as a sub-layer by appending “/%i” to the layer name, e.g. “<name>/0” for the first part.

Parameters:
  • axis (str|None) – feature axis by default
  • num_splits (int|None) –
  • size_splits (list[int]|None) –
layer_class = 'split'[source]
get_sub_layer(layer_name)[source]
Parameters:layer_name (str) –
Return type:LayerBase|None
classmethod get_out_data_from_opts(sources, **kwargs)[source]
Parameters:sources (list[LayerBase]) –
Return type:Data
classmethod get_sub_layer_out_data_from_opts(layer_name, parent_layer_kwargs)[source]
Parameters:
  • layer_name (str) – name of the sub_layer (right part of ‘/’ separated path)
  • parent_layer_kwargs (dict[str]) – kwargs for the parent layer (as kwargs in cls.get_out_data_from_opts())
Returns:

Data template, network and the class type of the sub-layer

Return type:

(Data, TFNetwork, type)|None
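A minimal sketch of the two ways of specifying the parts, on a plain list instead of a tensor (split_features is a hypothetical helper mimicking tf.split along the feature axis), followed by a config fragment in which other layers read the parts via the “/%i” sub-layer names. The layer names are examples.

```python
def split_features(values, num_splits=None, size_splits=None):
    """Split a list like tf.split: into `num_splits` equal parts or into parts of `size_splits`."""
    if size_splits is None:
        if len(values) % num_splits != 0:
            raise ValueError("dimension not divisible by num_splits")
        size_splits = [len(values) // num_splits] * num_splits
    if sum(size_splits) != len(values):
        raise ValueError("size_splits must sum up to the dimension")
    parts, start = [], 0
    for size in size_splits:
        parts.append(values[start:start + size])
        start += size
    return parts


# Sketch of a network config: "split/0" and "split/1" refer to the two parts.
network = {
    "split": {"class": "split", "from": "data", "size_splits": [2, 3]},
    "first": {"class": "copy", "from": "split/0"},
    "second": {"class": "copy", "from": "split/1"},
}
```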

class returnn.tf.layers.basic.SplitDimsLayer(axis, dims, pad_to_multiples=None, pad_value=0, **kwargs)[source]

Splits one axis into multiple axes. E.g. if you know that your feature-dim is composed of a window, i.e. the input is (batch, time, window * feature), you can set axis=“F”, dims=(window, -1), and you will get the output (batch, time, window, feature).

If the split axis has a dynamic length, exactly one of the axes that we split into needs to have a dynamic length as well. You can e.g. use this to split the time axis into smaller “chunks” of a fixed window size. E.g. you could have input (batch, time, feature) and set axis=“T”, dims=(-1, window), to get output (batch, split_time, window, feature). In this case, the exact sequence lengths are lost and everything is padded to multiples of the window size using pad_value. Use ReinterpretDataLayer to recover the original sequence lengths after merging.

Also see SplitBatchTimeLayer. Also see MergeDimsLayer which can undo this operation.

Parameters:
  • axis (str) – e.g. “F”
  • dims (tuple[int]|list[int]) – what the axis should be split into. e.g. (window, -1)
  • pad_to_multiples (bool|None) – If true, input will be padded to the next multiple of the product of the static dims, such that splitting is actually possible. By default this is done iff the axis has a dynamic size
  • pad_value (int|float) – What pad value to use for pad_to_multiples
layer_class = 'split_dims'[source]
classmethod get_out_data_from_opts(name, axis, dims, pad_to_multiples=None, sources=(), **kwargs)[source]
Parameters:
  • name (str) –
  • axis (str|int) –
  • dims (tuple[int]) –
  • pad_to_multiples (bool|None) –
  • sources (list[LayerBase]) –
Return type:

Data
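The size computation can be sketched as follows, for a single axis and ignoring the batch. resolve_split_dims and split_time are hypothetical helpers; the -1 entry is inferred from the axis size, and with pad_to_multiples the axis is first padded up to a multiple of the product of the static dims, as described above.

```python
from math import prod


def resolve_split_dims(axis_size, dims, pad_to_multiples=False):
    """Replace the single -1 in `dims` by the inferred size.

    Returns (resolved_dims, possibly_padded_axis_size).
    """
    static = prod(d for d in dims if d != -1)
    if pad_to_multiples:
        axis_size = -(-axis_size // static) * static  # round up to a multiple
    if axis_size % static != 0:
        raise ValueError("axis size %i is not a multiple of %i" % (axis_size, static))
    return tuple(axis_size // static if d == -1 else d for d in dims), axis_size


def split_time(seq, window, pad_value=0):
    """Split a 1-D sequence into windows of fixed size (dims=(-1, window)), padding the last one."""
    (n_chunks, _), padded_len = resolve_split_dims(len(seq), (-1, window), pad_to_multiples=True)
    padded = seq + [pad_value] * (padded_len - len(seq))
    return [padded[i * window:(i + 1) * window] for i in range(n_chunks)]
```

E.g. resolve_split_dims(12, (3, -1)) gives ((3, 4), 12), matching dims=(window, -1) with window=3 and 12 input features.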

class returnn.tf.layers.basic.SplitBatchTimeLayer(base, **kwargs)[source]

A very specific layer which expects to get input of shape (batch * time, …) and converts it into (batch, time, …), where it recovers the seq-lens from some other layer. See SplitDimsLayer for a more generic layer.

Parameters:base (LayerBase) – used to recover the seq-lens
layer_class = 'split_batch_time'[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, base, sources=(), **kwargs)[source]
Parameters:
Return type:

Data

class returnn.tf.layers.basic.FlattenBatchLayer(axis='T', batch_major=True, **kwargs)[source]

Merges one axis into the batch axis. If the axis has dynamic lengths, this uses flattening, i.e. the padding is removed, so the size changes. This basically wraps flatten_with_seq_len_mask() or flatten_with_seq_len_mask_time_major(). See also MergeDimsLayer, which does not do flattening, i.e. the size stays the same.

Parameters:
  • axis (str) –
  • batch_major (bool) – if False, will flatten in time-major manner
layer_class = 'flatten_batch'[source]
classmethod get_out_data_from_opts(sources, name, axis='T', batch_major=True, **kwargs)[source]
Parameters:
  • sources (list[LayerBase]) –
  • name (str) –
  • axis (str) –
  • batch_major (bool) – if False, will flatten in time-major manner
Return type:

Data
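To contrast flattening with a plain merge, here is a pure-Python sketch on a padded batch with explicit sequence lengths. The helpers are hypothetical, and the time-major variant assumes that frames are ordered by time step first and by sequence second.

```python
def flatten_batch(padded, seq_lens):
    """Batch-major flattening: only the valid frames of each sequence, concatenated."""
    return [frame for seq, n in zip(padded, seq_lens) for frame in seq[:n]]


def flatten_batch_time_major(padded, seq_lens):
    """Time-major flattening (batch_major=False): all valid frames at t=0, then t=1, ..."""
    max_len = max(seq_lens)
    return [padded[b][t] for t in range(max_len) for b in range(len(padded)) if t < seq_lens[b]]


def merge_batch_time(padded):
    """Plain merge of batch and time (like MergeDimsLayer): padding frames are kept."""
    return [frame for seq in padded for frame in seq]


batch = [[1, 2, 0], [3, 0, 0]]  # padded with 0
lens = [2, 1]
print(flatten_batch(batch, lens), flatten_batch_time_major(batch, lens), merge_batch_time(batch))
```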

class returnn.tf.layers.basic.UnflattenNdLayer(sizes, num_axes, declare_same_sizes_as=None, **kwargs)[source]

This keeps the batch axis as-is, i.e. the flattening/unflattening did not happen on the batch axis.

Example:

Assumes that the input is of shape (B,T,<Ds>) which represents flattened images, where each image is of size width * height. We additionally provide these image sizes (shape (B,2)), i.e. (width,height) tuples. We return the unflattened images of shape (B,W,H,<Ds>), where W/H are the max width/height.

This basically wraps TFUtil.unflatten_nd().

Parameters:
  • sizes (LayerBase) –
  • num_axes (int) –
  • declare_same_sizes_as (dict[int,LayerBase]|None) –
layer_class = 'unflatten_nd'[source]
recurrent = True[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, sources, num_axes, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • num_axes (int) –
Return type:

Data
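A pure-Python sketch of the example above for num_axes=2 and scalar pixels (i.e. without the <Ds> feature dims). The helpers are hypothetical, and the row-major order of the flattened frames is an assumption, not taken from TFUtil.unflatten_nd().

```python
def unflatten_2d(flat, width, height, max_width, max_height, pad_value=0):
    """Turn one flattened image of size width*height into a padded max_width x max_height grid.

    Assumption: frame w * height + h holds pixel (w, h) (row-major order).
    """
    assert len(flat) >= width * height
    grid = [[pad_value] * max_height for _ in range(max_width)]
    for w in range(width):
        for h in range(height):
            grid[w][h] = flat[w * height + h]
    return grid


def unflatten_batch(batch, sizes, pad_value=0):
    """`sizes` holds one (width, height) tuple per sequence, like the (B,2) sizes input."""
    max_w = max(w for w, _ in sizes)
    max_h = max(h for _, h in sizes)
    return [unflatten_2d(seq, w, h, max_w, max_h, pad_value) for seq, (w, h) in zip(batch, sizes)]
```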

class returnn.tf.layers.basic.ExpandDimsLayer(axis, dim=1, **kwargs)[source]

Adds a new axis.

Parameters:
  • axis (str|int) – axis to add, e.g. “F”|”feature” or “spatial”|”time”|”T”. if this is an integer, the input data is first converted into batch-major mode, and then this is counted with batch-dim.
  • dim (int) – dimension of new axis (1 by default)
layer_class = 'expand_dims'[source]
classmethod get_out_data_from_opts(name, axis, dim=1, sources=(), **kwargs)[source]
Parameters:
  • name (str) –
  • axis (str) –
  • dim (int) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.RepeatLayer(repetitions, axis='T', **kwargs)[source]

A wrapper around tf.repeat, but supports an additional batch axis for the durations. The sum of the repetitions has to be non-zero for each sequence in the batch.

This layer can only be used with TensorFlow 1.15.0 or newer.

Parameters:
  • repetitions (LayerBase|int) – number of repetitions for each sequence and position in target axis. Can be [B,T] or [T,B] or some subset of that shape
  • axis (str) – (dynamic) axis for repetition (currently only time axis is supported)
layer_class = 'repeat'[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, axis, repetitions, sources=(), **kwargs)[source]
Parameters:
  • name (str) –
  • axis (str) –
  • repetitions (LayerBase|int) –
  • sources (list[LayerBase]) –
Return type:

Data
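The per-sequence behavior can be sketched in pure Python as follows. The helpers are hypothetical; only the time axis is handled (as the layer currently supports), frames beyond a sequence length are ignored, and the new sequence lengths are the sums of the repetitions.

```python
def repeat_seq(seq, repetitions):
    """Repeat seq[t] repetitions[t] times (like tf.repeat along time for one sequence)."""
    assert len(seq) == len(repetitions)
    if sum(repetitions) == 0:
        raise ValueError("the sum of repetitions must be non-zero")
    return [x for x, r in zip(seq, repetitions) for _ in range(r)]


def repeat_batch(batch, seq_lens, repetitions):
    """Per-sequence repetition on a padded batch; returns the new sequences and their lengths."""
    out = [repeat_seq(seq[:n], reps[:n]) for seq, n, reps in zip(batch, seq_lens, repetitions)]
    return out, [len(seq) for seq in out]
```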

class returnn.tf.layers.basic.TileLayer(multiples, **kwargs)[source]

A wrapper around tf.tile.

Parameters:multiples (dict[str,int]) – number of multiples per axis (axis provided as str)
layer_class = 'tile'[source]
classmethod get_out_data_from_opts(name, multiples, sources=(), **kwargs)[source]
Parameters:
  • name (str) –
  • multiples (dict[str,int]) –
  • sources (list[LayerBase]) –
Return type:

Data
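Since tf.tile repeats the whole axis while tf.repeat (see RepeatLayer) repeats each element, a tiny pure-Python comparison may help. Both helpers are hypothetical, and the layer dict is a sketch with an example axis name.

```python
def tile(values, multiple):
    """Tile a 1-D list like tf.tile: the whole list is repeated `multiple` times."""
    return values * multiple


def repeat_elements(values, multiple):
    """For contrast: tf.repeat-style repetition of each element."""
    return [v for v in values for _ in range(multiple)]


# Sketch of a layer dict: tile the feature axis twice (axis given as a string).
tile_layer = {"class": "tile", "from": "data", "multiples": {"F": 2}}
```

tile([1, 2, 3], 2) gives [1, 2, 3, 1, 2, 3], whereas repeat_elements([1, 2, 3], 2) gives [1, 1, 2, 2, 3, 3].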

class returnn.tf.layers.basic.CastLayer(dtype, output, **kwargs)[source]

Cast to some other dtype.

Parameters:
  • dtype (str) –
  • output (Data) –
layer_class = 'cast'[source]
classmethod get_out_data_from_opts(dtype, **kwargs)[source]
Parameters:dtype (str) –
Return type:Data
class returnn.tf.layers.basic.SwapAxesLayer(axis1, axis2, **kwargs)[source]

Swaps two axes. Basically a wrapper around TFUtil.swapaxes(). Note that usually this should not be needed, and its use is discouraged, as it is unnecessarily inefficient: normally, all RETURNN layers automatically transpose the input data into whatever format they need.

All axes always have a special meaning (e.g. feature dim or time dim) or dimension tag (e.g. for time axes, including dyn seq lengths). If you need to change the meaning (and not actually transpose / swap axes), you need to use ReinterpretDataLayer.

See also TransposeLayer for a more generic variant.

See also ReinterpretDataLayer, which does not swap/transpose axes, but allows reinterpreting their meaning / dim tags.

Parameters:
  • axis1 (int|str) –
  • axis2 (int|str) –
layer_class = 'swap_axes'[source]
classmethod get_out_data_from_opts(name, sources, axis1, axis2, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axis1 (int|str) –
  • axis2 (int|str) –
Return type:

Data

class returnn.tf.layers.basic.TransposeLayer(perm, **kwargs)[source]

Basically a wrapper around tf.transpose(). Note that usually this should not be needed, and its use is discouraged, as it is unnecessarily inefficient: normally, all RETURNN layers automatically transpose the input data into whatever format they need.

All axes always have a special meaning (e.g. feature dim or time dim) or dimension tag (e.g. for time axes, including dyn seq lengths). If you need to change the meaning (and not actually transpose / swap axes), you need to use ReinterpretDataLayer.

See also ReinterpretDataLayer, which does not transpose axes, but allows reinterpreting their meaning / dim tags.

Parameters:perm (dict[str,str]) – target axis -> source axis
layer_class = 'transpose'[source]
classmethod transpose(input_data, perm, name=None)[source]
Parameters:
  • input_data (Data) –
  • perm (dict[str,str]) –
  • name (str|None) –
Returns:

transposed data

Return type:

Data

classmethod get_perm_int(input_data, perm)[source]
Parameters:
  • input_data (Data) –
  • perm (dict[str,str]) –
Return type:

dict[int,int]

classmethod get_out_data_from_opts(name, sources, perm, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • perm (dict[str,str]) – target axis -> source axis
Return type:

Data

class returnn.tf.layers.basic.ReinterpretDataLayer(switch_axes=None, size_base=None, set_axes=None, set_dim_tags=None, enforce_batch_major=False, enforce_time_major=False, set_sparse=None, set_sparse_dim=<class 'returnn.util.basic.NotSpecified'>, increase_sparse_dim=None, **kwargs)[source]

Acts like the CopyLayer but reinterprets the role of some axes or data.

Parameters:
  • switch_axes (str|list[str]) – e.g. “bt” to switch batch and time axes
  • size_base (LayerBase|None) – copy the size_placeholder from the given layer
  • set_axes (dict[str,int|str]) – This can be used to overwrite the special axes like time_dim_axis or feature_dim_axis. For that, use keys “B”,”T” or “F”, and a value via Data.get_axis_from_description().
  • set_dim_tags (dict[str|DimensionTag,DimensionTag]) – axis -> new dim tag. Assigns new dim tags. If the dim tag is not yet defined, this will not use same_dim_tags_as (declare_same_as) but create a new dim tag. This option is useful for generalized self attention (https://github.com/rwth-i6/returnn/issues/391).
  • enforce_batch_major (bool) –
  • enforce_time_major (bool) –
  • set_sparse (bool|None) – if bool, set sparse value to this
  • set_sparse_dim (int|None|NotSpecified) – set sparse dim to this. assumes that it is sparse
  • increase_sparse_dim (int|None) – add this to the dim. assumes that it is sparse
layer_class = 'reinterpret_data'[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, sources, switch_axes=None, size_base=None, set_axes=None, set_dim_tags=None, enforce_batch_major=False, enforce_time_major=False, set_sparse=None, set_sparse_dim=<class 'returnn.util.basic.NotSpecified'>, increase_sparse_dim=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • switch_axes (str|list[str]) – e.g. “bt” to switch batch and time axes
  • size_base (LayerBase|None) – similar to size_target
  • set_axes (dict[str,int]) –
  • set_dim_tags (dict[str|DimensionTag,DimensionTag]) –
  • enforce_batch_major (bool) –
  • enforce_time_major (bool) –
  • set_sparse (bool|None) – if bool, set sparse value to this
  • set_sparse_dim (int|None|NotSpecified) – set sparse dim to this. assumes that it is sparse
  • increase_sparse_dim (int|None) – add this to the dim. assumes that it is sparse
class returnn.tf.layers.basic.ConvLayer(n_out, filter_size, padding, strides=1, dilation_rate=1, groups=1, input_expand_dims=0, input_add_feature_dim=False, input_split_feature_dim=None, auto_use_channel_first=False, with_bias=<class 'returnn.util.basic.NotSpecified'>, activation=None, forward_weights_init='glorot_uniform', bias_init=0.0, filter=None, filter_perm=None, bias=None, **kwargs)[source]

A generic convolution layer which supports 1D, 2D and 3D convolution. Pooling can be done in the separate “pool” layer.

Parameters:
  • n_out (int) – number of outgoing features
  • filter_size (tuple[int]) – (width,), (height,width) or (depth,height,width) for 1D/2D/3D conv. The input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. The batch-dim is automatically moved to the first axis of the input data.
  • padding (str) – “same” or “valid”
  • strides (int|tuple[int]) – strides for the spatial dims, i.e. length of this tuple should be the same as filter_size, or a single int.
  • dilation_rate (int|tuple[int]) – dilation for the spatial dims
  • groups (int) – grouped convolution
  • input_expand_dims (int) – number of dynamic dims to add to the input
  • input_add_feature_dim (bool) – will add a dim at the end and use input-feature-dim == 1, and use the original input feature-dim as a spatial dim.
  • input_split_feature_dim (None|int) – if set, like input_add_feature_dim it will add a new feature dim which is of value input_split_feature_dim, and the original input feature dim will be divided by input_split_feature_dim, thus it must be a multiple of that value.
  • auto_use_channel_first (bool) – convert the input to NCHW or not
  • with_bias (bool|NotSpecified) – if True, will add a bias to the output features. False by default
  • activation (None|str) – if set, will apply this function at the end
  • filter (LayerBase|None) – if given, will not create an own parameter, but use this as the filter
  • filter_perm (dict[str,str]|None) – transposes the filter (input filter as layer)
  • bias (LayerBase|None) – if given, will not create an own parameter, but use this as the bias
layer_class = 'conv'[source]
recurrent = True[source]
classmethod calc_out_dim(in_dim, filter_size, stride, padding, dilation_rate=1)[source]
Parameters:
  • in_dim (int|tf.Tensor|T) – dimension in some axis
  • filter_size (int) – e.g. 2, for the corresponding axis
  • stride (int) – e.g. 1, for the corresponding axis
  • dilation_rate (int) – e.g. 1
  • padding (str) – “valid” or “same”
Returns:

the output dimension

Return type:

T
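For integer inputs, the output size is assumed to follow TensorFlow's usual rules for “same” and “valid” padding. A pure-Python sketch (conv_out_dim is a hypothetical name; the real method also accepts symbolic sizes such as tf.Tensor):

```python
def ceildiv(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def conv_out_dim(in_dim, filter_size, stride, padding, dilation_rate=1):
    """Output size along one axis under TensorFlow's "same"/"valid" conventions."""
    padding = padding.lower()
    if padding == "same":
        return ceildiv(in_dim, stride)
    if padding == "valid":
        # The dilated filter covers (filter_size - 1) * dilation_rate + 1 input positions.
        return ceildiv(in_dim - (filter_size - 1) * dilation_rate, stride)
    raise ValueError("padding must be 'same' or 'valid'")
```

E.g. an axis of size 10 with filter_size 3 and stride 1 gives 8 with “valid” padding and 10 with “same” padding.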

classmethod get_out_data_from_opts(name, n_out, filter_size, padding, strides=1, dilation_rate=1, sources=(), input_expand_dims=0, input_add_feature_dim=False, input_split_feature_dim=None, auto_use_channel_first=False, **kwargs)[source]
Parameters:
  • name (str) –
  • n_out (int) –
  • filter_size (tuple[int]) –
  • padding (str) –
  • strides (int|tuple[int]) –
  • dilation_rate (int|tuple[int]) –
  • sources (list[LayerBase]|tuple[LayerBase]) –
  • input_expand_dims (int) – number of dynamic dims to add to the input
  • input_add_feature_dim (bool) –
  • input_split_feature_dim (None|int) –
  • auto_use_channel_first (bool) –
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
class returnn.tf.layers.basic.PoolLayer(mode, pool_size, padding='VALID', dilation_rate=1, strides=None, use_channel_first=False, **kwargs)[source]

A generic N-D pooling layer. This would usually be done after a convolution for down-sampling.

Parameters:
  • mode (str) – “max” or “avg”
  • pool_size (tuple[int]) – shape of the window of each reduce
  • padding (str) – “valid” or “same”
  • dilation_rate (tuple[int]|int) –
  • strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size
  • use_channel_first (bool) – if set, will transform input to NCHW format
layer_class = 'pool'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, pool_size, strides=None, dilation_rate=1, sources=(), padding='VALID', use_channel_first=False, **kwargs)[source]
Parameters:
  • name (str) –
  • pool_size (tuple[int]|list[int]) –
  • strides (tuple[int]|list[int]|int) –
  • dilation_rate (int|tuple[int]|list[int]) –
  • sources (list[LayerBase]) –
  • padding (str) –
  • use_channel_first (bool) –
Return type:

Data
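A 1-D sketch of max and average pooling with “valid” padding, showing the default strides=pool_size (non-overlapping windows). pool_1d is a hypothetical helper, not RETURNN code, and dilation is not handled.

```python
def pool_1d(x, pool_size, mode="max", strides=None):
    """1-D pooling with "valid" padding; strides default to pool_size as in PoolLayer."""
    if strides is None:
        strides = pool_size
    reduce_fn = max if mode == "max" else (lambda window: sum(window) / len(window))
    out = []
    for start in range(0, len(x) - pool_size + 1, strides):
        out.append(reduce_fn(x[start:start + pool_size]))
    return out
```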

class returnn.tf.layers.basic.DctLayer(type=2, n=None, norm=None, **kwargs)[source]

Layer to perform a discrete cosine transform (DCT). Wraps tf.signal.dct(). For further documentation on the input arguments, refer to https://www.tensorflow.org/api_docs/python/tf/signal/dct

Parameters:
  • type (int) – DCT type to perform. Must be 1, 2, 3, or 4
  • n (int|None) – length of the transform
  • norm (str|None) – normalization to apply. Must be None or “ortho”
layer_class = 'dct'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data
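For checking results on a small vector, the type-2 DCT can be computed directly from its definition. This sketch assumes the scaling convention given in the TensorFlow documentation linked above (a factor 2 without normalization, and the stated per-coefficient scaling for norm=“ortho”); its cost is quadratic in N, so it is only meant as a reference.

```python
import math


def dct2(x, norm=None):
    """Type-2 DCT of a list.

    Unnormalized: y[k] = 2 * sum_n x[n] * cos(pi * k * (2n + 1) / (2N)).
    With norm="ortho", y[0] is scaled by sqrt(1 / (4N)) and y[k > 0] by sqrt(1 / (2N)).
    """
    n = len(x)
    y = [2 * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
         for k in range(n)]
    if norm == "ortho":
        y = [v * math.sqrt(1 / (4 * n)) if k == 0 else v * math.sqrt(1 / (2 * n))
             for k, v in enumerate(y)]
    return y
```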

class returnn.tf.layers.basic.TransposedConvLayer(filter_size, activation, strides=None, padding='same', remove_padding=0, output_padding=None, with_bias=True, forward_weights_init='glorot_uniform', bias_init=0.0, filter=None, filter_perm=None, bias=None, **kwargs)[source]

Transposed convolution, sometimes also called deconvolution. See tf.nn.conv2d_transpose() (currently we support 1D/2D).

Parameters:
  • filter_size (list[int]) –
  • strides (list[int]|None) – specifies the upscaling. By default, same as filter_size
  • padding (str) – “same” or “valid”
  • remove_padding (list[int]|int) –
  • output_padding (list[int|None]|int|None) –
  • with_bias (bool) – whether to add a bias. enabled by default. Note that the default is different from ConvLayer!
  • activation (str|None) –
  • forward_weights_init
  • bias_init
  • filter (LayerBase|None) – if given, will not create an own parameter, but use this as the filter
  • filter_perm (dict[str,str]|None) – transposes the filter (input filter as layer)
  • bias (LayerBase|None) – if given, will not create an own parameter, but use this as the bias
layer_class = 'transposed_conv'[source]
recurrent = True[source]
static deconv_output_length(input_length, filter_size, padding, output_padding=None, stride=0, dilation=1)[source]

Determines output length of a transposed convolution given input length. Copied from conv_utils.deconv_output_length, adapted with simplification.

Parameters:
  • input_length (T|int|tf.Tensor) –
  • filter_size (int) –
  • padding (str) – one of “same”, “valid”, “full”.
  • output_padding (int|None) – amount of padding along the output dimension. Can be set to None in which case the output length is inferred.
  • stride (int) –
  • dilation (int) –
Returns:

The output length (integer)

Return type:

T
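Since the method is described as copied from Keras' conv_utils.deconv_output_length, here is a sketch of that Keras logic in plain Python. RETURNN's version is “adapted with simplification”, so it may differ in edge cases; the function below is illustrative only.

```python
def deconv_output_length(input_length, filter_size, padding, output_padding=None, stride=0, dilation=1):
    """Transposed-convolution output length, following Keras' conv_utils logic."""
    assert padding in ("same", "valid", "full")
    # Size covered by the dilated filter.
    filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if output_padding is None:
        # Infer the length from the padding mode.
        if padding == "valid":
            return input_length * stride + max(filter_size - stride, 0)
        if padding == "full":
            return input_length * stride - (stride + filter_size - 2)
        return input_length * stride  # "same"
    # Exact length for a given output_padding.
    pad = {"same": filter_size // 2, "valid": 0, "full": filter_size - 1}[padding]
    return (input_length - 1) * stride + filter_size - 2 * pad + output_padding
```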

classmethod get_out_data_from_opts(name, sources, n_out, filter_size, strides=None, padding='same', remove_padding=0, output_padding=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • n_out (int) –
  • filter_size (list[int]) –
  • strides (list[int]|None) –
  • padding (str) –
  • remove_padding (list[int]|int) –
  • output_padding (list[int|None]|int|None) –
Return type:

Data

get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
class returnn.tf.layers.basic.ReduceLayer(mode, axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, use_time_mask=None, **kwargs)[source]

This reduces some axis by using e.g. “sum” or “max”. It’s basically a wrapper around tf.reduce_sum, tf.reduce_max and related functions.

Parameters:
  • mode (str) – one of “sum”, “max”, “argmin”, “min”, “argmax”, “mean”, “logsumexp”
  • axes (int|list[int]|str) – One axis or multiple axes to reduce. It accepts the special tokens “B”|“batch”, “spatial”, “spatial_except_time”, or “F”|“feature”, and it is strongly recommended to use some of these symbolic names. See Data.get_axes_from_description().
  • axis (int|list[int]|str) – for compatibility, can be used instead of axes
  • keep_dims (bool) – if dimensions should be kept (will be 1)
  • enforce_batch_dim_axis (int) – will swap the batch-dim-axis of the input with the given axis. e.g. 0: will convert the input into batch-major format if not already like that. Note that this is still not enough in some cases, e.g. when the other axes are also not as expected. The strong recommendation is to use a symbolic axis description.
  • use_time_mask (bool) – if we reduce over the time-dim axis, use the seq len info. By default, in that case, it will be True.
layer_class = 'reduce'[source]
classmethod reduce(input_data, mode, axes=None, keep_dims=False, enforce_batch_dim_axis=None, use_time_mask=None)[source]
Parameters:
  • input_data (Data) –
  • mode (str) – one of “sum”, “max”, “argmin”, “min”, “argmax”, “mean”, “logsumexp”
  • axes (int|list[int]|str) – One axis or multiple axes to reduce. It accepts the special tokens “B”|“batch”, “spatial”, “spatial_except_time”, or “F”|“feature”, and it is strongly recommended to use some of these symbolic names. See Data.get_axes_from_description().
  • keep_dims (bool) – if dimensions should be kept (will be 1)
  • enforce_batch_dim_axis (int) – will swap the batch-dim-axis of the input with the given axis. e.g. 0: will convert the input into batch-major format if not already like that. Note that this is still not enough in some cases, e.g. when the other axes are also not as expected. The strong recommendation is to use a symbolic axis description.
  • use_time_mask (bool) – if we reduce over the time-dim axis, use the seq len info. By default, in that case, it will be True.
Return type:

tf.Tensor

classmethod need_enforce_batch_dim_axis(axes)[source]
Parameters:axes (int|list[int]|str) –
Returns:if any integer is in axes, thus we should have a fixed dimension layout
Return type:bool
classmethod get_axes(axis, input_data)[source]
Parameters:
  • axis – see self.__init__()
  • input_data (Data) –
Returns:

list of axes

Return type:

list[int]

classmethod get_out_data_from_opts(name, sources, mode='', axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • mode (str) – (default here “” because other code uses this function)
  • axes (str|list[str]|None) –
  • axis (str|None) –
  • keep_dims (bool) –
  • enforce_batch_dim_axis (int|None) –
Return type:

Data
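The effect of use_time_mask can be shown with a small pure-Python sketch: only the first seq_len frames of each sequence enter the reduction, so padding does not distort e.g. the mean. masked_time_reduce is a hypothetical helper covering only a few modes, and the layer dict uses example values.

```python
def masked_time_reduce(batch, seq_lens, mode="mean"):
    """Reduce over time per sequence, using only the first seq_len frames."""
    out = []
    for seq, n in zip(batch, seq_lens):
        frames = seq[:n]
        if mode == "sum":
            out.append(sum(frames))
        elif mode == "max":
            out.append(max(frames))
        elif mode == "mean":
            out.append(sum(frames) / n)
        else:
            raise ValueError("mode not covered by this sketch: %r" % mode)
    return out


# Sketch of a layer dict: mean over the time axis, using a symbolic axis name.
reduce_layer = {"class": "reduce", "from": "data", "mode": "mean", "axes": "T"}
```

For [[1, 2, 9], [3, 9, 9]] with sequence lengths [2, 1], the masked mean is [1.5, 3.0], while a mean over all padded frames would give [4.0, 7.0].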

class returnn.tf.layers.basic.ReduceOutLayer(mode, num_pieces, **kwargs)[source]

Combination of SplitDimsLayer applied to the feature dim and ReduceLayer applied to the resulting feature dim. This can e.g. be used to do maxout.

Parameters:
  • mode (str) – “sum” or “max” or “mean”
  • num_pieces (int) – how many elements to reduce. The output dimension will be input.dim // num_pieces.
layer_class = 'reduce_out'[source]
classmethod get_out_data_from_opts(num_pieces, sources, name, **kwargs)[source]
Parameters:
  • num_pieces (int) –
  • sources (list[LayerBase]) –
  • name (str) –
Return type:

Data
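A pure-Python sketch of maxout via this layer, assuming that consecutive groups of num_pieces features are reduced (reduce_out is a hypothetical helper; how RETURNN groups the features internally is not stated above).

```python
def reduce_out(features, num_pieces, mode="max"):
    """Reduce consecutive groups of `num_pieces` features (maxout for mode="max").

    The output has len(features) // num_pieces entries.
    """
    if len(features) % num_pieces != 0:
        raise ValueError("feature dim must be a multiple of num_pieces")
    reduce_fn = {"max": max, "sum": sum, "mean": lambda group: sum(group) / len(group)}[mode]
    return [reduce_fn(features[i:i + num_pieces]) for i in range(0, len(features), num_pieces)]
```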

class returnn.tf.layers.basic.SqueezeLayer(axis, enforce_batch_dim_axis=None, allow_no_op=False, **kwargs)[source]

Removes an axis with dimension 1. This is basically a wrapper around tf.squeeze.

Parameters:
  • axis (int|list[int]|str) – one axis or multiple axes to squeeze. This is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). It also accepts the special tokens “B”|“batch”, “spatial”, “spatial_except_time”, or “F”|“feature”
  • enforce_batch_dim_axis (int|None) –
  • allow_no_op (bool) –
layer_class = 'squeeze'[source]
classmethod get_out_data_from_opts(axis, enforce_batch_dim_axis=None, allow_no_op=False, sources=(), **kwargs)[source]
Parameters:
  • axis (int|list[int]|str) –
  • enforce_batch_dim_axis (int|None) –
  • allow_no_op (bool) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.StackLayer(axis=None, **kwargs)[source]

Stacks multiple inputs together using tf.stack().

Parameters:axis (int|None) – new axis. If not given, will use Data.get_default_new_axis_for_dim_tag(<spatial>), i.e. some reasonable default for a new spatial axis.
layer_class = 'stack'[source]
classmethod get_out_data_from_opts(name, sources, axis=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axis (int|None) –
Return type:

Data

class returnn.tf.layers.basic.WeightedSumLayer(axes, padding=None, size=None, keep_dims=None, **kwargs)[source]

Calculates a weighted sum, either over a complete axis of fixed dimension, or over some window. Can also do that for multiple axes. The weights are a trainable parameter matrix. Similar would be to use ElemwiseProdLayer and ReduceLayer, or just a DotLayer with a VariableLayer. See also LinearLayer.

Parameters:
  • axes (str|list[str]) – the axes to do the weighted-sum over
  • padding (str) – “valid” or “same”, in case of keep_dims=True
  • size (None|tuple[int]) – the kernel-size. If omitted, the axes must be of fixed dimension, and keep_dims=False and padding=“valid” are used by default. Otherwise, if given, you must also provide padding, and keep_dims=True is the default.
  • keep_dims (bool) – if False, the axes will be squeezed away. see also size.
layer_class = 'weighted_sum'[source]
classmethod get_out_data_from_opts(name, sources, axes, padding=None, size=None, keep_dims=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axes (str|list[str]) –
  • padding (str|None) –
  • size (None|tuple[int]) –
  • keep_dims (bool|None) –
Return type:

Data

class returnn.tf.layers.basic.ElemwiseProdLayer(axes, size=None, **kwargs)[source]

Element-wise product in some axes. Microsoft calls this “static attention”, in Deep Conv. NN with Layer-wise Context Expansion and Attention (LACE). The matrix/tensor used for the product is a trainable parameter. See also LinearLayer.

Parameters:
  • axes (str|list[str]) – e.g. “spatial”, but all those axes must be of fixed dimension
  • size (tuple[int]) – for double-checking, you can explicitly provide the size
layer_class = 'elemwise_prod'[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.PrefixInTimeLayer(prefix=0.0, repeat=1, size_base=None, **kwargs)[source]

Adds some prefix in time dimension. This is roughly the reverse of what SliceNdLayer does.

Parameters:
  • prefix (float|str) – either some constant or another layer
  • repeat (int|LayerBase) – how often to repeat the prefix
  • size_base (LayerBase|None) – copy seq-lens from here
layer_class = 'prefix_in_time'[source]
recurrent = True[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (returnn.tf.network.TFNetwork) –
  • -> LayerBase) get_layer (((str)) – function to get or construct another layer
classmethod get_out_data_from_opts(name, sources, size_base=None, repeat=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • size_base (LayerBase|None) –
  • repeat (LayerBase|int|None) –
Return type:

Data

class returnn.tf.layers.basic.PostfixInTimeLayer(postfix=0.0, repeat=1, **kwargs)[source]

Adds some postfix in time dimension.

Parameters:
  • postfix (float|int|LayerBase) – constant or other layer without time axis to use as postfix
  • repeat (int) – how often to repeat the postfix
layer_class = 'postfix_in_time'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, sources, postfix=0.0, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • postfix (float|int|LayerBase) – constant or other layer without time axis to use as postfix
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
get_dep_layers()[source]
Return type:list[LayerBase]
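The important detail for padded batches is that the postfix goes right after each sequence's last valid frame, not after the padded end. A pure-Python sketch with a constant postfix; the helper is hypothetical, and growing the padded time axis by repeat is an assumption.

```python
def postfix_in_time(batch, seq_lens, postfix=0.0, repeat=1, pad_value=0.0):
    """Append `postfix` `repeat` times right after the last valid frame of each sequence.

    Returns the re-padded batch and the new seq lengths (seq_lens + repeat).
    """
    new_len = max(len(seq) for seq in batch) + repeat
    out = []
    for seq, n in zip(batch, seq_lens):
        frames = seq[:n] + [postfix] * repeat
        out.append(frames + [pad_value] * (new_len - len(frames)))
    return out, [n + repeat for n in seq_lens]
```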
class returnn.tf.layers.basic.TimeChunkingLayer(chunk_size, chunk_step, **kwargs)[source]

Performs chunking in time. See TFNativeOp.chunk().

Parameters:
  • chunk_size (int) –
  • chunk_step (int) –
layer_class = 'time_chunking'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data
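A simplified sketch of chunking a single sequence into overlapping windows. The helper and the chunk-count formula (enough windows to cover the whole sequence, the last one padded) are assumptions for illustration; TFNativeOp.chunk() additionally works on whole batches and deals with masks, which is not shown here.

```python
def time_chunks(seq, chunk_size, chunk_step, pad_value=0):
    """Cut a sequence into windows of chunk_size frames, starting every chunk_step frames,
    until the whole sequence is covered. The last window is padded if needed.
    """
    n_chunks = max(len(seq) - chunk_size + chunk_step - 1, 0) // chunk_step + 1
    chunks = []
    for c in range(n_chunks):
        window = seq[c * chunk_step:c * chunk_step + chunk_size]
        chunks.append(window + [pad_value] * (chunk_size - len(window)))
    return chunks
```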

class returnn.tf.layers.basic.TimeUnChunkingLayer(chunking_layer, **kwargs)[source]

Undoes the chunking in time done by the given TimeChunkingLayer (chunking_layer). See also TFNativeOp.chunk().

Parameters:chunking_layer (TimeChunkingLayer) –
layer_class = 'time_unchunking'[source]
recurrent = True[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.DotLayer(red1=-1, red2=-2, var1=-2, var2=-1, add_var2_if_empty=True, debug=False, **kwargs)[source]

This performs a dot-product of two sources. The underlying matmul expects shapes (shared…, I, J) * (shared…, J, K) -> (shared…, I, K). We say that J is the axis to be reduced, I is the var-dim of source 1, and K is the var-dim of source 2. I, J, K can also be multiple axes from the sources. The var-dims don’t need to exist. All other axes (shared…) are expected to match.

Parameters:
  • red1 (str|int|tuple[str|int]|list[str|int]) – reduce axes of first source
  • red2 (str|int|tuple[str|int]|list[str|int]) – reduce axes of second source
  • var1 (str|int|tuple[str|int]|list[str|int]|None) – var axes of first source
  • var2 (str|int|tuple[str|int]|list[str|int]|None) – var axes of second source
  • add_var2_if_empty (bool) – if var2=None, add dim=1 at the end
  • debug (bool) – will print debug shapes, etc.
layer_class = 'dot'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (returnn.tf.network.TFNetwork) –
  • -> LayerBase) get_layer (((str)) – function to get or construct another layer
classmethod get_out_data_from_opts(name, sources, red1=-1, red2=-2, var1=-2, var2=-1, add_var2_if_empty=True, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • red1 (str|int|tuple[str|int]|list[str|int]) – reduce axes of first source
  • red2 (str|int|tuple[str|int]|list[str|int]) – reduce axes of second source
  • var1 (str|int|tuple[str|int]|list[str|int]|None) – var axes of first source
  • var2 (str|int|tuple[str|int]|list[str|int]|None) – var axes of second source
  • add_var2_if_empty (bool) –
Return type:

Data
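With the default arguments red1=-1, red2=-2, var1=-2 and var2=-1, the layer corresponds to the plain batched matmul (shared…, I, J) * (shared…, J, K) -> (shared…, I, K). A pure-Python sketch of that computation for one shared axis (batched_matmul is a hypothetical helper):

```python
def batched_matmul(a, b):
    """(shared, I, J) x (shared, J, K) -> (shared, I, K), with nested lists.

    J is the reduced axis (red1=-1 in a, red2=-2 in b); I and K are the
    var axes (var1=-2 in a, var2=-1 in b); the shared axis must match.
    """
    assert len(a) == len(b), "shared axes must match"
    out = []
    for a_s, b_s in zip(a, b):
        j_dim = len(b_s)
        assert all(len(row) == j_dim for row in a_s), "reduce axes must match"
        k_dim = len(b_s[0])
        out.append([[sum(row[j] * b_s[j][k] for j in range(j_dim)) for k in range(k_dim)]
                    for row in a_s])
    return out
```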

class returnn.tf.layers.basic.ShiftAxisLayer(axis, amount, pad=True, adjust_size_info=True, **kwargs)[source]

Shifts the values along an axis. This layer may change the size of that axis (see pad).

The name might be confusing: no axis is moved here. See SwapAxesLayer for that.

Parameters:
  • axis (str|int) – single axis to shift
  • amount (int) – number of elements to shift (<0 for left-shift, >0 for right-shift)
  • pad (bool) – preserve shape by padding
  • adjust_size_info (bool) – whether to adjust the size_placeholder
layer_class = 'shift_axis'[source]
classmethod get_out_data_from_opts(name, amount, axis, pad, sources=(), **kwargs)[source]
Parameters:
  • name (str) –
  • amount (int) –
  • axis (str) –
  • pad (bool) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.ResizeLayer(factor, axis, kind='nn', fill_value=None, fill_dropout=None, **kwargs)[source]

Resizes the input, i.e. upsampling or downsampling. Supports different kinds, such as linear interpolation or nearest-neighbor.

Parameters:
  • factor (int) –
  • axis (str|int) – the axis to resize, counted with batch-dim. can also be “T” for time
  • kind (str) – “linear”, “nn”/“nearest_neighbor”, “cubic”, “fill”
  • fill_value (None|int|float) – if kind==“fill”
  • fill_dropout (float) – if set, will dropout in the same axis
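Below is a minimal pure-Python sketch of integer-factor upsampling on a 1-D sequence, covering two of the listed kinds. The "fill" behaviour shown here is an assumption: each frame is followed by factor - 1 copies of fill_value. `upsample_1d` is a hypothetical helper. The "linear" and "cubic" kinds are not covered:

```python
from typing import List, Optional


def upsample_1d(xs: List[float], factor: int, kind: str = "nn",
                fill_value: Optional[float] = None) -> List[float]:
    """Integer-factor upsampling of a 1-D sequence (illustration only)."""
    if kind in ("nn", "nearest_neighbor"):
        # every input frame is repeated `factor` times
        return [x for x in xs for _ in range(factor)]
    if kind == "fill":
        # assumed: every input frame is followed by (factor - 1) copies of fill_value
        assert fill_value is not None, 'kind="fill" needs a fill_value'
        return [v for x in xs for v in [x] + [fill_value] * (factor - 1)]
    raise ValueError("kind not covered by this sketch: %r" % kind)
```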
layer_class = 'resize'[source]
classmethod get_out_data_from_opts(factor, axis, sources, name, **kwargs)[source]
Parameters:
  • factor (int) –
  • axis (str) –
  • sources (list[LayerBase]) –
  • name (str) –
Return type:

Data

class returnn.tf.layers.basic.CombineDimsLayer(**kwargs)[source]

Combines multiple dimensions. See also MergeDimsLayer. This is deprecated in favor of MergeDimsLayer.

Parameters:axes (int|list[int]|str) – one axis or multiple axes to combine. This is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). It also accepts the special tokens “B”|“batch”, “spatial”, “spatial_except_time”, or “F”|“feature”
layer_class = 'combine_dims'[source]
classmethod get_out_data_from_opts(**kwargs)[source]
Return type:Data
class returnn.tf.layers.basic.RemoveLayer(symbol, **kwargs)[source]

Currently, assumes sparse data, and removes a specific symbol from the data.

It is recommended to use MaskedComputationLayer in combination with e.g. a CompareLayer instead, as this provides more flexibility.

Parameters:symbol (int) –
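Below is a minimal pure-Python sketch of the effect on a batch of sparse label sequences. `remove_symbol` is a hypothetical helper. It shows that sequences can shrink by different amounts, so the per-sequence lengths change:

```python
from typing import List, Tuple


def remove_symbol(seqs: List[List[int]], symbol: int) -> Tuple[List[List[int]], List[int]]:
    """Drop every occurrence of `symbol` from each sparse sequence.

    Returns the filtered sequences and their new lengths.
    """
    out = [[label for label in seq if label != symbol] for seq in seqs]
    return out, [len(seq) for seq in out]
```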
layer_class = 'remove'[source]
classmethod get_out_data_from_opts(name, sources=(), **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.CombineLayer(kind, sources, activation=None, with_bias=False, eval=None, eval_locals=None, eval_for_output_loss=False, **kwargs)[source]

Applies a binary operation, such as addition, to all sources while accumulating the partial results. In the first step, the binary operation is performed on the first two sources. After that, the previous result is always the left-hand operand.

Its basic working is similar to the reduce function used in functional programming. Also see ActivationLayer, or CompareLayer.

Parameters:
  • kind (str) – currently accepted values are average, add, sub, mul, truediv, logical_and, logical_or, or eval
  • sources (list[LayerBase]) –
  • activation (str|None) – if provided, activation function to apply, e.g. “tanh” or “relu”
  • with_bias (bool) – if given, will add a trainable bias tensor
  • eval (str|callable) – for kind=“eval”, will eval this string, or call this function. See _op_kind_eval()
  • eval_locals (dict[str]|None) – locals for eval
  • eval_for_output_loss (bool) – will do the same eval on layer.output_loss
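The accumulation described above is a left fold. The sketch below shows it with functools.reduce on plain numbers. The mapping from kind names to Python operators is an illustrative assumption and covers only some kinds ("average", the logical kinds and "eval" are not modeled):

```python
import functools
import operator

# binary ops for some of the kinds listed above (illustrative mapping)
BINARY_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "truediv": operator.truediv,
}


def combine(kind, sources):
    """Left fold: ((s0 op s1) op s2) op ...; the running result is the left operand."""
    return functools.reduce(BINARY_OPS[kind], sources)


print(combine("sub", [10, 3, 2]))  # (10 - 3) - 2 = 5
```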
layer_class = 'combine'[source]
classmethod get_out_data_from_opts(eval_locals=None, n_out=<class 'returnn.util.basic.NotSpecified'>, out_type=None, sources=(), **kwargs)[source]
Parameters:
  • eval_locals (dict[str]|None) – locals for eval, will also be passed to out_type if out_type is a function
  • n_out (int|None|NotSpecified) –
  • out_type (dict[str]|None|(()->Data)) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.EvalLayer(eval, **kwargs)[source]

Evaluates some string. The CombineLayer provides this functionality, thus this is just a special case of it. Also see ActivationLayer, or CompareLayer.

The output type is defined as a broadcasted extension of all sources. You can overwrite it by (partially) specifying out_type. out_type can also be a generic Python function, returning a Data instance.

Parameters:eval (str) – will eval this string. see _op_kind_eval()
layer_class = 'eval'[source]
class returnn.tf.layers.basic.CompareLayer(kind='equal', value=None, **kwargs)[source]

Compares the tokens of all input sequences element-wise among themselves and/or with a given value. The comparisons are performed in a chain, in the order in which the sources are listed.

Example:

{"class": "compare", "from": ["i1", "i2"], "value": val, "kind": "less"}

computes i1 < i2 < val and it is true only if the whole chain of operations is true. The final result is the logical “and” of all comparisons. Note that value is the last element to be compared to.
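The chain semantics can be sketched in pure Python on per-position values. `compare_chain` is a hypothetical helper. Only "equal", "greater" and "less" are named in the text. The other kind names below mirror TF comparison ops and are assumed to be accepted:

```python
import operator
from typing import List, Optional, Sequence

COMPARE_OPS = {
    "equal": operator.eq,
    "not_equal": operator.ne,
    "less": operator.lt,
    "less_equal": operator.le,
    "greater": operator.gt,
    "greater_equal": operator.ge,
}


def compare_chain(sources: Sequence[Sequence[float]], kind: str = "equal",
                  value: Optional[float] = None) -> List[bool]:
    """Element-wise chained comparison s0 OP s1 OP ... OP value, combined with logical "and"."""
    op = COMPARE_OPS[kind]
    result = []
    for items in zip(*sources):
        chain = list(items) + ([value] if value is not None else [])
        result.append(all(op(a, b) for a, b in zip(chain, chain[1:])))
    return result


# i1 = [1, 5], i2 = [2, 3], val = 4: position 0 is 1 < 2 < 4, position 1 fails at 5 < 3
print(compare_chain([[1, 5], [2, 3]], kind="less", value=4))  # [True, False]
```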

A common example usage is the end layer in a rec subnetwork to specify the stopping criterion, e.g. the last generated token is equal to the end-of-sentence token:

"output": {"class": "rec", "from": [], "unit": {
    ...
    "end": {"class": "compare", "from": "output", "value": end_of_sentence_id}
}, "target": "classes0"}
Parameters:
  • kind (str) – which comparison operation to use, e.g. “equal”, “greater”, “less” or other supported TF comparison ops
  • value (float|int|None) – if specified, will also compare to this
layer_class = 'compare'[source]
classmethod get_out_data_from_opts(n_out=<class 'returnn.util.basic.NotSpecified'>, out_type=None, sources=(), **kwargs)[source]
Parameters:
  • n_out (int|None|NotSpecified) –
  • out_type (dict[str]|None) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.SwitchLayer(condition, true_from, false_from, **kwargs)[source]

Wrapper around tf.where() (or more generically TFUtil.where_bc()), or statically choose a single source if the condition is a callable (…)->bool. (tf.cond is not useful here, as the sources would have been already constructed and computed.)

This layer is also useful for applying any kind of generic masking to the frames. E.g. one could have a layer called “mask” computing a boolean mask for the values stored in another layer “input”. Then use this layer with condition=“mask”, true_from=“input”, false_from=mask_value, to mask out all frames where the mask is false with the mask_value.
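The masking use case can be written as a config entry. Its element-wise effect is that of tf.where with broadcasting of a scalar branch. Below, the layer name "masked" is hypothetical, and `switch` is a pure-Python stand-in for a 1-D case, not the layer implementation:

```python
from typing import List, Sequence, Union

Number = Union[int, float]

# hypothetical config entry for the masking example above
network_fragment = {
    "masked": {"class": "switch", "condition": "mask", "true_from": "input", "false_from": 0.0},
}


def switch(condition: Sequence[bool],
           true_from: Union[Sequence[Number], Number],
           false_from: Union[Sequence[Number], Number]) -> List[Number]:
    """Element-wise select like tf.where, broadcasting scalar branches."""
    def at(x, i):
        return x if isinstance(x, (int, float)) else x[i]
    return [at(true_from, i) if c else at(false_from, i) for i, c in enumerate(condition)]


print(switch([True, False, True], [0.5, 0.7, 0.9], 0.0))  # [0.5, 0.0, 0.9]
```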

See also CondLayer. See also SeqLenMaskLayer if you just want to mask using the sequence lengths.

Parameters:
  • condition (LayerBase|bool) – if callable, expected to be (…)->bool, and called in transform_config_dict
  • true_from (LayerBase|float|int|None) –
  • false_from (LayerBase|float|int|None) –
layer_class = 'switch'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (returnn.tf.network.TFNetwork) –
  • get_layer (((str) -> LayerBase)) – function to get or construct another layer
classmethod get_out_data_from_opts(name, condition, true_from, false_from, **kwargs)[source]
Parameters:
  • name (str) –
  • condition (LayerBase|bool) –
  • true_from (LayerBase|float|int|None) –
  • false_from (LayerBase|float|int|None) –
Return type:

Data

get_dep_layers()[source]
Return type:list[LayerBase]
class returnn.tf.layers.basic.CondLayer(condition, true_layer, false_layer, _condition_network=None, _true_layer_network=None, _false_layer_network=None, **kwargs)[source]

See also SwitchLayer, which uses tf.where(). Here, we use tf.cond instead. I.e. the condition has to be a scalar bool, and only the corresponding true/false branch is computed.

Parameters:
  • condition (LayerBase|dict[str]) –
  • true_layer (LayerBase|dict[str]) –
  • false_layer (LayerBase|dict[str]) –
layer_class = 'cond'[source]
recurrent = True[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(true_layer, false_layer, name, network, **kwargs)[source]
Parameters:
Return type:

Data

get_sub_layers()[source]
Return type:list[LayerBase]
class returnn.tf.layers.basic.SearchSortedLayer(sorted_sequence, values, axis='T', side='left', **kwargs)[source]

Basically wraps tf.searchsorted().

Takes a tensor sorted_sequence that is sorted along one axis, and a tensor values. Will compute an output tensor with the same axes as values, where each entry is the index of the value within the sorted sequence. All (batch) axes of sorted_sequence except for the axis it is sorted along must be present in values.

Parameters:
  • sorted_sequence (LayerBase) –
  • values (LayerBase) – search values
  • axis (str) – the axis along which sorted_sequence is sorted
  • side (str) – “left” or “right”. When one of the values exactly matches an element of the sorted_sequence, whether to choose the lower or higher index.
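For a single, unbatched sorted sequence, the side="left"/"right" semantics of tf.searchsorted match Python's bisect_left/bisect_right. The sketch below illustrates this. `search_sorted` is a hypothetical helper:

```python
import bisect
from typing import List, Sequence


def search_sorted(sorted_sequence: Sequence[float], values: Sequence[float],
                  side: str = "left") -> List[int]:
    """For each value, the insertion index into sorted_sequence that keeps it sorted."""
    find = bisect.bisect_left if side == "left" else bisect.bisect_right
    return [find(sorted_sequence, v) for v in values]


print(search_sorted([1, 3, 3, 5], [3, 4], side="left"))   # [1, 3]
print(search_sorted([1, 3, 3, 5], [3, 4], side="right"))  # [3, 3]
```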
layer_class = 'search_sorted'[source]
recurrent = True[source]
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (returnn.tf.network.TFNetwork) –
  • get_layer (((str) -> LayerBase)) – function to get or construct another layer
classmethod get_out_data_from_opts(sorted_sequence, values, axis, name, network, **kwargs)[source]
Parameters:
Return type:

Data

class returnn.tf.layers.basic.SubnetworkLayer(subnetwork, _subnet, _output, concat_sources=True, load_on_init=None, dropout=0, dropout_noise_shape=None, _parent_layer_cache=None, _from=None, **kwargs)[source]

You can define a whole subnetwork as a single layer by this class.

The subnetwork will be specified by a dict[str,dict[str]], just like a normal network is specified in the config.

The "output" layer of the subnetwork will be the output of this subnetwork-layer.

With concat_sources=True (default), the input to this layer will be represented as "data:data" or simply "data" in the subnetwork. Otherwise, with concat_sources=False, each input to this layer will be represented in the subnetwork as "data:input_layer_name" and also as "data:0" to "data:<n-1>" for n inputs. The first input will also be available simply as "data:data"/"data".
Parameters:
  • subnetwork (dict[str,dict]) – subnetwork as dict (JSON content). Must have an “output” layer.
  • concat_sources (bool) – if we concatenate all sources into one, like it is standard for most other layers
  • load_on_init (str|dict[str]|None) – if provided, for parameter initialization, we will load the given model file. see CustomCheckpointLoader.
  • dropout (float) – will be applied if train_flag is set
  • dropout_noise_shape (tuple|list|dict|None) –
  • _parent_layer_cache (dict[str,LayerBase]|None) –
  • _subnet (returnn.tf.network.Subnetwork) –
  • _output (LayerBase) –
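Below is a small config sketch of a subnetwork layer with the default concat_sources=True. Inside the subnetwork, the layer input is referred to as "data", and the required "output" layer defines the layer's output. The layer names "enc" and "hidden" and the sizes are hypothetical:

```python
# Hypothetical network config: "enc" wraps a small subnetwork.
network = {
    "enc": {
        "class": "subnetwork",
        "from": "data",
        "subnetwork": {
            "hidden": {"class": "linear", "activation": "relu", "n_out": 256, "from": "data"},
            "output": {"class": "copy", "from": "hidden"},  # required "output" layer
        },
    },
    "output": {"class": "softmax", "loss": "ce", "from": "enc"},
}
```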
layer_class = 'subnetwork'[source]
recurrent = True[source]
update_params_from_subnet()[source]

Update self.params.

update_rec_vars_outputs()[source]

Update self.rec_vars_outputs.

update_load_on_init()[source]

Handle load_on_init.

classmethod get_out_data_from_opts(n_out=<class 'returnn.util.basic.NotSpecified'>, out_type=None, **kwargs)[source]
Parameters:
  • n_out (int|None|NotSpecified) –
  • out_type (dict[str]|None) –
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod cls_get_sub_network(name, network, layer_desc)[source]
Parameters:
Return type:

returnn.tf.network.Subnetwork|None

get_sub_layer(layer_name)[source]
Parameters:layer_name (str) – name of the sub_layer (right part of ‘/’ separated path)
Returns:the sub_layer addressed in layer_name or None if no sub_layer exists
Return type:LayerBase|None
get_sub_networks()[source]
Return type:list[returnn.tf.network.TFNetwork]
get_sub_layers()[source]
Return type:list[LayerBase]
get_dep_layers()[source]
Returns:list of layers this layer depends on. normally this is just self.sources but e.g. the attention layer in addition has a base, etc.
Return type:list[LayerBase]
get_last_hidden_state(key)[source]
Parameters:key (int|str|None) – also the special key “*”
Return type:tf.Tensor|None
classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, **kwargs)[source]
Parameters:
Return type:

dict[str,tf.Tensor]

classmethod get_rec_initial_extra_outputs_shape_invariants(**kwargs)[source]
Returns:optional shapes for the tensors by get_rec_initial_extra_outputs
Return type:dict[str,tf.TensorShape]
class returnn.tf.layers.basic.VariableLayer(shape, dtype='float32', add_batch_axis=True, add_time_axis=False, trainable=True, init=0, **kwargs)[source]

Represents a variable. Can add batch/time dimension if wanted. Can be trainable. See defaults.

Parameters:
  • shape (tuple[int]|list[int]) –
  • dtype (str) –
  • add_batch_axis (bool) –
  • add_time_axis (bool) –
  • trainable (bool) –
  • init (str|float|int) – see TFUtil.get_initializer()
layer_class = 'variable'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (returnn.tf.network.TFNetwork) –
  • get_layer (((str) -> LayerBase)) – function to get or construct another layer
classmethod get_out_data_from_opts(name, network, shape, dtype='float32', add_batch_axis=True, add_time_axis=False, **kwargs)[source]
Parameters:
  • name (str) –
  • network (returnn.tf.network.TFNetwork) –
  • shape (tuple[int]|list[int]) –
  • dtype (str) –
  • add_batch_axis (bool) –
  • add_time_axis (bool) –
Return type:

Data

class returnn.tf.layers.basic.AccumulateMeanLayer(exp_average, axes='bt', initial_value=None, is_prob_distribution=None, **kwargs)[source]

Accumulates the mean of the input (in training), by default over the batch-dim and time-dim. It is similar to ReduceLayer.

Parameters:
  • exp_average (float) – momentum in exponential average calculation
  • axes (int|list[str]|str) – the axes to reduce. must contain batch and time.
  • initial_value (float) – how to initialize the variable which accumulates the mean
  • is_prob_distribution (bool) – if provided, better default for initial_value
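Below is a minimal sketch of an exponential moving average of per-batch means. It assumes the update v <- v + exp_average * (batch_mean - v), so that a small exp_average changes the accumulated value slowly. The exact update rule used by the layer is an assumption, and `accumulate_mean` is a hypothetical helper:

```python
from typing import List


def accumulate_mean(batch_means: List[float], exp_average: float,
                    initial_value: float = 0.0) -> List[float]:
    """Assumed update per training step: v <- v + exp_average * (batch_mean - v)."""
    v = initial_value
    history = []
    for m in batch_means:
        v += exp_average * (m - v)
        history.append(v)
    return history
```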
layer_class = 'accumulate_mean'[source]
classmethod get_out_data_from_opts(axes='bt', **kwargs)[source]
Parameters:axes (str) –
Return type:Data
class returnn.tf.layers.basic.LossLayer(loss_, target_=None, use_error=False, **kwargs)[source]

This layer wraps a Loss calculation as a layer, i.e. the loss is calculated and returned as the layer output. However, this loss is not used as a loss by the updater. If you want to use it as a loss, you can use the AsIsLoss, i.e. write "loss": "as_is".

Note that the loss options for the wrapped loss need to be provided via loss_opts_, and it does not apply any reduce function.

Note

The LossLayer might be deprecated in the future in favor of implementing the losses as actual layers.

If you want to define a loss inside the network, it is recommended to define it explicitly. An example could be:

"se_loss": {"class": "eval", "eval": "(source(0) - source(1)) ** 2", "from": ["output", "data:classes"]}

Followed by an e.g. mean reduce if needed:

"mse_loss": {"class": "reduce", "mode": "mean", "axis": "F", "from": "se_loss"}
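For a single frame with a dense target vector, the two layers above compute the following. The helpers `se` and `mean_over_feature` are hypothetical pure-Python stand-ins for the eval and reduce layers:

```python
from typing import List


def se(output: List[float], target: List[float]) -> List[float]:
    # element-wise (source(0) - source(1)) ** 2, like the "se_loss" eval layer
    return [(o - t) ** 2 for o, t in zip(output, target)]


def mean_over_feature(xs: List[float]) -> float:
    # like the "mse_loss" reduce layer with "mode": "mean", "axis": "F"
    return sum(xs) / len(xs)


frame_out, frame_target = [0.5, 1.0, 2.0], [0.0, 1.0, 1.0]
print(mean_over_feature(se(frame_out, frame_target)))  # (0.25 + 0 + 1) / 3
```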

loss_ and related params have the postfix _ to distinguish them from the loss options, which are used by the network and updater for training. Some of these (e.g. loss_opts_) are handled in transform_config_dict().

Parameters:
  • loss_ (Loss) –
  • target_ (LayerBase|None) –
  • use_error (bool) – whether to output the loss error instead of the loss value
layer_class = 'loss'[source]
recurrent = True[source]
get_sub_layer(layer_name)[source]
Parameters:layer_name (str) – sub layer name
Return type:LayerBase|None
classmethod get_sub_layer_out_data_from_opts(layer_name, parent_layer_kwargs)[source]
Parameters:
  • layer_name (str) – sub layer name
  • parent_layer_kwargs (dict[str]) –
Return type:

(Data, TFNetwork, type)|None

get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, sources, target_=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • target_ (LayerBase|None) –
Return type:

Data

class returnn.tf.layers.basic.ForcedAlignmentLayer(align_target, topology, input_type, **kwargs)[source]

Calculates a forced alignment, via Viterbi algorithm.

Parameters:
  • align_target (LayerBase) –
  • topology (str) – e.g. “ctc” or “rna” (RNA is CTC without label loop)
  • input_type (str) – “log_prob” or “prob”
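Below is a minimal pure-Python sketch of Viterbi forced alignment with the "ctc" topology, on log-probabilities. The states are the blank-interleaved label sequence, and the allowed transitions are loop, advance by one, or skip a blank between different labels. This is our own illustration, not RETURNN's implementation. `ctc_forced_align` and the blank index 0 are assumptions:

```python
import math
from typing import List, Sequence


def ctc_forced_align(log_probs: Sequence[Sequence[float]], labels: Sequence[int],
                     blank: int = 0) -> List[int]:
    """Viterbi forced alignment with CTC topology; returns one label (or blank) per frame."""
    ext = [blank]
    for y in labels:  # blank-interleaved label sequence: b y1 b y2 ... yU b
        ext += [y, blank]
    num_states, num_frames = len(ext), len(log_probs)
    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    for s in range(min(2, num_states)):  # may start in the first blank or first label
        score[0][s] = log_probs[0][ext[s]]
    for t in range(1, num_frames):
        for s in range(num_states):
            preds = [s]  # loop in the same state
            if s >= 1:
                preds.append(s - 1)  # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                preds.append(s - 2)  # skip a blank between different labels
            best = max(preds, key=lambda p: score[t - 1][p])
            if score[t - 1][best] > neg_inf:
                score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
                back[t][s] = best
    finals = [s for s in (num_states - 1, num_states - 2) if s >= 0]  # end in last blank or label
    s = max(finals, key=lambda e: score[-1][e])
    if score[-1][s] == neg_inf:
        raise ValueError("no valid alignment (too few frames for the labels?)")
    path = [s]
    for t in range(num_frames - 1, 0, -1):  # backtrack
        s = back[t][s]
        path.append(s)
    return [ext[s] for s in reversed(path)]
```

The "rna" topology (no label loop) would drop the self-loop on label states. That variant is not shown.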
layer_class = 'forced_align'[source]
classmethod get_sub_layer_out_data_from_opts(layer_name, parent_layer_kwargs)[source]
Parameters:
  • layer_name (str) – sub layer name
  • parent_layer_kwargs (dict[str]) –
Return type:

(Data, TFNetwork, type)|None

get_sub_layer(layer_name)[source]
Parameters:layer_name (str) –
Return type:LayerBase|None
get_dep_layers()[source]
Return type:list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.FastBaumWelchLayer(align_target, align_target_key=None, ctc_opts=None, sprint_opts=None, input_type='log_prob', tdp_scale=1.0, am_scale=1.0, min_prob=0.0, staircase_seq_len_source=None, **kwargs)[source]

Calls fast_baum_welch() or fast_baum_welch_by_sprint_automata(). We expect the input to be +log scores, e.g. from a log-softmax.

Parameters:
  • align_target (str) – e.g. “sprint” or “staircase”
  • align_target_key (str|None) – e.g. “classes”, used for e.g. align_target “ctc”
  • ctc_opts (dict[str]) – used for align_target “ctc”
  • sprint_opts (dict[str]) – used for Sprint (RASR) for align_target “sprint”
  • input_type (str) – “log_prob” or “prob”
  • tdp_scale (float) –
  • am_scale (float) –
  • min_prob (float) – clips the minimum prob (value in [0,1])
  • staircase_seq_len_source (LayerBase|None) –
layer_class = 'fast_bw'[source]
recurrent = True[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.SyntheticGradientLayer(gradient, meta_loss_scale=1.0, **kwargs)[source]

This is a generalized way to replace the true gradient with any kind of predicted gradient. This makes it possible to implement the idea from:

Decoupled Neural Interfaces using Synthetic Gradients, https://arxiv.org/abs/1608.05343
Parameters:
  • gradient (LayerBase) –
  • meta_loss_scale (float) –
layer_class = 'synthetic_gradient'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(sources, name, **kwargs)[source]
Parameters:
  • sources (list[LayerBase]) –
  • name (str) –
Return type:

Data

class returnn.tf.layers.basic.TikhonovRegularizationLayer(meta_loss_scale=1.0, **kwargs)[source]

Adds the Tikhonov regularization as a meta-loss (see TFUtil.MetaLosses).

Parameters:meta_loss_scale (float) –
layer_class = 'tikhonov_regularization'[source]
class returnn.tf.layers.basic.FramewiseStatisticsLayer(sil_label_idx, histogram_num_bins=20, **kwargs)[source]

Collects various statistics (such as FER) on the sources. The tensors will be stored in self.stats, which will be collected by TFEngine.

layer_class = 'framewise_statistics'[source]
classmethod get_out_data_from_opts(**kwargs)[source]
Return type:Data
class returnn.tf.layers.basic.PrintLayer(summarize=99, extra_print_args=(), **kwargs)[source]

Prints the sources to console/log, via TFUtil.py_print().

Parameters:
  • summarize (int|None) – passed to py_print()
  • extra_print_args (list|tuple) –
layer_class = 'print'[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data

class returnn.tf.layers.basic.HDFDumpLayer(filename, extra=None, dump_whole_batches=False, labels=None, extend_existing_file=False, dump_per_run=False, **kwargs)[source]

Dumps into an HDF file, compatible with HDFDataset.

By default, the HDF file is written to disk under the specified filename at graph reset (via TFNetwork.register_graph_reset_callback()), if there was no error. With dump_per_run, it is instead written after the dataset iteration run loop (via TFNetwork.register_run_finished_callback()).

Common usage would be to add this to your network with “is_output_layer”: True, such that you don’t need to make other layers depend on it.

It currently uses SimpleHDFWriter internally.
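Below is a config sketch of the common usage described above, with "is_output_layer": True so that no other layer needs to depend on the dump layer. The layer names "dump_encoder" and "encoder" and the filename are hypothetical:

```python
# Hypothetical config fragment: dump the output of a layer named "encoder".
network_fragment = {
    "dump_encoder": {
        "class": "hdf_dump",
        "from": "encoder",
        "filename": "encoder_out.hdf",
        "is_output_layer": True,  # no other layer needs to depend on it
    },
}
```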

Parameters:
  • filename (str|(()->str)) –
  • extra (None|dict[str,LayerBase]) –
  • dump_whole_batches (bool) – dumps the whole batch as a single sequence into the HDF
  • labels (list[str]|None) –
  • extend_existing_file (bool) – True also means we expect that it exists
  • dump_per_run (bool) – write via TFNetwork.register_run_finished_callback()
layer_class = 'hdf_dump'[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (returnn.tf.network.TFNetwork) –
  • get_layer (((str) -> LayerBase)) – function to get or construct another layer
class returnn.tf.layers.basic.ImageSummaryLayer(max_outputs=3, **kwargs)[source]

Creates image summaries which can be viewed in TensorBoard. This layer expects the source to be in (T-decoder, T-encoder, B, 1).

Parameters:max_outputs – number of images to generate per step
layer_class = 'image_summary'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace, the loss_opts
  • network (returnn.tf.network.TFNetwork) –
  • get_layer (((str) -> LayerBase)) – function to get or construct another layer
classmethod get_out_data_from_opts(**kwargs)[source]
Return type:Data
class returnn.tf.layers.basic.CrossEntropyLoss(focal_loss_factor=0.0, label_smoothing=0.0, label_smoothing_gaussian=False, debug_dump=False, safe_log_opts=None, use_fused=True, fake_upper_bound=None, **kwargs)[source]

Cross-entropy loss. Basically -sum(target * log(output)).

Parameters:
  • focal_loss_factor (float) – see https://arxiv.org/abs/1708.02002. 0 means disabled
  • label_smoothing (float) – 0.1 is a common default. see TFUtil.smoothing_cross_entropy()
  • label_smoothing_gaussian (bool) – see TFUtil.smoothing_cross_entropy()
  • debug_dump (bool) –
  • safe_log_opts (dict[str]) – passed to safe_log()
  • use_fused (bool) – if possible, use fused opts
  • fake_upper_bound (float|None) – uses TFUtil.minimum_with_identity_grad(). I.e. you will see a finite loss, but we use the original gradient (which should be safe).
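Below is a per-frame sketch for a sparse target (a class index). It includes the focal-loss scaling (1 - p)^gamma from the paper linked above. Whether RETURNN applies the focal factor exactly this way is an assumption. Label smoothing is omitted, and `cross_entropy` is a hypothetical helper:

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int, focal_loss_factor: float = 0.0) -> float:
    """-log p(target) for one frame, optionally scaled by the focal term (1 - p)^gamma."""
    p = probs[target]
    loss = -math.log(p)
    if focal_loss_factor > 0:
        loss *= (1.0 - p) ** focal_loss_factor  # down-weights already well-classified frames
    return loss
```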
class_name = 'ce'[source]
need_target = True[source]
get_output_target_scores()[source]
Returns:shape (time_flat,), type float32
Return type:tf.Tensor
get_value()[source]
Return type:tf.Tensor
class returnn.tf.layers.basic.BinaryCrossEntropyLoss(pos_weight=None, **kwargs)[source]

Binary cross entropy. We expect the output as logits, not in probability space! Per frame: -mean(target * log(sigmoid(output)) + (1 - target) * log(1 - sigmoid(output)))

Parameters:pos_weight (float|None) – weight of positive labels, see tf.nn.weighted_cross_entropy_with_logits.
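Below is a sketch of the per-element value computed stably from a logit x and target z, with pos_weight q. It mirrors the stable formula documented for tf.nn.weighted_cross_entropy_with_logits: (1 - z) * x + l * (log(1 + exp(-|x|)) + max(-x, 0)) with l = 1 + (q - 1) * z. A naive version is included for comparison. Both helper names are hypothetical:

```python
import math
from typing import Optional


def bce_with_logits(x: float, z: float, pos_weight: Optional[float] = None) -> float:
    """Binary CE for one logit x and target z in [0, 1], computed stably in log-space."""
    q = 1.0 if pos_weight is None else pos_weight
    weight = 1.0 + (q - 1.0) * z
    # equals -(q * z * log(sigmoid(x)) + (1 - z) * log(1 - sigmoid(x)))
    return (1.0 - z) * x + weight * (math.log1p(math.exp(-abs(x))) + max(-x, 0.0))


def bce_naive(x: float, z: float, q: float = 1.0) -> float:
    # direct formula; can overflow or lose precision for large |x|
    s = 1.0 / (1.0 + math.exp(-x))
    return -(q * z * math.log(s) + (1.0 - z) * math.log(1.0 - s))
```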
class_name = 'bin_ce'[source]
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Returns:frame error rate as a scalar value with the default self.reduce_func (see also self.get_value)
Return type:tf.Tensor
class returnn.tf.layers.basic.GenericCELoss(**kwargs)[source]

Some generalization of cross entropy.

class_name = 'generic_ce'[source]
get_value()[source]
Return type:tf.Tensor
class returnn.tf.layers.basic.CtcLoss(target_collapse_repeated=False, auto_clip_target_len=False, output_in_log_space=False, beam_width=100, ctc_opts=None, use_native=False, use_viterbi=False, **kwargs)[source]

Connectionist Temporal Classification (CTC) loss. Basically a wrapper around tf.nn.ctc_loss.

Parameters:
  • target_collapse_repeated (bool) – like preprocess_collapse_repeated option for CTC. used for sparse_labels().
  • auto_clip_target_len (bool) – see self._get_target_sparse_labels().
  • output_in_log_space (bool) – False -> output expected in prob space. see self.get_output_logits
  • beam_width (int) – used in eval
  • ctc_opts (dict[str]|None) – other kwargs used for tf.nn.ctc_loss
  • use_native (bool) – use our native implementation (TFNativeOp.ctc_loss())
  • use_viterbi (bool) – instead of full-sum, use only best path (via ctc_loss_viterbi())
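Below is a minimal pure-Python sketch of the CTC negative log-likelihood, computed with the forward algorithm in log space. With use_viterbi=True, the sum over alignments is replaced by a max, mirroring the use_viterbi option above. This is our own illustration, not tf.nn.ctc_loss or TFNativeOp.ctc_loss(). The blank index 0 is an arbitrary choice for this sketch:

```python
import math
from typing import List, Sequence


def _logsumexp(xs: List[float]) -> float:
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs: Sequence[Sequence[float]], labels: Sequence[int],
             blank: int = 0, use_viterbi: bool = False) -> float:
    """-log p(labels | input), summed over all CTC alignments (or best path only)."""
    ext = [blank]
    for y in labels:  # b y1 b y2 ... yU b
        ext += [y, blank]
    num_states = len(ext)
    combine = max if use_viterbi else _logsumexp
    alpha = [-math.inf] * num_states
    for s in range(min(2, num_states)):  # start in the first blank or first label
        alpha[s] = log_probs[0][ext[s]]
    for t in range(1, len(log_probs)):
        new_alpha = []
        for s in range(num_states):
            prev = [alpha[s]]
            if s >= 1:
                prev.append(alpha[s - 1])
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                prev.append(alpha[s - 2])
            new_alpha.append(combine(prev) + log_probs[t][ext[s]])
        alpha = new_alpha
    # end in the last label or the final blank
    return -combine(alpha[-2:] if num_states > 1 else alpha)
```

As an example, take two frames with p(blank) = p(1) = 0.5 and the target [1]. Three alignments produce it: (1, 1), (blank, 1) and (1, blank). The full-sum loss is therefore -log 0.75, and the Viterbi loss is -log 0.25.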
class_name = 'ctc'[source]
recurrent = True[source]
init(**kwargs)[source]

See super.

get_output_logits()[source]
Returns:outputs in log-space / logits
Return type:tf.Tensor
get_soft_alignment()[source]

Also called the Baum-Welch-alignment. This is basically p_t(s|x_1^T,w_1^N), where s are the output labels (including blank), and w are the real target labels.

Returns:shape (time, batch, dim)
Return type:tf.Tensor
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor
classmethod get_auto_output_layer_dim(target_dim)[source]
Return type:int
class returnn.tf.layers.basic.EditDistanceLoss(debug_print=False, label_map=None, ctc_decode=False, output_in_log_space=False, **kwargs)[source]

Note that this loss is not differentiable, thus it’s only for keeping statistics.

Parameters:
  • debug_print (bool) – will tf.Print the sequence
  • label_map (dict[int,int]|None) – before calculating the edit-distance, will apply this map
  • ctc_decode (bool) – True -> expects dense output and does CTC decode, False -> expects sparse labels in output
  • output_in_log_space (bool) – False -> dense output expected in prob space. see self.get_output_logits
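Below is a sketch of the two steps involved when ctc_decode is enabled. The first is best-path CTC decoding, which merges repeats and then drops blanks. The second is the Levenshtein distance to the reference. How RETURNN normalizes or accumulates the resulting error is not shown. The helper names and blank index 0 are assumptions:

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: merge repeated labels, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


def edit_distance(hyp: Sequence[int], ref: Sequence[int]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions cost 1)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (h != r)))
        prev = cur
    return prev[-1]
```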
class_name = 'edit_distance'[source]
recurrent = True[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters:
  • output (Data) – generated output
  • output_with_activation (OutputWithActivation|None) –
  • target (Data) – reference target from dataset
get_output_logits()[source]
Returns:outputs in log-space / logits
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor
get_value()[source]
Return type:None
class returnn.tf.layers.basic.BleuLoss(**kwargs)[source]

Note that this loss is not differentiable, thus it’s only for keeping statistics. Also, BLEU is a score, i.e. the higher, the better. Thus, to interpret it as a loss or error, we take the negative value.

class_name = 'bleu'[source]
recurrent = True[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters:
  • output (Data) – generated output
  • output_with_activation (OutputWithActivation|None) –
  • target (Data) – reference target from dataset
get_error()[source]
Return type:tf.Tensor
get_value()[source]
Return type:None
class returnn.tf.layers.basic.ExpectedLoss(loss, loss_kind, norm_scores=True, norm_scores_stop_gradient=True, divide_beam_size=True, subtract_average_loss=True, loss_correction_grad_only=False, **kwargs)[source]

This loss takes the error or value of another loss and, given the search beam scores, calculates the expected loss. This is sometimes also called minimum Bayes risk.

Parameters:
  • loss (Loss) –
  • loss_kind (str) – “error” or “value”. whether to use loss.get_error() or loss.get_value()
  • norm_scores (bool) –
  • norm_scores_stop_gradient (bool) –
  • divide_beam_size (bool) –
  • subtract_average_loss (bool) –
  • loss_correction_grad_only (bool) –
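Below is a generic sketch of the expected loss over one beam. It is not RETURNN's exact computation. The beam scores are assumed to be log-scores normalized by a softmax over the beam (our reading of norm_scores). subtract_average_loss is interpreted as subtracting the mean loss of the beam. divide_beam_size and the gradient options are not modeled. `expected_loss` is a hypothetical helper:

```python
import math
from typing import Sequence


def expected_loss(beam_scores: Sequence[float], losses: Sequence[float],
                  subtract_average_loss: bool = True) -> float:
    """Expected loss over one beam: sum_i p_i * loss_i, with p = softmax(beam_scores)."""
    m = max(beam_scores)
    weights = [math.exp(s - m) for s in beam_scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    if subtract_average_loss:
        avg = sum(losses) / len(losses)
        losses = [loss - avg for loss in losses]
    return sum(p * loss for p, loss in zip(probs, losses))
```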
class_name = 'expected_loss'[source]
recurrent = True[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
init(**kwargs)[source]

Overrides super. Gets the search choices.

get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:None
class returnn.tf.layers.basic.DeepClusteringLoss(embedding_dimension, nr_of_sources, **kwargs)[source]

Cost function used for deep clustering as described in [Hershey & Chen+, 2016]: “Deep clustering: Discriminative embeddings for segmentation and separation”

Parameters:
  • embedding_dimension (int) –
  • nr_of_sources (int) –
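The deep clustering objective from the cited paper is ||V V^T - Y Y^T||_F^2. Here V holds one embedding per time-frequency bin (presumably of size embedding_dimension), and Y is the one-hot source assignment (presumably of size nr_of_sources). It can be evaluated without forming the large bin-by-bin affinity matrices. The pure-Python sketch below checks this identity. It does not claim that RETURNN computes the loss this way:

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frob_sq(a: Matrix) -> float:
    return sum(x * x for row in a for x in row)


def dc_loss_naive(v: Matrix, y: Matrix) -> float:
    """||V V^T - Y Y^T||_F^2 with V: (N, D) embeddings, Y: (N, C) one-hot assignments."""
    a, b = matmul(v, transpose(v)), matmul(y, transpose(y))
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def dc_loss_efficient(v: Matrix, y: Matrix) -> float:
    """Same value via ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2 (no N x N matrix)."""
    vt, yt = transpose(v), transpose(y)
    return frob_sq(matmul(vt, v)) - 2.0 * frob_sq(matmul(vt, y)) + frob_sq(matmul(yt, y))
```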
class_name = 'deep_clustering'[source]
get_error()[source]
Returns:frame error rate as a scalar value
Return type:tf.Tensor | None
get_value()[source]
Return type:tf.Tensor
class returnn.tf.layers.basic.L1Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, scale=1.0)[source]

L1-distance loss. sum(|target - output|).

Parameters:
  • base_network (returnn.tf.network.TFNetwork) –
  • use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask()
  • use_normalized_loss (bool) – the loss used in optimization will be normalized
  • custom_norm_factor (float|function|None) –
  • scale (float) – additional scale factor for the loss
class_name = 'l1'[source]
get_value()[source]
Return type:tf.Tensor
class returnn.tf.layers.basic.MeanSquaredError(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, scale=1.0)[source]

The generic mean squared error loss function.

Parameters:
  • base_network (returnn.tf.network.TFNetwork) –
  • use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask()
  • use_normalized_loss (bool) – the loss used in optimization will be normalized
  • custom_norm_factor (float|function|None) –
  • scale (float) – additional scale factor for the loss
class_name = 'mse'[source]
get_value()[source]
Return type:tf.Tensor
class returnn.tf.layers.basic.MeanL1Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, scale=1.0)[source]

Like the MSE loss, but with the absolute difference.

Parameters:
  • base_network (returnn.tf.network.TFNetwork) –
  • use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask()
  • use_normalized_loss (bool) – the loss used in optimization will be normalized
  • custom_norm_factor (float|function|None) –
  • scale (float) – additional scale factor for the loss
class_name = 'mean_l1'[source]
get_value()[source]
Return type:tf.Tensor
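Below is a pure-Python sketch contrasting the three distance losses above on frames of feature vectors. It assumes that l1 sums |target - output| over all features. It also assumes that mse and mean_l1 average over the feature axis per frame. All three are then summed over frames, before any normalization controlled by use_normalized_loss. These reduction choices are assumptions:

```python
from typing import Sequence

Frames = Sequence[Sequence[float]]


def l1(output: Frames, target: Frames) -> float:
    # sum over frames and features of |target - output|
    return sum(abs(t - o) for fo, ft in zip(output, target) for o, t in zip(fo, ft))


def mse(output: Frames, target: Frames) -> float:
    # per frame: mean over features of (target - output)^2; then summed over frames
    return sum(sum((t - o) ** 2 for o, t in zip(fo, ft)) / len(fo) for fo, ft in zip(output, target))


def mean_l1(output: Frames, target: Frames) -> float:
    # like mse, but with the absolute difference
    return sum(sum(abs(t - o) for o, t in zip(fo, ft)) / len(fo) for fo, ft in zip(output, target))
```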
class returnn.tf.layers.basic.ExternSprintLoss(sprint_opts, **kwargs)[source]

The loss is calculated by an external Sprint instance.

Parameters:sprint_opts (dict[str]) –
class_name = 'sprint'[source]
recurrent = True[source]
need_target = False[source]
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor|None
class returnn.tf.layers.basic.FastBaumWelchLoss(sprint_opts, **kwargs)[source]

The loss is calculated via fast_baum_welch(). The automata are created by an external Sprint instance.

Parameters:sprint_opts (dict[str]) –
class_name = 'fast_bw'[source]
recurrent = True[source]
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor|None
class returnn.tf.layers.basic.ViaLayerLoss(error_signal_layer=None, align_layer=None, loss_wrt_to_act_in=False, **kwargs)[source]

The loss error signal and loss value are defined as the output of another layer. That way, you can define any custom loss. This could e.g. be used together with the fast_bw layer.

Parameters:
  • error_signal_layer (LayerBase) –
  • align_layer (LayerBase) –
  • loss_wrt_to_act_in (bool|str) – if True, we expect that the given output_with_activation is set, and the given error signal is w.r.t. the input of the specific activation function. A common example is the input to the softmax function, where the gradient is much more stable to define, e.g. y - z instead of y/z for cross entropy. If you specify a str, e.g. “softmax” or “log_softmax”, there is an additional check that the used activation function is really that one.
class_name = 'via_layer'[source]
recurrent = True[source]
need_target = False[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace, the loss_opts
  • network (returnn.tf.network.TFNetwork) –
  • get_layer (((str) -> LayerBase)) – function to get or construct another layer
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor|None
class returnn.tf.layers.basic.AsIsLoss(**kwargs)[source]

Use the output as-is as the loss.

class_name = 'as_is'[source]
need_target = False[source]
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:None
class returnn.tf.layers.basic.SearchScoreLoss(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, scale=1.0)[source]

Use the scores from SearchChoices.

Parameters:
  • base_network (returnn.tf.network.TFNetwork) –
  • use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask()
  • use_normalized_loss (bool) – the loss used in optimization will be normalized
  • custom_norm_factor (float|function|None) –
  • scale (float) – additional scale factor for the loss
class_name = 'search_score'[source]
need_target = False[source]
reduce_to_batch(loss, normalize)[source]
Parameters:
  • loss (tf.Tensor) – (batch,)
  • normalize (bool) – reduce mean instead of reduce sum
Returns:(batch,)
Return type:tf.Tensor
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:None
class returnn.tf.layers.basic.SamplingBasedLoss(num_sampled=128, num_splits=1, sampler='log_uniform', nce_loss=False, use_full_softmax=False, remove_accidental_hits=None, sampler_args=None, nce_log_norm_term=0.0, **kwargs)[source]

Implements two sampling-based losses: sampled softmax (default) and noise contrastive estimation (NCE). See https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss and https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss.

Must be used in an output linear layer with a weight matrix of shape (num_classes, dim). When using the ‘log_uniform’ sampler (default), optimal performance is typically achieved with the vocabulary sorted in decreasing order of frequency (https://www.tensorflow.org/api_docs/python/tf/random/log_uniform_candidate_sampler).
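The reason for sorting by frequency is the shape of the log-uniform (Zipfian) distribution. The TensorFlow documentation linked above gives it as P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1). The small NumPy sketch below (the function name log_uniform_probs is hypothetical) evaluates that formula and shows that low class IDs are sampled much more often than high ones.

```python
import numpy as np


def log_uniform_probs(range_max):
    """P(k) = (log(k + 2) - log(k + 1)) / log(range_max + 1), k = 0 .. range_max - 1."""
    k = np.arange(range_max)
    return (np.log(k + 2) - np.log(k + 1)) / np.log(range_max + 1)


probs = log_uniform_probs(10000)
# Low IDs are sampled far more often than high IDs, so frequent words
# should get low IDs to match this sampling distribution.
```

The sum telescopes to log(range_max + 1) / log(range_max + 1) = 1, so the formula is a proper distribution over the range_max classes.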

Parameters:
  • num_sampled (int) – Number of classes to be sampled. For sampled softmax, this is the number of classes to be used to estimate the sampled softmax. For noise contrastive estimation, this is the number of noise samples.
  • num_splits (int) – Number of different samples (each with ‘num_sampled’ classes) to be used per batch.
  • sampler (str) – Specify sampling distribution (“uniform”, “log_uniform”, “learned_unigram” or “fixed_unigram”).
  • nce_loss (bool) – If True, use noise contrastive estimation loss. Else (default), use the sampled softmax.
  • use_full_softmax (bool) – If True, compute the full softmax instead of sampling (can be used for evaluation).
  • remove_accidental_hits (bool|None) – If True, remove sampled classes that equal one of the target classes. If not specified (None), the value is determined by the chosen objective: for sampled softmax it should be True; for NCE the default is False. In case of NCE training, set this to True if the objective should equal the sampled logistic loss.
  • sampler_args (dict[str]) – additional arguments for the candidate sampler. This is most relevant to the fixed_unigram sampler. See https://www.tensorflow.org/api_docs/python/tf/random/fixed_unigram_candidate_sampler for details.
  • nce_log_norm_term (float) – The logarithm of the constant normalization term for NCE.
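The core of sampled softmax, as described in the TensorFlow tf.nn.sampled_softmax_loss documentation, is a softmax over the true class plus the sampled classes, with each logit corrected by the log of its expected sampling count, and optional removal of accidental hits. The single-example NumPy sketch below illustrates this under simplifying assumptions. It is not RETURNN's or TensorFlow's implementation (those work on batches and compute logits from the weight matrix), and the names sampled_softmax_loss and log_q are hypothetical.

```python
import numpy as np


def logsumexp(a):
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))


def sampled_softmax_loss(logits, target, sampled, log_q, remove_accidental_hits=True):
    """Simplified sampled softmax for one example.

    logits: (num_classes,) logits; only the target and sampled entries are used.
    target: true class ID.
    sampled: (num_sampled,) sampled class IDs.
    log_q: (num_classes,) log expected sampling count of each class.
    """
    # Correct each logit by the log sampling frequency of its class.
    true_logit = logits[target] - log_q[target]
    sampled_logits = logits[sampled] - log_q[sampled]
    if remove_accidental_hits:
        # A sampled class equal to the target must not act as a negative.
        sampled_logits = np.where(sampled == target, -np.inf, sampled_logits)
    all_logits = np.concatenate([[true_logit], sampled_logits])
    # Softmax cross entropy with the true class at position 0.
    return logsumexp(all_logits) - true_logit
```

As a sanity check of the design: with a uniform sampler the correction is the same constant for every class and cancels in the softmax, so sampling all non-target classes recovers the full softmax cross entropy exactly.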
class_name = 'sampling_loss'[source]
get_value()[source]
Return type:tf.Tensor
class returnn.tf.layers.basic.TripletLoss(margin, multi_view_training=False, **kwargs)[source]

Triplet loss: loss = max(margin + d(x_a, x_s) - d(x_a, x_d), 0.0)

Triplet loss is used for metric learning in a siamese/triplet network. It should be used as part of a CopyLayer with 3 inputs, corresponding to x_a, x_s and x_d in the loss. Here we assume that x_a are anchor samples and x_s are samples such that, at each position i in a minibatch, x_ai and x_si belong to the same class, while the pairs x_ai and x_di belong to different classes.

In this implementation, the number of training examples is increased by extracting all possible same/different pairs within a minibatch.
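The basic per-position formula can be sketched in NumPy as follows. This is a minimal sketch, not RETURNN's implementation: it assumes d is the squared Euclidean distance (the docstring does not specify d), the function name triplet_loss is hypothetical, and it does not extract the additional same/different pairs within the minibatch.

```python
import numpy as np


def triplet_loss(x_a, x_s, x_d, margin):
    """Per-position triplet loss max(margin + d(a, s) - d(a, d), 0).

    Assumes d is the squared Euclidean distance; x_a, x_s, x_d: (batch, dim).
    No extra same/different pairs are extracted within the minibatch.
    """
    d_same = np.sum((x_a - x_s) ** 2, axis=-1)
    d_diff = np.sum((x_a - x_d) ** 2, axis=-1)
    return np.maximum(margin + d_same - d_diff, 0.0)


x_a = np.array([[0.0, 0.0], [0.0, 0.0]])
x_s = np.array([[1.0, 0.0], [1.0, 0.0]])
x_d = np.array([[3.0, 0.0], [0.0, 1.0]])
losses = triplet_loss(x_a, x_s, x_d, margin=1.0)  # -> array([0., 1.])
```

In the first row the different-class sample is already farther away than the same-class sample plus the margin, so it contributes no loss; in the second row both are at the same distance, so the full margin is paid.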

class_name = 'triplet_loss'[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters:
  • output (Data) – generated output
  • output_with_activation (OutputWithActivation|None) –
  • target (Data) – reference target from dataset
get_value()[source]
Return type:tf.Tensor
get_error()[source]

Error is not defined for triplet_loss.
Returns:None

returnn.tf.layers.basic.get_loss_class(loss)[source]
Parameters:loss (str) – loss type such as “ce”
Return type:(() -> Loss) | type[Loss] | Loss
returnn.tf.layers.basic.auto_register_layer_classes(vars_values)[source]

Example usage:

from returnn.tf.layers.basic import auto_register_layer_classes
# Register the layer classes defined in the given module file.
auto_register_layer_classes('extern_private/your_stuff/CoolThingy.py')
Parameters:vars_values (list|types.ModuleType|str) – e.g. use list(globals().values()). A str is interpreted as a module filename.
Returns:nothing
returnn.tf.layers.basic.register_layer_class(layer_class)[source]

Registers a layer class such that it can be used in network construction.

Parameters:layer_class (type[LayerBase]) –
Returns:nothing
returnn.tf.layers.basic.get_layer_class(name)[source]
Parameters:name (str) – matches layer_class
Return type:(() -> LayerBase) | type[LayerBase] | LayerBase
returnn.tf.layers.basic.get_layer_class_name_list()[source]
Return type:list[str]
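get_layer_class(name) looks a class up by its layer_class name, and auto_register_layer_classes() accepts, e.g., list(globals().values()). The toy sketch below illustrates such a name-based registry. It is not RETURNN's implementation: the names register, auto_register, lookup and class_name_list are hypothetical, LayerBase is a stand-in class, and the real functions also accept modules and filenames and may handle duplicate or unknown names differently.

```python
class LayerBase:
    """Stand-in for RETURNN's LayerBase; only the layer_class attribute matters here."""
    layer_class = None


_registry = {}  # layer_class name -> class


def register(cls):
    """Register one layer class under its layer_class name."""
    assert isinstance(cls, type) and issubclass(cls, LayerBase) and cls.layer_class
    _registry[cls.layer_class] = cls


def auto_register(values):
    """Register every LayerBase subclass with a layer_class among the given values."""
    for v in values:
        if isinstance(v, type) and issubclass(v, LayerBase) and v.layer_class:
            register(v)


def lookup(name):
    """Return the class registered under the given layer_class name."""
    return _registry[name]


def class_name_list():
    """Return all registered layer_class names, sorted."""
    return sorted(_registry)


class MyCopyLayer(LayerBase):  # hypothetical example layer
    layer_class = "my_copy"


# Scan all module-level values, similar to passing list(globals().values()).
auto_register(list(globals().values()))
```

The base class itself is skipped because its layer_class is None, so only concrete layers with a name end up in the registry.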