Shape and Type Modification

Cast Layer

class returnn.tf.layers.basic.CastLayer(dtype, output, **kwargs)[source]

Cast to some other dtype.

Parameters:
  • dtype (str)

  • output (Data)
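
A minimal config sketch (the source layer name “encoder” is an assumption for illustration):

# Cast the output of an assumed "encoder" layer to float16.
network = {
    "encoder_f16": {"class": "cast", "from": "encoder", "dtype": "float16"},
}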

layer_class: Optional[str] = 'cast'[source]
classmethod get_out_data_from_opts(dtype, **kwargs)[source]
Parameters:

dtype (str)

Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Expand Dimensions Layer

class returnn.tf.layers.basic.ExpandDimsLayer(axis, dim=1, **kwargs)[source]

Adds some axis.

Parameters:
  • axis (str|int) – axis to add, e.g. “F”|”feature” or “spatial”|”time”|”T”. If this is an integer, the input data is first converted into batch-major mode, and then this integer is counted including the batch dim.

  • dim (int|Dim) – dimension of new axis (1 by default)
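
A minimal config sketch, using RETURNN’s default network input “data”:

# Add a new axis of size 1 as the feature axis.
network = {
    "expanded": {"class": "expand_dims", "from": "data", "axis": "feature", "dim": 1},
}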

layer_class: Optional[str] = 'expand_dims'[source]
classmethod get_out_data_from_opts(name, axis, dim=1, sources=(), **kwargs)[source]
Parameters:
  • name (str)

  • axis (str|int)

  • dim (int|Dim)

  • sources (list[LayerBase])

Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Gather Layer

class returnn.tf.layers.basic.GatherLayer(position: LayerBase | int, axis: Dim | str, clip_to_valid: bool = False, **kwargs)[source]

Gathers slices on a specified axis from the input layer using indices from a position layer. If the input is of shape [B,D,F1] and position is of shape [B,F2], this yields output of shape [B,F2,F1], where

output[b,f2,f1] = input[b,position[b,f2],f1]

(if D is the axis to gather from). In general, all shared axes of the input and the positions will be considered as batch-axes.

The position argument can also be an int. In this case, this simply gives input[position] on the specified axis.

It’s basically a wrapper around tf.gather. It provides the same functionality as the deprecated GatherNdLayer, but is more generic.

Parameters:
  • position – indices used to select slices of the input. If given as another layer, it must be of type int32 or int64. Can also be a constant int.

  • axis – the axis of the input to gather from

  • clip_to_valid – if True, the indices will be clipped to the valid range of the input, also taking seq lengths into account.
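
A config sketch for the shape example above; the layer names “values” and “positions” are assumptions:

# output[b, t2, f] = values[b, positions[b, t2], f], gathering in the time axis.
network = {
    "gathered": {"class": "gather", "from": "values", "position": "positions", "axis": "T", "clip_to_valid": True},
}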

layer_class: Optional[str] = 'gather'[source]
get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod get_out_data_from_opts(name, sources, position, axis, **kwargs)[source]
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Merge Dimensions Layer

class returnn.tf.layers.basic.MergeDimsLayer(axes, keep_order=<class 'returnn.util.basic.NotSpecified'>, n_out=None, out_dim=None, **kwargs)[source]

Merges a list of axes into a single one. (Flatten the dims.) E.g. input is (batch, width, height, dim) and axes=(1,2), then we get (batch, width*height, dim). Or input is (batch, time, height, dim) and axes=”except_time”, then we get (batch, time, height*dim). See also CombineDimsLayer. When batch and time got merged, SplitBatchTimeLayer can undo this. When you want to merge batch and time, but remove the padding efficiently, i.e. flatten it, see FlattenBatchLayer.

Parameters:
  • axes (Sequence[Dim|str]) – see Data.get_axis_from_description()

  • keep_order (bool|NotSpecified) – The old default was: the axes are sorted, and then merged. Thus, the order of incoming axes will influence the result. E.g. inputs [B,S,F] and [B,F,S], with axes=["S","F"], will get different results, although the output shape is [B,S*F] in both cases. This is bad: In general, other layers in RETURNN might reorder the axes for various reasons, and all layers should behave in the same way, no matter the order. It is recommended to set keep_order=True, such that the order defined in axes defines the behavior, and not the incoming axis order. Since behavior version 6, this is already the case.

  • n_out (int|None)

  • out_dim (Dim|None)
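
A minimal config sketch merging time and feature:

# (B,T,D) -> (B,T*D); with keep_order=True, the order given in "axes" defines the result.
network = {
    "merged": {"class": "merge_dims", "from": "data", "axes": ["T", "F"], "keep_order": True},
}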

layer_class: Optional[str] = 'merge_dims'[source]
classmethod get_out_data_from_opts(name, axes, keep_order=<class 'returnn.util.basic.NotSpecified'>, sources=(), n_out=<class 'returnn.util.basic.NotSpecified'>, out_type=None, out_dim=None, **kwargs)[source]
Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Length Layer

class returnn.tf.layers.basic.LengthLayer(axis='T', add_time_axis=False, dtype='int32', sparse=False, **kwargs)[source]

Returns the length of sources as (B,), via input size_placeholder.

Parameters:
  • axis (str|Dim)

  • add_time_axis (bool) – should not be used

  • dtype (str)

  • sparse (bool)
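
A minimal config sketch:

# Sequence lengths of the network input, shape (B,), dtype int32.
network = {
    "seq_len": {"class": "length", "from": "data", "axis": "T"},
}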

layer_class: Optional[str] = 'length'[source]
classmethod fixup_dim(dim, sources)[source]
Return type:

Dim

classmethod get_out_data_from_opts(name, sources, axis='T', add_time_axis=False, dtype='int32', sparse=False, **kwargs)[source]
Parameters:
  • name (str)

  • sources (list[LayerBase])

  • axis (str|Dim)

  • add_time_axis (bool)

  • dtype (str)

  • sparse (bool)

Return type:

Data

kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Pad Layer

class returnn.tf.layers.basic.PadLayer(*, axes: Dim | str | Sequence[Dim | str], padding: int | Tuple[int, int] | Sequence[Tuple[int, int]], out_dims: Dim | Sequence[Dim] | None = None, handle_dynamic_dims: bool | None = None, value: int | float = 0, mode: str = 'constant', **kwargs)[source]

Adds (e.g. zero) padding in some axis or axes. Also see PrefixInTimeLayer for dynamic dims.

Parameters:
  • axes – e.g. “F” etc. see Data.get_axes_from_description().

  • padding – how much to pad left/right in each axis

  • out_dims

  • handle_dynamic_dims – True: when doing right padding on a dynamic dim, value will be added after the seq end, not at the end of the dimension. False: value will be added at the end of the dimension. By default, in behavior version >=21, this is True, in older versions, this is False.

  • value – what constant value to pad, with mode==”constant”

  • mode – “constant”, “reflect”, “symmetric” and “replication”
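
A minimal config sketch, zero-padding the time axis:

# Pad 2 frames left and 1 frame right in the time axis with zeros.
network = {
    "padded": {"class": "pad", "from": "data", "axes": "T", "padding": (2, 1), "value": 0, "mode": "constant"},
}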

layer_class: Optional[str] = 'pad'[source]
classmethod get_out_data_from_opts(name, sources, axes, padding, out_dims=None, **kwargs)[source]
Parameters:
  • name (str)

  • sources (list[LayerBase])

  • axes (Dim|str|list[Dim|str])

  • padding (list[(int,int)]|(int,int)|int)

  • out_dims (Dim|list[Dim]|None)

Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Postfix (in Time) Layer

class returnn.tf.layers.basic.PostfixInTimeLayer(axis='T', out_dim=None, postfix=0.0, repeat=1, **kwargs)[source]

Adds some postfix in time dimension. Also see PrefixInTimeLayer.

Parameters:
  • axis (Dim|str)

  • out_dim (Dim|None)

  • postfix (float|int|LayerBase) – constant or other layer without time axis to use as postfix

  • repeat (int) – how often to repeat the postfix
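
A minimal config sketch; the constant postfix value here is an arbitrary choice:

# Append two frames with constant value 0.0 at the end of each sequence.
network = {
    "with_postfix": {"class": "postfix_in_time", "from": "data", "postfix": 0.0, "repeat": 2},
}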

layer_class: Optional[str] = 'postfix_in_time'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, sources, axis='T', out_dim=None, postfix=0.0, repeat=1, **kwargs)[source]
Parameters:
  • axis (Dim|str)

  • out_dim (Dim|None)

  • name (str)

  • sources (list[LayerBase])

  • postfix (float|int|LayerBase) – constant or other layer without time axis to use as postfix

  • repeat (int)

Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
get_dep_layers()[source]
Return type:

list[LayerBase]

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Prefix (in Time) Layer

class returnn.tf.layers.basic.PrefixInTimeLayer(axis='T', out_dim=None, prefix=0.0, repeat=1, size_base=None, **kwargs)[source]

Adds some prefix in time dimension. This is kind of the reverse of what SliceNdLayer does. Also see PadLayer for static dimensions. Also see PostfixInTimeLayer.

Parameters:
  • axis (Dim|str)

  • out_dim (Dim|None)

  • prefix (float|str) – either some constant or another layer

  • repeat (int|LayerBase) – how often to repeat the prefix

  • size_base (LayerBase|None) – copy seq-lens from here
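
A minimal config sketch; using the prefix e.g. as a BOS-like frame is an assumed use case:

# Prepend one frame with constant value 0.0 before each sequence.
network = {
    "with_prefix": {"class": "prefix_in_time", "from": "data", "prefix": 0.0, "repeat": 1},
}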

layer_class: Optional[str] = 'prefix_in_time'[source]
recurrent = True[source]
get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace

  • network (returnn.tf.network.TFNetwork)

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

classmethod get_out_data_from_opts(name, sources, axis='T', out_dim=None, size_base=None, repeat=1, **kwargs)[source]
Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Reinterpret Data Layer

class returnn.tf.layers.basic.ReinterpretDataLayer(switch_axes=None, size_base=None, batch_dim_base=None, set_axes=None, set_dim_tags=None, enforce_batch_major=False, enforce_time_major=False, set_sparse=None, set_sparse_dim=<class 'returnn.util.basic.NotSpecified'>, increase_sparse_dim=None, **kwargs)[source]

Acts like the CopyLayer but reinterprets the role of some axes or data.

Parameters:
  • switch_axes (str|list[str]) – e.g. “bt” to switch batch and time axes

  • size_base (LayerBase|None) – copy the size_placeholder from the given layer

  • batch_dim_base (LayerBase|None) – copy the batch dim from this layer

  • set_axes (dict[str,Dim|str|None]) – This can be used to overwrite the special axes like time_dim_axis or feature_dim_axis. For that, use keys “B”,”T” or “F”, and a value via Data.get_axis_from_description().

  • set_dim_tags (dict[str|Dim,Dim]|Sequence[Tuple[Dim,Dim]]|None) – axis -> new dim tag. assigns new dim tags. If the passed dim tag is yet undefined, this will not use same_dim_tags_as (declare_same_as) but create a new dim tag. This option is useful for generalized self attention (https://github.com/rwth-i6/returnn/issues/391).

  • enforce_batch_major (bool)

  • enforce_time_major (bool)

  • set_sparse (bool|None) – if bool, set sparse value to this

  • set_sparse_dim (Dim|int|None|NotSpecified) – set sparse dim to this. assumes that it is sparse

  • increase_sparse_dim (int|None) – add this to the dim. assumes that it is sparse
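
A minimal config sketch; the sparse layer name “targets” and the new dim value are assumptions:

# Reinterpret sparse targets with a larger sparse dim, e.g. after adding an extra class.
network = {
    "targets_ext": {"class": "reinterpret_data", "from": "targets", "set_sparse_dim": 1001},
}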

layer_class: Optional[str] = 'reinterpret_data'[source]
output_loss: Optional[tf.Tensor][source]
output_before_activation: Optional[OutputWithActivation][source]
get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod transform_config_dict(d, network, get_layer)[source]
classmethod get_out_data_from_opts(name, sources, switch_axes=None, size_base=None, batch_dim_base=None, set_axes=None, set_dim_tags=None, enforce_batch_major=False, enforce_time_major=False, set_sparse=None, set_sparse_dim=<class 'returnn.util.basic.NotSpecified'>, increase_sparse_dim=None, **kwargs)[source]
Parameters:
  • name (str)

  • sources (list[LayerBase])

  • switch_axes (str|list[str]) – e.g. “bt” to switch batch and time axes

  • size_base (LayerBase|None) – similar as size_target

  • batch_dim_base (LayerBase|None)

  • set_axes (dict[str,Dim|str|None])

  • set_dim_tags (dict[str|Dim,Dim]|Sequence[Tuple[Dim,Dim]]|None)

  • enforce_batch_major (bool)

  • enforce_time_major (bool)

  • set_sparse (bool|None) – if bool, set sparse value to this

  • set_sparse_dim (Dim|int|None|NotSpecified) – set sparse dim to this. assumes that it is sparse

  • increase_sparse_dim (int|None) – add this to the dim. assumes that it is sparse

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Repeat Layer

class returnn.tf.layers.basic.RepeatLayer(repetitions, axis='T', out_dim=None, **kwargs)[source]

A wrapper around tf.repeat, but supports an additional batch axis for the durations. The sum of the repetitions has to be non-zero for each sequence in the batch.

This layer can only be used with TensorFlow 1.15.0 or newer.

Parameters:
  • repetitions (LayerBase|int) – number of repetitions for each sequence and position in target axis. Can be [B,T] or [T,B] or some subset of that shape

  • axis (Dim|str) – (dynamic) axis for repetition (currently only time axis is supported)

  • out_dim (Dim|None)
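
A config sketch for duration-based upsampling; “durations” is an assumed int32 layer of shape [B,T]:

# Repeat every frame in the time axis according to its duration.
network = {
    "upsampled": {"class": "repeat", "from": "data", "repetitions": "durations", "axis": "T"},
}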

layer_class: Optional[str] = 'repeat'[source]
get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod transform_config_dict(d, network, get_layer)[source]
classmethod get_out_data_from_opts(name, sources, axis, repetitions, out_dim=None, **kwargs)[source]
Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Resize Layer

class returnn.tf.layers.basic.ResizeLayer(factor, axis, out_dim=None, kind='nn', fill_value=None, fill_dropout=None, **kwargs)[source]

Resizes the input, i.e. upsampling or downsampling. Supports different kinds, such as linear interpolation or nearest-neighbor.

Parameters:
  • factor (int|float|LayerBase) – out_len = in_len * factor

  • axis (Dim|str) – the axis to resize

  • out_dim (Dim|None)

  • kind (str) – “linear”, “nn”/”nearest_neighbor”, “cubic”, “fill”

  • fill_value (None|int|float) – if kind==”fill”

  • fill_dropout (float|None) – if set, will dropout in the same axis
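
A minimal config sketch:

# Upsample the time axis by factor 2 with nearest-neighbor interpolation.
network = {
    "upsampled": {"class": "resize", "from": "data", "factor": 2, "axis": "T", "kind": "nn"},
}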

layer_class: Optional[str] = 'resize'[source]
get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod transform_config_dict(d, network, get_layer)[source]
classmethod get_out_data_from_opts(factor, axis, sources, name, fill_dropout=None, out_dim=None, **kwargs)[source]
Parameters:
  • factor (int|float|LayerBase)

  • axis (Dim|str)

  • sources (list[LayerBase])

  • name (str)

  • fill_dropout (float|None)

  • out_dim (Dim|None)

Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Scatter n-dim Layer

class returnn.tf.layers.basic.ScatterNdLayer(position, position_axis, output_dim_via_time_from=None, out_spatial_dim=None, filter_invalid_indices=False, **kwargs)[source]

The inverse of GatherNdLayer. Mostly a wrapper for tf.scatter_nd.

Note that “nd” is maybe a bit misleading. While we operate on N-D tensors, the indices (position) are into a single new dimension.

The input to the layer are the updates, the indices are via the position argument. The indices are into the newly constructed output dimension. The output shape is constructed via the common shape of the input, the position, and the unique common axis (if not unique, we would need to introduce an option to specify it) is replaced by the given output dimension (currently via output_dim_via_time_from).

Examples:

position (indices): (B,eTs)
input (updates): (eTs,D) or (B,eTs,D) -> expanded to (B,eTs,D)
output shape: (B,eT,D)

position (indices): (B,dT,eTs)
input (updates): (eTs,D) -> expanded to (B,dT,eTs,D)
output shape: (B,dT,eT,D)

position (indices): (dT,eTs)
input (updates): (eTs,D) -> expanded to (dT,eTs,D)
output shape: (dT,eTs,D)

position (indices): (dT,eTs)
input (updates): (B,eTs,D) -> expanded to (dT,eTs,B,D)
output shape: (dT,eT,B,D)

In all these examples, output_dim_via_time_from is (B,eT,F), and eTs gets replaced by eT.

Parameters:
  • position (LayerBase) – indices into first axis (excluding batch) of the output

  • position_axis (Dim|str) – axis in position to replace by the output-dim

  • output_dim_via_time_from (LayerBase|None) – use the time-dim from this layer as the output-dim

  • out_spatial_dim (Dim|None)

  • filter_invalid_indices (bool) – allow for indices <0 or >= output_dim, which will be discarded in the output

layer_class: Optional[str] = 'scatter_nd'[source]
get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod get_out_data_from_opts(name, sources, position, position_axis, output_dim_via_time_from=None, out_spatial_dim=None, **kwargs)[source]
Parameters:
  • name (str)

  • sources (list[LayerBase])

  • position (LayerBase)

  • position_axis (Dim|str) – axis in position to replace by the output-dim

  • output_dim_via_time_from (LayerBase|None) – use the time-dim from this layer as the output-dim

  • out_spatial_dim (Dim|None)

Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

ShiftAxisLayer

class returnn.tf.layers.basic.ShiftAxisLayer(axis, amount, pad=True, pad_value=0, adjust_size_info=True, **kwargs)[source]

Shifts the elements along an axis by slicing, with optional padding to preserve the shape. This layer may change the dimension of the axis (if pad is False).

This name might be confusing. No axis will be shifted here. See SwapAxesLayer for that.

Also see SliceLayer.

Parameters:
  • axis (str|Dim|int) – single axis to shift

  • amount (int) – number of elements to shift (<0 for left-shift, >0 for right-shift)

  • pad (bool) – preserve shape by padding

  • pad_value (int|float|bool) – padding value

  • adjust_size_info (bool) – whether to adjust the size_placeholder
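
A minimal config sketch:

# Shift the time axis one step to the right, zero-padding at the start,
# e.g. to access the previous frame at every position.
network = {
    "shifted": {"class": "shift_axis", "from": "data", "axis": "T", "amount": 1, "pad": True, "pad_value": 0},
}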

layer_class: Optional[str] = 'shift_axis'[source]
classmethod get_out_data_from_opts(name, sources, amount, axis, pad=True, adjust_size_info=True, **kwargs)[source]
Parameters:
  • name (str)

  • sources (list[LayerBase])

  • amount (int)

  • axis (str)

  • pad (bool)

  • adjust_size_info (bool)

Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Slice Layer

class returnn.tf.layers.basic.SliceLayer(axis, slice_start=None, slice_end=None, slice_step=None, out_dim=None, **kwargs)[source]

Slicing on the input, i.e. x[start:end:step] in some axis. See also SliceNdLayer, for variable start. See also GatherLayer, for one single position.

Note that __getitem__ on a TF tensor (or also a NumPy ndarray) is more generic, and supports slices in multiple axes, as well as adding new dimensions, etc. It even supports boolean values, and then applies a boolean mask. See TF _slice_helper (== tf.Tensor.__getitem__) for a generic implementation, which calls tf.strided_slice. If we ever need such more generic support, we might consider adding a new layer, like GenericSliceLayer, which gets a slice_spec, just like _slice_helper (the argument to __getitem__). But any such slice can already be constructed with multiple individual layers, which perform individual slices (per axis).

We just support slicing in a single axis here, with optional striding (slice_step).

Parameters:
  • axis (Dim|str)

  • axis_kind (str|None) – “T” for time, “B” for batch, “F” for feature

  • slice_start (int|None)

  • slice_end (int|None)

  • slice_step (int|None)

  • out_dim (Dim|None)
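
A minimal config sketch; the number of channels is an assumption:

# Keep only the first 40 feature channels: x[..., 0:40].
network = {
    "first_feats": {"class": "slice", "from": "data", "axis": "F", "slice_start": 0, "slice_end": 40},
}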

layer_class: Optional[str] = 'slice'[source]
classmethod get_out_data_from_opts(name, axis, sources=(), slice_start=None, slice_end=None, slice_step=None, out_dim=None, **kwargs)[source]
Parameters:
  • name (str)

  • axis (Dim|str)

  • sources (list[LayerBase])

  • slice_start (int|None)

  • slice_end (int|None)

  • slice_step (int|None)

  • out_dim (Dim|None)

Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Slice n-dim Layer

class returnn.tf.layers.basic.SliceNdLayer(size, start=None, min_size=None, axis='T', out_spatial_dim=None, **kwargs)[source]

This takes out a slice-range from the time axis, e.g. x[start:start + size]. If the input is of shape (B,T,F) and start is of shape (B,), then the output will be of shape (B,size,F). If the input is of shape (B,T,F) and start is of shape (B,T), then the output will be of shape (B,T,size,F). This layer allows a different start slice point for each batch, in contrast to SliceLayer, and the start is variable. See also GatherNdLayer. PrefixInTimeLayer can recover the original shape (by zero-padding).

Parameters:
  • start (int|LayerBase|None) – (B,…)

  • size (int|LayerBase|Dim|None) – We assume that this is >=0. If this might not be the case, use min_size=0. If None, it uses the max possible size, and it becomes a dynamic axis.

  • min_size (int|None) – if size is None, but we want to have a min-size

  • axis (Dim|str)

  • out_spatial_dim (Dim|None)
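
A config sketch for the (B,T,F) / (B,) case above; “start_pos” is an assumed int32 layer:

# Take a fixed window of 10 frames per sequence, starting at a per-batch position.
network = {
    "window10": {"class": "slice_nd", "from": "data", "start": "start_pos", "size": 10},
}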

layer_class: Optional[str] = 'slice_nd'[source]
recurrent = True[source]
get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod get_out_data_from_opts(name, sources=(), start=None, size=None, axis='T', out_spatial_dim=None, **kwargs)[source]
Return type:

Data

classmethod transform_config_dict(d, network, get_layer)[source]
input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Split Batch Time Layer

class returnn.tf.layers.basic.SplitBatchTimeLayer(base, **kwargs)[source]

A very specific layer which expects to get input of shape (batch * time, …) and converts it into (batch, time, …), where it recovers the seq-lens from some other layer. See SplitDimsLayer for a more generic layer.

Parameters:

base (LayerBase) – used to recover the seq-lens
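
A minimal config sketch; “merged_bt” is an assumed layer of shape (batch * time, …):

# Recover (batch, time, ...) from (batch * time, ...); seq lens come from "data".
network = {
    "unmerged": {"class": "split_batch_time", "from": "merged_bt", "base": "data"},
}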

layer_class: Optional[str] = 'split_batch_time'[source]
get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod transform_config_dict(d, network, get_layer)[source]
classmethod get_out_data_from_opts(name, base, sources=(), **kwargs)[source]
Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Split Dimensions Layer

class returnn.tf.layers.basic.SplitDimsLayer(axis, dims, pad_to_multiples=None, pad_value=0, **kwargs)[source]

Splits one axis into multiple axes. E.g. if you know that your feature-dim is composed by a window, i.e. the input is (batch, time, window * feature), you can set axis=”F”, dims=(window, -1), and you will get the output (batch, time, window, feature).

If the split axis has a dynamic length, exactly one of the axes that we split into need to also have a dynamic length. You can e.g. use this to split the input dimension into smaller “chunks” of a fixed window size. E.g. you could have input (batch, time, feature) and set axis=”T”, dims=(-1, window), to get output (batch, split_time, window, feature). In this case, the exact sequence lengths are lost and everything is padded to multiples of the window size using the given padding value. Use ReinterpretDataLayer to receive back the original sequence lengths after merging.

Also see SplitBatchTimeLayer. Also see MergeDimsLayer which can undo this operation.

Parameters:
  • axis (Dim|str) – e.g. “F”

  • dims (tuple[Dim|int]|list[Dim|int]) – what the axis should be split into. e.g. (window, -1)

  • pad_to_multiples (bool|None) – If true, input will be padded to the next multiple of the product of the static dims, such that splitting is actually possible. By default this is done iff the axis has a dynamic size

  • pad_value (int|float) – What pad value to use for pad_to_multiples
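
A config sketch for the window example above, with an assumed window size of 3:

# (batch, time, 3 * feature) -> (batch, time, 3, feature).
network = {
    "split": {"class": "split_dims", "from": "data", "axis": "F", "dims": (3, -1)},
}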

layer_class: Optional[str] = 'split_dims'[source]
classmethod get_out_data_from_opts(name, axis, dims, pad_to_multiples=None, sources=(), **kwargs)[source]
Parameters:
  • name (str)

  • axis (Dim|str)

  • dims (list[Dim|int]|tuple[Dim|int])

  • pad_to_multiples (bool|None)

  • sources (list[LayerBase])

Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Squeeze Layer

class returnn.tf.layers.basic.SqueezeLayer(axis, enforce_batch_dim_axis=None, allow_no_op=False, **kwargs)[source]

Removes an axis with dimension 1. This is basically a wrapper around tf.squeeze.

Parameters:
  • axis (Dim|int|list[int]|str) – one axis or multiple axes to squeeze. This is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). It also accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”.

  • enforce_batch_dim_axis (int|None)

  • allow_no_op (bool)
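
A minimal config sketch, assuming the feature axis of the source has size 1:

# Remove the feature axis of size 1.
network = {
    "squeezed": {"class": "squeeze", "from": "data", "axis": "F"},
}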

layer_class: Optional[str] = 'squeeze'[source]
classmethod get_out_data_from_opts(axis, enforce_batch_dim_axis=None, allow_no_op=False, sources=(), **kwargs)[source]
Parameters:
  • axis (Dim|int|list[int]|str)

  • enforce_batch_dim_axis (int|None)

  • allow_no_op (bool)

  • sources (list[LayerBase])

Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Stack Layer

class returnn.tf.layers.basic.StackLayer(axis=None, out_spatial_dim=None, **kwargs)[source]

Stacks multiple inputs together using tf.stack(). This creates a new dimension for the stack.

For concatenation (in feature dimension), see CopyLayer.

Parameters:
  • axis (int|None) – new axis. If not given, will use Data.get_default_new_axis_for_dim_tag(<spatial>), i.e. some reasonable default for a new spatial axis.

  • out_spatial_dim (Dim|None)
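
A minimal config sketch; “fwd” and “bwd” are assumed source layers of identical shape:

# Stack two layers along a new spatial axis of size 2.
network = {
    "stacked": {"class": "stack", "from": ["fwd", "bwd"]},
}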

layer_class: Optional[str] = 'stack'[source]
classmethod get_out_data_from_opts(name, sources, axis=None, out_spatial_dim=None, **kwargs)[source]
Parameters:
  • name (str)

  • sources (list[LayerBase])

  • axis (int|None)

  • out_spatial_dim (Dim|None)

Return type:

Data

kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Swap Axes Layer

class returnn.tf.layers.basic.SwapAxesLayer(axis1, axis2, **kwargs)[source]

Swaps two axes. Basically a wrapper around returnn.tf.util.basic.swapaxes(). Note that usually this should not be needed, and its use is discouraged, as it is unnecessarily inefficient. Normally, all RETURNN layers will automatically transpose the input data into whatever format they need.

All axes always have a special meaning (e.g. feature dim or time dim) or dimension tag (e.g. for time axes, including dyn seq lengths). If you need to change the meaning (and not actually transpose / swap axes), you need to use ReinterpretDataLayer.

See also TransposeLayer for a more generic variant.

See also ReinterpretDataLayer, which does not swap/transpose axes, but allows to reinterpret their meaning / dim tags.

Parameters:
  • axis1 (int|str)

  • axis2 (int|str)

layer_class: Optional[str] = 'swap_axes'[source]
classmethod get_out_data_from_opts(name, sources, axis1, axis2, **kwargs)[source]
Parameters:
  • name (str)

  • sources (list[LayerBase])

  • axis1 (int|str)

  • axis2 (int|str)

Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Time Chunking Layer

class returnn.tf.layers.basic.TimeChunkingLayer(chunk_size, chunk_step, axis='T', out_dim=None, **kwargs)[source]

Performs chunking in time. See returnn.tf.native_op.chunk(). See also WindowLayer and TimeUnChunkingLayer. It is very similar to WindowLayer, but this case is more optimized here, and it also modifies the batch dim. The output is of shape (chunk_size, n_batch * n_chunks, …).

Parameters:
  • chunk_size (int) – chunk size or window size

  • chunk_step (int) – chunk step or striding

  • axis (Dim|str)

  • out_dim (Dim|None)
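
A config sketch of a chunked processing pipeline; the chunk sizes and the intermediate “linear” layer are assumptions:

# Cut overlapping chunks of 50 frames with step 25 (folded into the batch dim),
# process them, then restore the original time layout.
network = {
    "chunks": {"class": "time_chunking", "from": "data", "chunk_size": 50, "chunk_step": 25},
    "processed": {"class": "linear", "from": "chunks", "activation": "relu", "n_out": 128},
    "unchunked": {"class": "time_unchunking", "from": "processed", "chunking_layer": "chunks"},
}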

layer_class: Optional[str] = 'time_chunking'[source]
recurrent = True[source]
input_data: Optional[Data][source]
classmethod get_out_data_from_opts(name, sources, axis='T', out_dim=None, **kwargs)[source]
Return type:

Data

kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Time Un-Chunking Layer

class returnn.tf.layers.basic.TimeUnChunkingLayer(chunking_layer, **kwargs)[source]

Undoes the chunking in time performed by TimeChunkingLayer. See returnn.tf.native_op.chunk().

Parameters:

chunking_layer (TimeChunkingLayer)

layer_class: Optional[str] = 'time_unchunking'[source]
recurrent = True[source]
get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod transform_config_dict(d, network, get_layer)[source]
classmethod get_out_data_from_opts(name, sources, chunking_layer, **kwargs)[source]
Return type:

Data

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

Window Layer

class returnn.tf.layers.basic.WindowLayer(window_size=None, window_dim=None, window_left=None, window_right=None, axis='T', out_spatial_dim=None, padding='same', stride=1, _use_opt_dim_order=None, **kwargs)[source]

Adds a window dimension. By default, uses the time axis and goes over it with a sliding window. The new axis for the window is created right after the time axis. In PyTorch, this is called unfold. We sometimes call this “chunking”. There is also the similar TimeChunkingLayer.

E.g. if the input is (batch, time, dim), the output is (batch, time, window_size, dim). If you want to merge the (window_size, dim) together to (window_size * dim,), you can use the MergeDimsLayer, e.g. {“class”: “merge_dims”, “axes”: “except_time”}.

Use stride==window_size and window_right=window_size - 1 in combination with a MergeDimsLayer to achieve feature stacking with right-hand zero padding.

This is not meant to take out a single window from the time dimension; see SliceLayer or SliceNdLayer for that.

The inverse layer is FoldLayer.

Parameters:
  • window_size (int|None)

  • window_dim (Dim|None)

  • window_left (int|None)

  • window_right (int|None)

  • axis (Dim|str) – see Data.get_axis_from_description()

  • out_spatial_dim (Dim|None)

  • padding (str) – “same” or “valid”

  • stride (int) – return only each Nth window

  • _use_opt_dim_order (bool|None)
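
A config sketch combining WindowLayer with the MergeDimsLayer usage mentioned above; the window size is an assumption:

# Sliding window over time: (B,T,D) -> (B,T,5,D), then merge to (B,T,5*D).
network = {
    "windowed": {"class": "window", "from": "data", "window_size": 5},
    "win_feat": {"class": "merge_dims", "from": "windowed", "axes": "except_time"},
}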

layer_class: Optional[str] = 'window'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, network, sources, window_size=None, window_dim=None, axis='T', out_spatial_dim=None, padding='same', stride=1, _use_opt_dim_order=None, **kwargs)[source]
Return type:

Data

classmethod get_rec_initial_extra_outputs(network, batch_dim, rec_layer, window_size=None, window_dim=None, axis='T', sources=(), **kwargs)[source]
Return type:

dict[str,tf.Tensor]

input_data: Optional[Data][source]
kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]