returnn.tf.layers.basic
Many canonical basic layers.
- class returnn.tf.layers.basic.SourceLayer(network, data_key=None, sources=(), **kwargs)[source]#
This gives access to some entry from network.extern_data (ExternData).
- Parameters:
network (returnn.tf.network.TFNetwork) –
data_key (str|None) –
sources (tuple) –
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) – will modify inplace
network (returnn.tf.network.TFNetwork) –
get_layer (((str) -> LayerBase)) – function to get or construct another layer
- classmethod get_out_data_from_opts(network, data_key=None, **kwargs)[source]#
- Parameters:
network (returnn.tf.network.TFNetwork) –
data_key (str|None) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- returnn.tf.layers.basic.concat_sources(src_layers, out_dim=None, allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>)[source]#
- Parameters:
src_layers (list[LayerBase]) –
out_dim (Dim|None) –
allow_broadcast_all_sources (bool|NotSpecified) –
- Returns:
data with placeholders set
- Return type:
Data
- returnn.tf.layers.basic.get_concat_sources_data_template(src_layers, out_dim=None, allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>, name=None)[source]#
This just creates a template Data instance, without creating any real TF tensors. concat_sources() (and related) are the equivalent functions which would create a Data together with the tensor.
- Parameters:
src_layers (Sequence[LayerBase]) –
out_dim (Dim|None) –
allow_broadcast_all_sources (bool|NotSpecified) –
name (str|None) – name of the Data
- Returns:
data with no placeholders set. It is always a copy or a new instance, so it is safe to manipulate
- Return type:
Data
- returnn.tf.layers.basic.concat_sources_with_opt_dropout(src_layers, out_dim=None, dropout=0, dropout_axis=None, dropout_noise_shape=None, dropout_on_forward=False, allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>)[source]#
Concatenates in the feature dim (see concat_sources()), and then optionally applies dropout.
- Parameters:
src_layers (list[LayerBase]) –
out_dim (Dim|None) –
dropout (float) – dropout rate that will be applied if train_flag is set or dropout_on_forward is enabled
dropout_noise_shape (tuple|list|dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – provide 1 for broadcasting or None otherwise for each axis. The default “None” will broadcast across all dynamic axes including the batch axis. Use {“*”: None} to disable broadcasting for all axes.
dropout_on_forward (bool) – apply dropout also during inference
allow_broadcast_all_sources (bool|NotSpecified) –
- Returns:
data with placeholders set
- Return type:
Data
- class returnn.tf.layers.basic.CopyLayer(in_dim=None, out_dim=None, extra_deps=(), **kwargs)[source]#
This layer does nothing, it copies its input. This is not even a tf.identity. It refers to the same TF tensor. If multiple sources are provided, they are concatenated in the feature-dim.
- Parameters:
in_dim (Dim|None) – just for checking. but also, if this is provided, it will set the feature_dim to this.
out_dim (Dim|None) – alternative to in_dim. see in_dim doc.
extra_deps (list[LayerBase]) – Just add as an additional dependency, without really using it. This can have an effect though on the search beam, via SelectSearchSourcesLayer. We only have this here for the CopyLayer because the get_out_data_from_opts() must know about it and define the right beam. Also see the option collocate_with, which is different in that it does not add a dependency. Note that this will not be real TF control dependencies, but it simply sets the dependency on the layer. If you want to have a real TF control dependency, use IdentityLayer.
- output_before_activation: Optional[OutputWithActivation][source]#
- classmethod get_out_data_from_opts(name, sources=(), extra_deps=(), out_type=None, in_dim=None, out_dim=None, n_out=<class 'returnn.util.basic.NotSpecified'>, out_shape=None, **kwargs)[source]#
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) – will modify inplace
network (returnn.tf.network.TFNetwork) –
get_layer (((str) -> LayerBase)) – function to get or construct another layer
- search_choices: Optional[SearchChoices][source]#
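The concatenation in the feature dim that CopyLayer performs for multiple sources can be illustrated without TensorFlow. The following is only a sketch on nested Python lists of shape (B, T, F); the helper name `concat_feature_dim` is hypothetical and not part of RETURNN, and it does not model the fact that a single-source CopyLayer refers to the very same TF tensor.

```python
def concat_feature_dim(sources):
    """Concatenate several (B, T, F_i) nested lists along the last (feature) axis.

    Hypothetical helper for illustration only; all sources must share B and T.
    """
    first = sources[0]
    out = []
    for b in range(len(first)):
        seq = []
        for t in range(len(first[b])):
            frame = []
            for src in sources:
                frame.extend(src[b][t])  # append the features of this source
            seq.append(frame)
        out.append(seq)
    return out


a = [[[1.0, 2.0], [3.0, 4.0]]]  # shape (B=1, T=2, F=2)
b = [[[9.0], [8.0]]]            # shape (B=1, T=2, F=1)
merged = concat_feature_dim([a, b])  # shape (B=1, T=2, F=3)
```

The resulting feature dimension is the sum of the source feature dimensions, here 2 + 1 = 3.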
- class returnn.tf.layers.basic.IdentityLayer(sources: List[LayerBase], control_dependencies: Sequence[LayerBase] | None = None, **kwargs)[source]#
Wraps tf.identity with potential control dependencies.
The difference to CopyLayer is that this creates a new TF op (tf.identity), which allows for potential control dependencies. This is the whole purpose of this layer.
Usually the arguments, when specified in the network dict, are going through transform_config_dict(), before they are passed to here. See TFNetwork.construct_from_dict().
- Parameters:
name (str) –
network (returnn.tf.network.TFNetwork) –
output (Data) – Set a specific output instead of using get_out_data_from_opts()
n_out (NotSpecified|None|int) – output dim
out_dim (returnn.tensor.Dim|None) – output feature dim tag
out_type (dict[str]) – kwargs for Data class. more explicit than n_out.
out_shape (set[returnn.tensor.Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) – verifies the output shape (dim tags). See Data.verify_out_shape().
sources (list[LayerBase]) – via self.transform_config_dict()
in_dim (returnn.tensor.Dim|None) – input feature dim tag
target (str|list[str]|None) – if some loss is set, this is the target data-key, i.e. network.extern_data.get_data(target). alternatively, this also can be a layer name.
_target_layers (dict[str,LayerBase]|None) – if target.startswith(“layer:”), then this is target -> layer
size_target (str|None) – like target but this is only used to set our output size in case of training
loss (Loss|None) – via transform_config_dict(). Every layer can have one loss (of type Loss), or no loss. In the net dict, it is specified as a string. In TFNetwork, all losses from all layers will be collected. That is what TFUpdater.Updater will use for training.
reuse_params (ReuseParams|None) – if given, will optionally reuse the params. see self.var_creation_scope(). See also the name_scope option as an alternative.
name_scope (str|None) – If set, uses this custom (relative) name scope. If it starts with a “/”, it will be the absolute name scope. It should not end with a “/”. It can be empty, in which case it will not consume a new name scope. This can also be used for parameter sharing. The default is the layer name in most cases, but this logic is in get_absolute_name_scope_prefix() and TFNetwork.layer_creation_scope().
param_device (str|None) – e.g. “CPU”, etc. any valid name for tf.device. see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/device_name_utils.h
L2 (float|None) – for constraints
darc1 (float|None) – for constraints. see Generalization in Deep Learning, https://arxiv.org/abs/1710.05468
spatial_smoothing (float|None) – see returnn.tf.util.basic.spatial_smoothing_energy()
param_variational_noise (float|None) – adds variational noise to the params during training
param_dropout (float|None) – dropout on params (weight dropout) during training
param_dropout_min_ndim (int|None) – if param dropout is enabled, only use it for params whose ndim >= this. E.g. it might make sense to disable it for bias params or scalars, so set param_dropout_min_ndim=2.
updater_opts (dict[str]|None) – accepts similar opts as TFUpdater, e.g. “optimizer”, “learning_rate”, …
is_output_layer (bool|None) – triggers the construction of this layer in the root net. Inside a RecLayer, it triggers the explicit accumulation of all frames. Also see the need_last option.
only_on_eval (bool) – if True, this layer will only be calculated in eval
only_on_search (bool) – if True, this layer will only be calculated when search is done
copy_output_loss_from_source_idx (int|None) – if set, will copy output_loss from this source
batch_norm (bool|dict) – see self.batch_norm()
initial_output (str|float) – used for recurrent layer, see self.get_rec_initial_output()
state – explicitly defines the rec state. initial_state would define the initial state (in the first frame)
need_last (bool) – Inside RecLayer, make sure that we can access the last frame. Similar to is_output_layer, but this is specifically about the last frame, i.e. it does not trigger accumulation.
rec_previous_layer (LayerBase|None) – via the recurrent layer, layer (template) which represents the past of us. You would not explicitly set this in a config. This is set automatically, internally, via RecLayer.
encapsulate (bool) – mostly relevant for SubnetworkLayer and similar: If True, all sub layers will be created, and covered in functions like get_rec_initial_extra_outputs(), and the logic in cls_get_sub_network() will not be used. If False, the logic in cls_get_sub_network() will be used.
collocate_with (list[str]|None) – in the rec layer, collocate with the specified other layers
trainable (bool) – whether the parameters of this layer will be trained. Default is True. However, if this is inside a subnetwork, all the parent layers must be set to trainable, otherwise the parameters will not be trainable.
custom_param_importer (str|callable|None) – used by set_param_values_by_dict()
register_as_extern_data (str|None) – registers output in network.extern_data
control_dependencies_on_output (None|((LayerBase)->list[tf.Operation])) – This is mostly to perform some checks after the layer output has been computed, before the layer output is used anywhere else. There is also the IdentityLayer with the option control_dependencies.
debug_print_layer_output (None|bool|dict[str]) – same as global config option but per layer
_name (str) – just for internal construction, should be the same as name
_network (returnn.tf.network.TFNetwork) – just for internal construction, should be the same as network
_src_common_search_choices (None|SearchChoices) – set via SearchChoices.translate_to_common_search_beam()
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.ConcatLayer(sources, allow_broadcast=False, out_dim=None, **kwargs)[source]#
Concatenates the inputs in specified axes. This generalizes CopyLayer which concatenates in the feature dim.
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) – will modify inplace
network (returnn.tf.network.TFNetwork) –
get_layer (((str) -> LayerBase)) – function to get or construct another layer
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.DropoutLayer(in_dim=None, out_dim=None, extra_deps=(), **kwargs)[source]#
Just the same as CopyLayer, because that one already supports dropout.
- Parameters:
in_dim (Dim|None) – just for checking. but also, if this is provided, it will set the feature_dim to this.
out_dim (Dim|None) – alternative to in_dim. see in_dim doc.
extra_deps (list[LayerBase]) – Just add as an additional dependency, without really using it. This can have an effect though on the search beam, via SelectSearchSourcesLayer. We only have this here for the CopyLayer because the get_out_data_from_opts() must know about it and define the right beam. Also see the option collocate_with, which is different in that it does not add a dependency. Note that this will not be real TF control dependencies, but it simply sets the dependency on the layer. If you want to have a real TF control dependency, use IdentityLayer.
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.ScaledGradientLayer(scale, shift=None, scale_shift_by_sum_over_axis=None, clip_max_axis=None, **kwargs)[source]#
Just tf.identity() in the forward pass. Scales the gradient by some factor in backprop. Can be used as gradient reversal layer (with negative factor). Uses returnn.tf.util.basic.scaled_gradient(), or tf.stop_gradient()
- Parameters:
scale (float|LayerBase) – if 0. and no shift, will use tf.stop_gradient
shift (float|LayerBase|None) –
scale_shift_by_sum_over_axis (Dim|str|None) – if given, calculates the sum over this axis (absolute values) and multiplies the shift value by this sum.
clip_max_axis (Dim|str|None) – if given, clips the gradient to the max value in this axis before the transformation, for all values in the axis
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) – will modify inplace
network (returnn.tf.network.TFNetwork) –
get_layer (((str) -> LayerBase)) – function to get or construct another layer
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
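The forward/backward behaviour described for ScaledGradientLayer can be sketched in plain Python. This is a minimal illustration only: the two function names are hypothetical, the optional shift, scale_shift_by_sum_over_axis and clip_max_axis options are left out, and no automatic differentiation is involved.

```python
def scaled_gradient_forward(x):
    """Forward pass: the values pass through unchanged (identity)."""
    return list(x)


def scaled_gradient_backward(grad_out, scale):
    """Backward pass: the incoming gradient is multiplied by `scale`.

    A negative scale reverses the gradient; scale == 0 blocks it,
    which corresponds to the stop-gradient case mentioned above.
    """
    return [scale * g for g in grad_out]


x = [0.5, -1.0]
grad_out = [2.0, -4.0]  # gradient arriving from the layers above

y = scaled_gradient_forward(x)
reversed_grad = scaled_gradient_backward(grad_out, scale=-1.0)
blocked_grad = scaled_gradient_backward(grad_out, scale=0.0)
```

With scale=-1.0 the layer acts as a gradient reversal layer, while the forward output stays identical to the input.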
- class returnn.tf.layers.basic.SelectSearchSourcesLayer(search_choices_layer, sources, **kwargs)[source]#
Selects the corresponding search beams from the source, given current search choices (determined by a layer). Like InternalLayer, only for internal purpose at the moment.
- classmethod select_if_needed(layer, search_choices)[source]#
- Parameters:
layer (LayerBase) –
search_choices (SearchChoices|None) –
- Return type:
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.ActivationLayer(activation, opts=None, **kwargs)[source]#
This layer just applies an activation function. See returnn.tf.util.basic.get_activation_function() about supported functions. Also see EvalLayer and CombineLayer for similar layers.
- Parameters:
activation (str) – e.g. “relu”, “tanh”, etc
opts (dict[str]|None) – for activation function, e.g. eps for safe_log
- output_before_activation: Optional[OutputWithActivation][source]#
- classmethod get_out_data_from_opts(activation, **kwargs)[source]#
- Parameters:
activation (str) –
- Return type:
Data
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.BatchNormLayer(in_dim=None, use_shift=<class 'returnn.util.basic.NotSpecified'>, use_std=<class 'returnn.util.basic.NotSpecified'>, use_sample=<class 'returnn.util.basic.NotSpecified'>, force_sample=<class 'returnn.util.basic.NotSpecified'>, momentum=<class 'returnn.util.basic.NotSpecified'>, epsilon=<class 'returnn.util.basic.NotSpecified'>, update_sample_only_in_training=<class 'returnn.util.basic.NotSpecified'>, delay_sample_update=<class 'returnn.util.basic.NotSpecified'>, param_version=<class 'returnn.util.basic.NotSpecified'>, gamma_init=<class 'returnn.util.basic.NotSpecified'>, beta_init=<class 'returnn.util.basic.NotSpecified'>, masked_time=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]#
Implements batch-normalization (https://arxiv.org/abs/1502.03167) as a separate layer.
Also see NormLayer.
- Parameters:
in_dim (returnn.tensor.Dim|None) –
use_shift (bool) –
use_std (bool) –
use_sample (float) – defaults to 0.0 which is used in training
force_sample (bool) – even in eval, use the use_sample factor
momentum (float) – for the running average of sample_mean and sample_std
update_sample_only_in_training (bool) –
delay_sample_update (bool) –
param_version (int) – 0 or 1 or 2
epsilon (float) –
gamma_init (str|float) – see returnn.tf.util.basic.get_initializer(), for the scale
beta_init (str|float) – see returnn.tf.util.basic.get_initializer(), for the mean
masked_time (bool) – flatten and mask input tensor
The default settings for these variables are set in the function batch_norm() of LayerBase. If you do not want to change them you can leave them undefined here. With our default settings:
- In training: use_sample=0, i.e. not using running average, using current batch mean/var.
- Not in training (e.g. eval): use_sample=1, i.e. using running average, not using current batch mean/var.
- The running average includes the statistics of the current batch.
- The running average is also updated when not training.
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
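The role of use_sample can be illustrated with a small sketch over a single feature. It rests on an assumption not stated above: that use_sample acts as a linear interpolation factor between the current batch statistics (0.0) and the running statistics (1.0). The function names and the epsilon default are hypothetical, and the trainable scale (gamma) and shift (beta) as well as the running-average update are left out.

```python
def batch_mean_var(xs):
    """Mean and (biased) variance of the current batch for one feature."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var


def batch_norm_1d(xs, running_mean, running_var, use_sample, epsilon=1e-3):
    """Normalize one feature over the batch.

    ASSUMPTION: use_sample interpolates linearly between batch statistics
    (use_sample=0.0) and running statistics (use_sample=1.0).
    """
    cur_mean, cur_var = batch_mean_var(xs)
    mean = (1.0 - use_sample) * cur_mean + use_sample * running_mean
    var = (1.0 - use_sample) * cur_var + use_sample * running_var
    return [(x - mean) / (var + epsilon) ** 0.5 for x in xs]


batch = [1.0, 2.0, 3.0]
# Training-like setting: only the current batch statistics are used.
train_out = batch_norm_1d(batch, running_mean=0.0, running_var=1.0, use_sample=0.0)
# Eval-like setting: only the running statistics are used.
eval_out = batch_norm_1d(batch, running_mean=0.0, running_var=1.0, use_sample=1.0, epsilon=0.0)
```

In the training-like call the output is centered around zero; in the eval-like call, with running mean 0 and variance 1, the input passes through unchanged.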
- class returnn.tf.layers.basic.LayerNormLayer(in_dim=None, out_dim=None, epsilon=1e-06, **kwargs)[source]#
Applies layer-normalization.
Note that we just normalize over the feature-dim axis here. This is consistent with the default behavior of tf.keras.layers.LayerNormalization and also how it is commonly used in many models, including Transformer.
However, there are cases where it would be common to normalize over all axes except batch-dim, or all axes except batch and time. For a more generic variant, see NormLayer.
- Parameters:
- classmethod get_out_data_from_opts(sources, name, **kwargs)[source]#
- Parameters:
sources (list[LayerBase]) –
name (str) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
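Normalizing over the feature axis only means that every frame (b, t) is normalized independently of all other frames. A minimal sketch on nested lists of shape (B, T, F) follows; the function names are hypothetical and any trainable scale and bias parameters are left out.

```python
def layer_norm_features(frame, epsilon=1e-6):
    """Normalize one feature vector to zero mean and (roughly) unit variance."""
    mean = sum(frame) / len(frame)
    var = sum((v - mean) ** 2 for v in frame) / len(frame)
    return [(v - mean) / (var + epsilon) ** 0.5 for v in frame]


def layer_norm(x, epsilon=1e-6):
    """Apply the normalization to every frame of a (B, T, F) nested list."""
    return [[layer_norm_features(frame, epsilon) for frame in seq] for seq in x]


x = [[[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]]  # shape (B=1, T=2, F=3)
normed = layer_norm(x)
```

Both frames end up with nearly the same normalized values although their scales differ by a factor of ten, because the statistics are computed per frame.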
- class returnn.tf.layers.basic.NormLayer(axis=<class 'returnn.util.basic.NotSpecified'>, axes=<class 'returnn.util.basic.NotSpecified'>, param_shape=<class 'returnn.util.basic.NotSpecified'>, scale=True, bias=True, epsilon=1e-06, **kwargs)[source]#
Normalize over specified axes, e.g. time and/or feature axis.
Note: For calculating a norm, see MathNormLayer instead.
In case of just feature (axes="F"), this corresponds to layer normalization (see LayerNormLayer). In case of time and feature (axes="TF") for a 3D input, or more general all except batch (axes="except_batch"), this corresponds to group normalization with G=1, or non-standard layer normalization. (The definition of layer-normalization is not clear on what axes should be normalized over. In many other frameworks, the default axis is just the last axis, which is usually the feature axis. However, in certain implementations and models, it is also common to normalize over all axes except batch.)
The statistics are calculated just on the input. There are no running statistics (in contrast to batch normalization, see BatchNormLayer).
For some discussion on the definition of layer-norm vs group-norm, also see here and here.
- Parameters:
axis (Dim|str|list[Dim|str]) – axis or axes over which the mean and variance are computed, e.g. “F” or “TF”
axes (Dim|str|list[Dim|str]) – axis or axes over which the mean and variance are computed, e.g. “F” or “TF”
param_shape (Dim|str|list[Dim|str]|tuple[Dim|str]) – shape of the scale and bias parameters. You can also refer to (static) axes of the input, such as the feature-dim. This is also the default, i.e. a param-shape of [F], independent of the axes to normalize over.
scale (bool) – add trainable scale parameters
bias (bool) – add trainable bias parameters
epsilon (float) – epsilon for numerical stability
- classmethod get_out_data_from_opts(sources, name, **kwargs)[source]#
- Parameters:
sources (list[LayerBase]) –
name (str) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.MathNormLayer(p, axis=<class 'returnn.util.basic.NotSpecified'>, axes=<class 'returnn.util.basic.NotSpecified'>, keep_dims=False, **kwargs)[source]#
Calculates sum(abs(x) ** p) ** (1./p).
- Parameters:
- classmethod get_out_data_from_opts(name, sources, axis=<class 'returnn.util.basic.NotSpecified'>, axes=<class 'returnn.util.basic.NotSpecified'>, keep_dims=False, **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
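The formula sum(abs(x) ** p) ** (1./p) can be checked on a single vector. This is a sketch over one axis of a plain Python list; the function name `math_norm` is hypothetical.

```python
def math_norm(values, p):
    """p-norm of a flat list of numbers: sum(abs(x) ** p) ** (1 / p)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


l2 = math_norm([3.0, -4.0], p=2)  # Euclidean norm
l1 = math_norm([3.0, -4.0], p=1)  # sum of absolute values
```

For p=2 this is the Euclidean length of the vector, for p=1 the sum of absolute values.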
- class returnn.tf.layers.basic.SliceLayer(axis, slice_start=None, slice_end=None, slice_step=None, out_dim=None, **kwargs)[source]#
Slicing on the input, i.e. x[start:end:step] in some axis. See also SliceNdLayer, for variable start. See also GatherLayer, for one single position.
Note that __getitem__ on a TF tensor (or also Numpy ND array) is more generic, and supports slices in multiple axes, as well as adding new dimensions, etc. It even allows to get boolean values, and then applies a boolean mask. See TF _slice_helper (== tf.Tensor.__getitem__) for a generic implementation, which calls tf.strided_slice. If we ever need such more generic support, we might consider adding a new layer, like GenericSliceLayer, which gets a splice_spec, just like _slice_helper (argument to __getitem__). But any such slice can already be constructed with multiple individual layers, which perform individual slices (per axis).
We just support slicing in a single axis here, with optional striding (slice_step).
- Parameters:
- classmethod get_out_data_from_opts(name, axis, sources=(), slice_start=None, slice_end=None, slice_step=None, out_dim=None, **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.SliceNdLayer(size, start=None, min_size=None, axis='T', out_spatial_dim=None, **kwargs)[source]#
This takes out a slice-range from the time axis, e.g. x[start:start + size]. If the input is of shape (B,T,F) and start is of shape (B,), then the output will be of shape (B,size,F). If the input is of shape (B,T,F) and start is of shape (B,T), then the output will be of shape (B,T,size,F). This layer allows a different start slice point for each batch, in contrast to SliceLayer, and the start is variable. See also GatherNdLayer. PrefixInTimeLayer can recover the original shape (by zero-padding).
- Parameters:
start (int|LayerBase|None) – (B,…)
size (int|LayerBase|Dim|None) – We assume that this is >=0. If this might not be the case, use min_size=0. If None, it uses the max possible size, and it becomes a dynamic axis.
min_size (int|None) – if size is None, but we want to have a min-size
axis (Dim|str) –
out_spatial_dim (Dim|None) –
- classmethod get_out_data_from_opts(name, sources=(), start=None, size=None, axis='T', out_spatial_dim=None, **kwargs)[source]#
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
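The per-batch slicing x[start:start + size] for an input of shape (B,T,F) and a start of shape (B,) can be sketched on nested lists. The function name `slice_nd` is hypothetical, only the fixed-size case is covered, and filling frames beyond the end of the sequence with zeros is an assumption made for this sketch rather than documented behaviour.

```python
def slice_nd(x, start, size):
    """Take x[b][start[b]:start[b] + size] for every batch entry b.

    x: nested list of shape (B, T, F); start: list of B ints.
    Returns a nested list of shape (B, size, F).
    ASSUMPTION: positions outside the sequence are filled with zeros.
    """
    out = []
    for b, seq in enumerate(x):
        num_features = len(seq[0])
        window = []
        for t in range(start[b], start[b] + size):
            if 0 <= t < len(seq):
                window.append(list(seq[t]))
            else:
                window.append([0.0] * num_features)
        out.append(window)
    return out


x = [
    [[0.0], [1.0], [2.0], [3.0]],
    [[10.0], [11.0], [12.0], [13.0]],
]  # shape (B=2, T=4, F=1)
sliced = slice_nd(x, start=[1, 3], size=2)  # shape (B=2, size=2, F=1)
```

Each batch entry uses its own start position, which is the difference to a plain slice with a fixed start.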
- class returnn.tf.layers.basic.GatherLayer(position: LayerBase | int, axis: Dim | str, clip_to_valid: bool = False, **kwargs)[source]#
Gathers slices on a specified axis from the input layer using indices from a position layer. If the input is a layer of the shape [B,D,F1], and position of shape [B,F2], this will yield output of the shape [B,F2,F1] where
output[b,f2,f1] = input[b,position[b,f2],f1]
(if D is the axis to gather from). In general, all shared axes of the input and the positions will be considered as batch-axes.
The position argument can also be an int. In this case, this simply gives input[position] on the specified axis.
It’s basically a wrapper around tf.gather. It provides the same functionality as the deprecated GatherNdLayer, but is more generic. See also GatherNdLayer.
- Parameters:
position – indices used to select the slices of the input from. If another layer, must be of type int32 or int64. Can also specify a constant int.
axis – The axis into which we gather the indices
clip_to_valid – if True, the indices will be clipped to the valid range of the input, also taking seq lengths into account.
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
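The indexing rule output[b,f2,f1] = input[b,position[b,f2],f1] can be written out directly. The following sketch works on nested lists with the gather axis fixed to D; the function name `gather` is hypothetical, and the clipping here only considers the static size of D, not sequence lengths.

```python
def gather(inp, position, clip_to_valid=False):
    """Gather along axis D of a (B, D, F1) nested list.

    position: nested list of ints with shape (B, F2).
    Returns a nested list of shape (B, F2, F1).
    """
    out = []
    for b in range(len(inp)):
        rows = []
        for idx in position[b]:
            if clip_to_valid:
                # Clip to [0, D - 1]; sequence lengths are ignored in this sketch.
                idx = max(0, min(idx, len(inp[b]) - 1))
            rows.append(list(inp[b][idx]))
        out.append(rows)
    return out


inp = [[[1, 2], [3, 4], [5, 6]]]  # shape (B=1, D=3, F1=2)
picked = gather(inp, position=[[2, 0]])                    # shape (B=1, F2=2, F1=2)
clipped = gather(inp, position=[[7]], clip_to_valid=True)  # index 7 is clipped to 2
```

The batch axis B is shared between input and position, so each batch entry is indexed with its own positions.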
- class returnn.tf.layers.basic.GatherNdLayer(position, **kwargs)[source]#
Warning: This layer is deprecated, use the more general GatherLayer instead. GatherLayer should be equivalent, but is more general (supports multiple batch dimensions, can specify gather axis) and its name is less misleading.
This takes out a position from some axis, e.g. x[pos]. This layer allows a different position for each batch. It’s basically a wrapper around tf.gather (the name of this layer is misleading). See also GatherLayer instead, which will replace this layer in the future. See also SliceNdLayer. See also ScatterNdLayer, which is the inverse operation.
- Parameters:
position (LayerBase) – indices into first axis (excluding batch) of the input
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.ScatterNdLayer(position, position_axis, output_dim_via_time_from=None, out_spatial_dim=None, filter_invalid_indices=False, **kwargs)[source]#
The inverse of GatherNdLayer. Mostly a wrapper for tf.scatter_nd.
Note that “nd” is maybe a bit misleading. While we operate on N-D tensors, the indices (position) are into a single new dimension.
The input to the layer are the updates, the indices are via the position argument. The indices are into the newly constructed output dimension. The output shape is constructed via the common shape of the input, the position, and the unique common axis (if not unique, we would need to introduce an option to specify it) is replaced by the given output dimension (currently via output_dim_via_time_from).
Examples:
position (indices): (B,eTs)
input (updates): (eTs,D) or (B,eTs,D) -> expanded to (B,eTs,D)
output shape: (B,eT,D)

position (indices): (B,dT,eTs)
input (updates): (eTs,D) -> expanded to (B,dT,eTs,D)
output shape: (B,dT,eT,D)

position (indices): (dT,eTs)
input (updates): (eTs,D) -> expanded to (dT,eTs,D)
output shape: (dT,eTs,D)

position (indices): (dT,eTs)
input (updates): (B,eTs,D) -> expanded to (dT,eTs,B,D)
output shape: (dT,eT,B,D)

In all these examples, output_dim_via_time_from is (B,eT,F), and eTs gets replaced by eT.
- Parameters:
position (LayerBase) – indices into first axis (excluding batch) of the output
position_axis (Dim|str) – axis in position to replace by the output-dim
output_dim_via_time_from (LayerBase|None) – use the time-dim from this layer as the output-dim
out_spatial_dim (Dim|None) –
filter_invalid_indices (bool) – allow for indices <0 or >= output_dim, which will be discarded in the output
- classmethod get_out_data_from_opts(name, sources, position, position_axis, output_dim_via_time_from=None, out_spatial_dim=None, **kwargs)[source]#
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer ((str)->LayerBase) –
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
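The first example above (position (B,eTs), updates (B,eTs,D), output (B,eT,D)) can be sketched on nested lists. The function name `scatter_nd` and the explicit `out_len` argument are hypothetical stand-ins for the output dimension. Duplicate indices are summed, as tf.scatter_nd does; raising an error for invalid indices when filtering is off is a choice made for this sketch only.

```python
def scatter_nd(updates, position, out_len, filter_invalid_indices=False):
    """Scatter updates of shape (B, eTs, D) into a zero output of shape (B, out_len, D).

    position: nested list of ints with shape (B, eTs), indexing the new output axis.
    """
    out = []
    for b in range(len(updates)):
        num_features = len(updates[b][0])
        rows = [[0.0] * num_features for _ in range(out_len)]
        for s, idx in enumerate(position[b]):
            if not 0 <= idx < out_len:
                if filter_invalid_indices:
                    continue  # discard updates with out-of-range indices
                raise IndexError("index %d outside [0, %d)" % (idx, out_len))
            for d in range(num_features):
                rows[idx][d] += updates[b][s][d]  # duplicates accumulate
        out.append(rows)
    return out


updates = [[[1.0], [2.0], [3.0]]]  # shape (B=1, eTs=3, D=1)
scattered = scatter_nd(updates, position=[[2, 0, 2]], out_len=4)
filtered = scatter_nd(updates, position=[[2, 5, -1]], out_len=4,
                      filter_invalid_indices=True)
```

In the first call, position 2 receives two updates (1.0 and 3.0), which add up to 4.0; in the second call, the indices 5 and -1 are dropped.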
- class returnn.tf.layers.basic.LinearLayer(activation=None, with_bias=True, grad_filter=None, forward_weights_init='glorot_uniform', bias_init=0.0, use_transposed_weights=False, **kwargs)[source]#
Linear/forward/fully-connected/1x1-conv layer. Does a linear transformation on the feature-dimension of the input with an optional bias term and an optional activation function. See also DotLayer, ElemwiseProdLayer, WeightedSumLayer.
- Parameters:
activation (str|None) – e.g. “relu”, or None
with_bias (bool) –
grad_filter (float|None) – if grad norm is higher than this threshold (before activation), the grad is removed
forward_weights_init (str) – see returnn.tf.util.basic.get_initializer()
recurrent_weights_init (str) – see returnn.tf.util.basic.get_initializer()
bias_init (str|float) – see returnn.tf.util.basic.get_initializer()
use_transposed_weights (bool) – If True, define the weight matrix with transposed dimensions (n_out, n_in).
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.SoftmaxLayer(**kwargs)[source]#
Just a LinearLayer with activation="softmax" by default.
- Parameters:
activation (str|None) – e.g. “relu”, or None
with_bias (bool) –
grad_filter (float|None) – if grad norm is higher than this threshold (before activation), the grad is removed
forward_weights_init (str) – see returnn.tf.util.basic.get_initializer()
recurrent_weights_init (str) – see returnn.tf.util.basic.get_initializer()
bias_init (str|float) – see returnn.tf.util.basic.get_initializer()
use_transposed_weights (bool) – If True, define the weight matrix with transposed dimensions (n_out, n_in).
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.LengthLayer(axis='T', add_time_axis=False, dtype='int32', sparse=False, **kwargs)[source]#
Returns the length of sources as (B,), via input size_placeholder.
- Parameters:
axis (str|Dim) –
add_time_axis (bool) – should not be used
dtype (str) –
sparse (bool) –
- classmethod get_out_data_from_opts(name, sources, axis='T', add_time_axis=False, dtype='int32', sparse=False, **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.SoftmaxOverSpatialLayer(axis=None, energy_factor=None, start=None, window_start=None, window_size=None, use_time_mask=None, log_space=False, **kwargs)[source]#
This applies a softmax over spatial axis/axes (currently only time axis supported). E.g. when the input is of shape (B,T,dim), the output will be (B,T,dim). It automatically masks the frames outside the seq defined by the seq-len. In contrast to SoftmaxLayer, this will not do a linear transformation. See SeqLenMaskLayer if you just want to apply a masking.
- Parameters:
axis (Dim|str|None) – which axis to do the softmax over. “T” by default
energy_factor (float|None) – the energy will be scaled by this factor. This is like a temperature for the softmax. In Attention-is-all-you-need, this is set to 1/sqrt(base_ctx.dim).
start (LayerBase|None) – Tensor of shape (B,) indicating the start frame
window_start (LayerBase|int|None) – Layer with output of shape (B,) or (constant) int value indicating the window start.
window_size (LayerBase|int|None) – Layer with output of shape (B,) or (constant) int value indicating the window size.
use_time_mask (bool) – if True, assumes dyn seq len, and use it for masking. By default, if dyn seq len exists, it uses it.
log_space (bool) – if True, returns in log space (i.e. uses log_softmax)
- output_before_activation: Optional[OutputWithActivation][source]#
- classmethod get_out_data_from_opts(name, sources, axis=None, start=None, window_start=None, window_size=None, **kwargs)[source]#
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- search_choices: Optional[SearchChoices][source]#
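A softmax over the time axis with sequence-length masking can be sketched for a single sequence of scalar energies. The function name `softmax_over_time` is hypothetical; the start, window and log_space options are not modelled, and seq_len is assumed to be at least 1.

```python
import math


def softmax_over_time(energy, seq_len, energy_factor=None):
    """Softmax over the first `seq_len` frames; frames beyond get probability 0.

    energy: list of T floats for one sequence.
    energy_factor: optional scaling of the energies (like an inverse temperature).
    """
    if energy_factor is not None:
        energy = [e * energy_factor for e in energy]
    valid = energy[:seq_len]
    max_e = max(valid)  # subtract the maximum for numerical stability
    exps = [math.exp(e - max_e) for e in valid]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs + [0.0] * (len(energy) - seq_len)


# The third frame lies outside the sequence, so its large energy is ignored.
probs = softmax_over_time([1.0, 1.0, 5.0], seq_len=2)
scaled = softmax_over_time([0.0, 2.0], seq_len=2, energy_factor=0.5)
```

The output keeps the length of the time axis, and the probabilities of the valid frames sum to one.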
- class returnn.tf.layers.basic.SeqLenMaskLayer(mask_value, axis='T', seq_len_source=None, start=None, window_start=None, window_size=None, **kwargs)[source]#
Masks some values away given the seq_len_source with mask_value. Also see SoftmaxOverSpatialLayer. Also see SwitchLayer, which can be used to apply a generic mask.
- Parameters:
- classmethod build_mask(x, axis='T', axis_allow_int=<class 'returnn.util.basic.NotSpecified'>, seq_len_source=None, start=None, window_start=None, window_size=None)[source]#
- Parameters:
x (Data) –
axis (Dim|str|int) –
axis_allow_int (bool|NotSpecified) – Some callers of this function would pass in an int for axis directly. In that case, explicitly set this to True.
seq_len_source (Data|None) –
start (Data|None) –
window_start (Data|None) –
window_size (Data|int|None) –
- Returns:
mask which is broadcastable to energy_data, thus you can e.g. use returnn.tf.util.basic.where_bc()
- Return type:
tf.Tensor
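As a rough illustration of how seq_len_source, start, window_start and window_size could combine into one mask, here is a pure-Python sketch for a single sequence. The helper sketch_mask is hypothetical. The exact semantics are assumptions: frames before start are masked out, and the window covers the half-open range [window_start, window_start + window_size).

```python
def sketch_mask(max_len, seq_len, start=None, window_start=None, window_size=None):
    """Boolean mask over positions 0..max_len-1 (True means: keep the value)."""
    mask = [t < seq_len for t in range(max_len)]
    if start is not None:
        # Assumption: everything before the start frame is masked out.
        mask = [keep and t >= start for t, keep in enumerate(mask)]
    if window_start is not None:
        if window_size is None:
            raise ValueError("window_size is needed together with window_start")
        window_end = window_start + window_size
        # Assumption: half-open window [window_start, window_end).
        mask = [keep and window_start <= t < window_end for t, keep in enumerate(mask)]
    return mask


plain = sketch_mask(6, seq_len=4)
windowed = sketch_mask(6, seq_len=4, window_start=2, window_size=3)

# Applying the mask with a mask value, here -inf as one would use before a softmax.
energies = [0.5] * 6
masked = [e if keep else float("-inf") for e, keep in zip(energies, windowed)]
```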
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- classmethod get_out_data_from_opts(name, sources, start=None, window_start=None, window_size=None, **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.BooleanMaskLayer(*, mask: LayerBase, dims: Sequence[Dim], out_dim: Dim | None = None, **kwargs)[source]#
Wrapper around tf.boolean_mask.
- Parameters:
mask –
dims –
out_dim –
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- classmethod get_out_data_from_opts(*, name: str, sources: Sequence[LayerBase], mask: LayerBase, out_dim: Dim | None = None, **kwargs) Tensor [source]#
- Parameters:
name –
sources –
mask –
out_dim –
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.RandomStateInitLayer(algorithm=None, seed=None, out_dim=None, **kwargs)[source]#
This calculates the initial state value for the state var of RandomLayer. This depends on the algorithm and seed.
- Parameters:
algorithm (str|tf.random.Algorithm|None) – “philox”, “three-fry”, “auto-select”. By default “philox”. See tf.random.stateless_uniform() for some documentation. “auto-select” will automatically select the optimal algorithm based on the device, so it might select a different algorithm depending on the device. Note that the state shape is dependent on the device, so if you want checkpoints to be compatible across devices, do not use “auto-select”. We take the default from tf.random.Generator.
seed (int|Sequence[int]|numpy.ndarray|None) – if given, the state will deterministically depend on this (and the algorithm) and nothing else. If you have multiple random generators (state vars), make sure that you have different seeds for each! If None (default), the seed will be deterministically taken from the network random generator at construction time, which is usually a good idea. You can still change the global network seed.
out_dim (Dim|None) – new dim tag for random state dim
- classmethod select_algorithm(algorithm)[source]#
- Parameters:
algorithm (str|int|tf.random.Algorithm|None) –
- Return type:
int
- classmethod get_out_data_from_opts(name, algorithm=None, out_dim=None, **kwargs)[source]#
- Parameters:
name (str) –
algorithm (str|None) –
out_dim (Dim|None) –
- Return type:
Data
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.RandomLayer(shape, distribution, mean=None, stddev=None, bound=None, minval=None, maxval=None, dtype='float32', sparse_dim=None, feature_dim=None, seed=None, algorithm=None, explicit_state=None, auto_update_state=None, static=None, shape_deps=(), stop_grad: bool = False, **kwargs)[source]#
Generates random numbers from uniform or normal or truncated normal distribution.
This uses the TensorFlow stateless random ops internally, i.e. all the state handling is explicit. The state var can be explicitly provided and initialized via RandomStateInitLayer, or when not provided it will be automatically created.
There are two possible distinct use cases:
- For any randomness in the model, e.g. dropout. So each session.run step will produce a new random number and advance the random state.
- To initialize parameters via the config, using VariableLayer with the init_by_layer option. This will only be called once when initializing the parameters. For this use case, we do not want to keep a random state var. You can just pass static=True. Alternatively you could also pass the output of a RandomStateInitLayer as explicit_state.
- Parameters:
shape (Sequence[Dim|int]) –
distribution (str) – “uniform”, “normal” or “truncated_normal”
mean (int|float|LayerBase|None) –
stddev (int|float|LayerBase|None) –
bound (int|float|LayerBase|None) – for uniform, defining the range [-bound, bound)
minval (int|float|LayerBase|None) – for uniform
maxval (int|float|LayerBase|None) – for uniform
dtype (str) –
sparse_dim (Dim|None) –
feature_dim (Dim|None) –
seed (int|list[int]|numpy.ndarray|None) – If not given, uses self.network.random.randint, i.e. then it is controlled by the global seed setting, and every layer would get its own seed. If you specify it explicitly, make sure every RandomLayer uses a different seed, otherwise you would get the same random numbers everywhere.
algorithm (str|tf.random.Algorithm|None) – see RandomStateInitLayer
explicit_state (LayerBase|None) – You can pass the state explicitly here. If not given, will be created automatically, and updated automatically. You could pass a VariableLayer with initial value via RandomStateInitLayer, or directly a RandomStateInitLayer. If auto_update_state is True, it must be a variable, and every time a new random number is created, this variable is updated. Otherwise (default) it will not be updated automatically.
auto_update_state (bool|None) – only used when you pass an explicit state
static (bool|None) – if no state at all should be used. it just relies on the seed then.
shape_deps (list[LayerBase]) – for dyn dim tags in shape
stop_grad (bool) – if True, will stop the gradient to mean,stddev,bound,minval,maxval
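The two use cases might look as follows in a network config. This is an untested sketch: the layer class strings "random" and "variable" and the layer names are assumptions, and only the option names (shape, distribution, mean, stddev, bound, static, init_by_layer) are taken from this page. The static option is described above as "if no state at all should be used", which is what the one-time parameter initialization wants.

```python
network = {
    # Use case 1: randomness in the model. No explicit state is given,
    # so the state var would be created and updated automatically.
    "noise": {
        "class": "random",  # assumed layer class string for RandomLayer
        "shape": [128],
        "distribution": "normal",
        "mean": 0.0,
        "stddev": 1.0,
    },
    # Use case 2: one-time parameter initialization without any random state var.
    "weights_init": {
        "class": "random",
        "shape": [128, 64],
        "distribution": "uniform",
        "bound": 0.1,  # uniform in [-0.1, 0.1)
        "static": True,
    },
    "weights": {
        "class": "variable",  # assumed layer class string for VariableLayer
        "shape": [128, 64],
        "init_by_layer": "weights_init",
    },
}
```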
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- classmethod get_out_data_from_opts(name, shape, dtype='float32', sparse_dim=None, feature_dim=None, shape_deps=(), **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.RandIntLayer(shape, maxval, minval=0, dtype='int32', sparse_dim=None, seed=None, **kwargs)[source]#
Generates random integer numbers using tf.random.uniform. It is recommended to use RandomLayer instead.
- Parameters:
shape (tuple[Dim|int]|list[Dim|int]) – desired shape of output tensor
maxval (int|LayerBase) – upper bound (exclusive) on range of random values
minval (int|LayerBase) – lower bound (inclusive) on range of random values
dtype (str) – type of the output. For random ints, int32 and int64 make sense, but could also be floats
sparse_dim (Dim|None) –
seed (int|None) – random seed
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer ((str)->LayerBase) –
- classmethod get_out_data_from_opts(name, network, shape, maxval, minval=0, dtype='int32', sparse_dim=None, **kwargs)[source]#
- Parameters:
name (str) –
network (returnn.tf.network.TFNetwork) –
shape (tuple[Dim|int]|list[Dim|int]) – desired shape of output tensor
maxval (int|LayerBase) – upper bound (exclusive) on range of random values
minval (int|LayerBase) – lower bound (inclusive) on range of random values
dtype (str) – type of the output. For random ints, int32 and int64 make sense, but could also be floats
sparse_dim (Dim|None) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.RangeLayer(limit, start=0, delta=1, dtype=None, sparse=False, out_spatial_dim=None, **kwargs)[source]#
Generic wrapper around tf.range. See also RangeInAxisLayer.
- Parameters:
limit (int|float) –
start (int|float) –
delta (int|float) –
dtype (str|None) –
sparse (bool) –
out_spatial_dim (Dim|None) –
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer ((str)->LayerBase) –
- classmethod get_out_data_from_opts(name, limit, start=0, delta=1, dtype=None, sparse=False, out_spatial_dim=None, **kwargs)[source]#
- Parameters:
name (str) –
limit (int|float) –
start (int|float) –
delta (int|float) –
dtype (str|None) –
sparse (bool) –
out_spatial_dim (Dim|None) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.RangeInAxisLayer(axis, dtype='int32', unbroadcast=False, keepdims=False, sparse=False, **kwargs)[source]#
If the input is e.g. (B,T,D) and you specify axis=“T”, you will get (T,), where the specified axis is filled with tf.range. See also RangeLayer.
- Parameters:
axis (str|Dim) –
dtype (str) –
unbroadcast (bool) – DEPRECATED, unsupported, and not needed
keepdims (bool) – DEPRECATED, unsupported, and not needed
sparse (bool) –
- classmethod get_out_data_from_opts(name, sources, axis, dtype='int32', sparse=False, **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.RangeFromLengthLayer(dtype='int32', sparse=False, out_spatial_dim=None, **kwargs)[source]#
Given some dynamic sequence lengths as input, this creates a tf.range over the implied dimension. As a side effect, this can create a new dyn dim tag for the given sequence lengths. This side effect can be the main functionality in certain use cases. See also RangeInAxisLayer.
Consider the example:
y: {class: range_in_axis, from: x, axis: T}
This is basically equivalent to:
x_len: {class: length, from: x}
y: {class: range_from_length, from: x_len}
- Parameters:
axis (str) –
dtype (str) –
sparse (bool) –
out_spatial_dim (Dim|None) –
- classmethod get_out_data_from_opts(name, sources, dtype='int32', sparse=False, out_spatial_dim=None, **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.BatchSoftmaxLayer(**kwargs)[source]#
Softmax over spatial and feature axis.
- Parameters:
in_dim (Dim|None) –
out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –
dropout (float) – 0.0 means to apply no dropout. dropout will only be applied during training
dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see
Data.get_bc_shape()
dropout_on_forward (bool) – apply dropout during inference
mask (str|None) – “dropout” or “unity” or None. this is obsolete and only here for historical reasons
- classmethod get_out_data_from_opts(name, sources, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.ConstantLayer(sources, value=0.0, shape=None, dtype=None, with_batch_dim=False, sparse_dim=None, feature_dim=None, shape_deps=(), **kwargs)[source]#
Output is a constant value.
- Parameters:
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) – will modify inplace
network (returnn.tf.network.TFNetwork) –
get_layer (((str) -> LayerBase)) – function to get or construct another layer
- classmethod get_out_data_from_opts(name, value=0.0, shape=None, dtype=None, with_batch_dim=False, sparse_dim=None, feature_dim=<class 'returnn.util.basic.NotSpecified'>, shape_deps=(), **kwargs)[source]#
- Parameters:
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.GatingLayer(activation, gate_activation='sigmoid', out_dim=None, **kwargs)[source]#
Splits the input into two equal parts, applies the gate_activation (sigmoid by default) on one part and some other activation (e.g. tanh) on the other part, and then multiplies them element-wise. Thus, the output dimension is input-dimension / 2.
- Parameters:
activation (str) –
gate_activation (str) –
out_dim (Dim|None) –
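A minimal pure-Python sketch of the gating computation on a single feature vector follows. The helper names are hypothetical. Which half receives the activation and which half receives the gate_activation is an assumption here (first half: activation, second half: gate).

```python
import math


def sigmoid(value):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-value))


def gating(features, activation=math.tanh, gate_activation=sigmoid):
    """Split the features into two halves and multiply act(a) * gate(b) element-wise."""
    if len(features) % 2 != 0:
        raise ValueError("the feature dimension must be even")
    half = len(features) // 2
    # Assumption: the first half is the signal, the second half is the gate.
    signal, gate = features[:half], features[half:]
    return [activation(s) * gate_activation(g) for s, g in zip(signal, gate)]


# Input dimension 4, output dimension 2.
out = gating([0.5, -1.0, 0.0, 2.0])
```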
- classmethod get_out_data_from_opts(name, sources, n_out=<class 'returnn.util.basic.NotSpecified'>, out_dim=None, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
n_out (int|None|NotSpecified) –
out_dim (Dim|None) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.WindowLayer(window_size=None, window_dim=None, window_left=None, window_right=None, axis='T', out_spatial_dim=None, padding='same', stride=1, _use_opt_dim_order=None, **kwargs)[source]#
Adds a window dimension. By default, uses the time axis and goes over it with a sliding window. The new axis for the window is created right after the time axis. In PyTorch, this is called unfold. We sometimes call this “chunking”. There is also the similar TimeChunkingLayer.
E.g. if the input is (batch, time, dim), the output is (batch, time, window_size, dim). If you want to merge the (window_size, dim) together to (window_size * dim,), you can use the MergeDimsLayer, e.g. {“class”: “merge_dims”, “axes”: “except_time”}.
Use stride==window_size and window_right=window_size - 1 in combination with a MergeDimsLayer to achieve feature stacking with right-hand zero padding.
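The sliding window and the feature-stacking setup can be sketched in pure Python for a single sequence with one scalar per frame. The helper sliding_window is hypothetical. Zero padding and the way the default window_left/window_right are derived from window_size are assumptions (for odd window sizes the window is centered).

```python
def sliding_window(seq, window_size, window_left=None, window_right=None,
                   stride=1, pad_value=0):
    """Return one window of length window_size per (strided) time step."""
    if window_left is None and window_right is None:
        # Assumption: centered window, unambiguous for odd window sizes.
        window_right = window_size // 2
        window_left = window_size - 1 - window_right
    elif window_left is None:
        window_left = window_size - 1 - window_right
    elif window_right is None:
        window_right = window_size - 1 - window_left
    padded = [pad_value] * window_left + list(seq) + [pad_value] * window_right
    return [padded[t:t + window_size] for t in range(0, len(seq), stride)]


# Centered window over time: (time,) -> (time, window_size).
centered = sliding_window([1, 2, 3, 4], window_size=3)

# Feature stacking: stride == window_size and window_right == window_size - 1,
# which pads with zeros on the right-hand side only.
stacked = sliding_window([1, 2, 3, 4, 5], window_size=2, window_right=1, stride=2)
```

With stride equal to window_size the windows do not overlap, so merging the window axis into the feature axis afterwards yields stacked features at a reduced frame rate.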
This is not to take out a single window from the time-dimension. See SliceLayer or SliceNdLayer.
The inverse layer is FoldLayer.
- Parameters:
- classmethod get_out_data_from_opts(name, network, sources, window_size=None, window_dim=None, axis='T', out_spatial_dim=None, padding='same', stride=1, _use_opt_dim_order=None, **kwargs)[source]#
- Parameters:
name (str) –
network (returnn.tf.network.TFNetwork) –
sources (list[LayerBase]) –
window_size (int|None) –
window_dim (Dim|None) –
axis (Dim|str) –
out_spatial_dim (Dim|None) –
padding (str) –
stride (int) –
_use_opt_dim_order (bool|None) –
- Return type:
Data
- classmethod get_rec_initial_extra_outputs(network, batch_dim, rec_layer, window_size=None, window_dim=None, axis='T', sources=(), **kwargs)[source]#
- Parameters:
network (returnn.tf.network.TFNetwork) –
batch_dim (tf.Tensor) –
rec_layer (returnn.tf.layers.rec.RecLayer|LayerBase) –
window_size (int|None) –
window_dim (Dim|None) –
axis (Dim|str) –
sources (list[LayerBase]) –
- Return type:
dict[str,tf.Tensor]
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.FoldLayer(mode: str, in_spatial_dim: Dim | str, window_dim: Dim | str, out_spatial_dim: Dim | None = None, padding: str = 'same', window_left: int | None = None, window_right: int | None = None, stride: int = 1, **kwargs)[source]#
The inverse of WindowLayer. We sometimes call this “unchunking”. The TimeUnChunkingLayer is similar.
Input (in_spatial_dim, window_dim, other_dims…) -> output (out_spatial_dim, other_dims…).
The window_dim is folded into the out_spatial_dim. This is also similar to the PyTorch fold operation (with mode=“sum”).
- Parameters:
mode – “sum” or “mean” (average), for overlapping frames
in_spatial_dim –
window_dim –
out_spatial_dim –
padding –
window_left –
window_right –
stride –
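The difference between mode “sum” and mode “mean” for overlapping frames can be sketched in pure Python. The helper fold is hypothetical and works on one sequence with scalar frames. How window entries that fall outside the output range are treated (here: dropped and not counted for the mean) is an assumption.

```python
def fold(windows, out_len, mode, window_left=0, stride=1):
    """Fold (num_windows, window_size) back into (out_len,)."""
    sums = [0.0] * out_len
    counts = [0] * out_len
    for i, win in enumerate(windows):
        for j, value in enumerate(win):
            t = i * stride - window_left + j  # output position of this entry
            if 0 <= t < out_len:
                sums[t] += value
                counts[t] += 1
    if mode == "sum":
        return sums
    if mode == "mean":
        return [s / c if c else 0.0 for s, c in zip(sums, counts)]
    raise ValueError("mode must be 'sum' or 'mean'")


# Windows of size 3 (one frame left, one right, zero-padded) over [1, 2, 3, 4].
windows = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 0]]
summed = fold(windows, out_len=4, mode="sum", window_left=1)
averaged = fold(windows, out_len=4, mode="mean", window_left=1)
```

In this example mode “mean” recovers the original sequence, while mode “sum” scales each frame by the number of windows that cover it.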
- classmethod get_out_data_from_opts(name: str, sources: List[LayerBase], in_spatial_dim: Dim | str, window_dim: Dim | str, out_spatial_dim: Dim | None = None, padding: str = 'same', window_left: int | None = None, window_right: int | None = None, stride: int = 1, **kwargs) Tensor [source]#
out data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.CumsumLayer(axis='T', additional_left_summand_per_element=None, reverse=False, **kwargs)[source]#
Basically wraps tf.cumsum. Also supports that in the RecLayer.
- Parameters:
axis (str) – see
Data.get_axis_from_description()
additional_left_summand_per_element (str|int|float|None) – the order matters for tf.string
reverse (bool) –
- classmethod get_out_data_from_opts(name, sources, axis='T', **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
axis (str) –
- Return type:
Data
- classmethod get_rec_initial_extra_outputs(network, batch_dim, rec_layer, axis='T', sources=(), **kwargs)[source]#
- Parameters:
network (returnn.tf.network.TFNetwork) –
batch_dim (tf.Tensor) –
rec_layer (returnn.tf.layers.rec.RecLayer|LayerBase) –
axis (str) –
sources (list[LayerBase]) –
- Return type:
dict[str,tf.Tensor]
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.PadLayer(axes, padding, out_dims=None, value=0, mode='constant', **kwargs)[source]#
Adds (e.g. zero) padding in some axis or axes. Also see PrefixInTimeLayer for dynamic dims.
- Parameters:
axes (Dim|str|list[Dim|str]) – e.g. “F” etc. see Data.get_axes_from_description().
padding (list[(int,int)]|(int,int)|int) – how much to pad left/right in each axis
value (int|float) – what constant value to pad, with mode==”constant”
mode (str) – “constant”, “reflect”, “symmetric” and “replication”
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.MergeDimsLayer(axes, keep_order=<class 'returnn.util.basic.NotSpecified'>, n_out=None, out_dim=None, **kwargs)[source]#
Merges a list of axes into a single one. (Flatten the dims.) E.g. input is (batch, width, height, dim) and axes=(1,2), then we get (batch, width*height, dim). Or input is (batch, time, height, dim) and axes=“except_time”, then we get (batch, time, height*dim). See also CombineDimsLayer. When batch and time got merged, SplitBatchTimeLayer can undo this. When you want to merge batch and time, but remove the padding efficiently, i.e. flatten it, see FlattenBatchLayer.
- Parameters:
axes (Sequence[Dim|str]) – see Data.get_axis_from_description()
keep_order (bool|NotSpecified) – The old default was: the axes are sorted, and then merged. Thus, the order of incoming axes will influence the result. E.g. inputs [B,S,F] and [B,F,S], with axes=["S","F"], will get different results, although the output shape is [B,S*F] in both cases. This is bad: In general, other layers in RETURNN might reorder the axes for various reasons, and all layers should behave in the same way, no matter the order. It is recommended to set keep_order=True, such that the order defined in axes defines the behavior, and not the incoming axis order. Since behavior version 6, this is already the case.
n_out (int|None) –
out_dim (Dim|None) –
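Why the incoming axis order matters for the old default can be seen in a small pure-Python sketch with one batch entry, S=2 and F=3. The helpers are hypothetical. The description of the old behavior as "merge in the order in which the axes are stored" is a simplifying assumption.

```python
def merge_in_storage_order(matrix):
    """Flatten a 2-D nested list in the order in which it is stored."""
    return [value for row in matrix for value in row]


def transpose(matrix):
    """Swap the two axes of a 2-D nested list."""
    return [list(column) for column in zip(*matrix)]


# The same values, once stored as [S,F] and once as [F,S].
x_sf = [[0, 1, 2], [10, 11, 12]]
x_fs = transpose(x_sf)

# Old default (assumed): merge in storage order, so the layout leaks into the result.
old_from_sf = merge_in_storage_order(x_sf)
old_from_fs = merge_in_storage_order(x_fs)

# keep_order=True with axes=["S","F"]: bring the axes into the requested
# order first, then merge. The result no longer depends on the layout.
new_from_sf = merge_in_storage_order(x_sf)
new_from_fs = merge_in_storage_order(transpose(x_fs))
```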
- classmethod get_out_data_from_opts(name, axes, keep_order=<class 'returnn.util.basic.NotSpecified'>, sources=(), n_out=<class 'returnn.util.basic.NotSpecified'>, out_type=None, out_dim=None, **kwargs)[source]#
- Parameters:
name (str) –
axes (Sequence[Dim|str]) –
keep_order (bool|NotSpecified) –
sources (list[LayerBase]) –
n_out (int|None|NotSpecified) –
out_type (None|dict[str]) –
out_dim (Dim|None) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.SplitLayer(axis=None, num_splits=None, size_splits=None, out_dims=None, **kwargs)[source]#
Splits one axis into multiple parts, via tf.split. self.output is simply the input copied. Each part can be accessed via the sublayers “/%i”.
- Parameters:
axis (str|None) – feature axis by default
num_splits (int|None) –
size_splits (list[int]|None) –
out_dims (list[Dim]|None) –
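Accessing the parts via the sublayers “/%i” might look as follows in a network config. This is an untested sketch: the layer class strings "split" and "copy", the layer names and the assumed input feature dimension of 8 are all assumptions. Only the option names and the sublayer naming scheme come from this page.

```python
network = {
    # Split the feature axis (assumed to have dimension 8) into parts of size 3 and 5.
    "parts": {
        "class": "split",  # assumed layer class string for SplitLayer
        "from": "data",
        "axis": "F",
        "size_splits": [3, 5],
    },
    # Each part is available as a sublayer "<name>/<index>".
    "first_part": {"class": "copy", "from": "parts/0"},
    "second_part": {"class": "copy", "from": "parts/1"},
}
```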
- classmethod get_available_sub_layer_names(parent_layer_kwargs)[source]#
- Parameters:
parent_layer_kwargs (dict[str]) –
- Return type:
list[str]
- classmethod get_out_data_from_opts(sources, **kwargs)[source]#
- Parameters:
sources (list[LayerBase]) –
- Return type:
Data
- classmethod get_sub_layer_out_data_from_opts(layer_name, parent_layer_kwargs)[source]#
- Parameters:
layer_name (str) – name of the sub_layer (right part of ‘/’ separated path)
parent_layer_kwargs (dict[str]) – kwargs for the parent layer (as kwargs in cls.get_out_data_from_opts())
- Returns:
Data template, class type of sub-layer, layer opts (transformed)
- Return type:
(Data, type, dict[str])|None
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.SplitDimsLayer(axis, dims, pad_to_multiples=None, pad_value=0, **kwargs)[source]#
Splits one axis into multiple axes. E.g. if you know that your feature-dim is composed by a window, i.e. the input is (batch, time, window * feature), you can set axis=”F”, dims=(window, -1), and you will get the output (batch, time, window, feature).
If the split axis has a dynamic length, exactly one of the axes that we split into needs to also have a dynamic length. You can e.g. use this to split the input dimension into smaller “chunks” of a fixed window size. E.g. you could have input (batch, time, feature) and set axis=“T”, dims=(-1, window), to get output (batch, split_time, window, feature). In this case, the exact sequence lengths are lost and everything is padded to multiples of the window size using the given padding value. Use ReinterpretDataLayer to receive back the original sequence lengths after merging.
Also see SplitBatchTimeLayer. Also see MergeDimsLayer which can undo this operation.
- Parameters:
axis (Dim|str) – e.g. “F”
dims (tuple[Dim|int]|list[Dim|int]) – what the axis should be split into. e.g. (window, -1)
pad_to_multiples (bool|None) – If true, input will be padded to the next multiple of the product of the static dims, such that splitting is actually possible. By default this is done iff the axis has a dynamic size
pad_value (int|float) – What pad value to use for pad_to_multiples
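The effect of pad_to_multiples when splitting a dynamic time axis with dims=(-1, window) can be sketched in pure Python for one sequence with scalar frames. The helper name is hypothetical, and padding at the end of the sequence is an assumption.

```python
def split_time_into_chunks(seq, window, pad_value=0):
    """Pad to the next multiple of window, then split (time,) into (chunks, window)."""
    seq = list(seq)
    remainder = len(seq) % window
    if remainder:
        # Assumption: padding is appended at the end of the sequence.
        seq += [pad_value] * (window - remainder)
    return [seq[i:i + window] for i in range(0, len(seq), window)]


# Length 5 is padded to 6, then split into 3 chunks of size 2.
chunks = split_time_into_chunks([1, 2, 3, 4, 5], window=2)

# Merging the two axes again gives the padded length 6, not the original 5.
merged = [value for chunk in chunks for value in chunk]
```

This is why the original sequence lengths have to be restored separately after merging.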
- classmethod get_out_data_from_opts(name, axis, dims, pad_to_multiples=None, sources=(), **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.SplitBatchTimeLayer(base, **kwargs)[source]#
A very specific layer which expects to get input of shape (batch * time, …) and converts it into (batch, time, …), where it recovers the seq-lens from some other layer. See SplitDimsLayer for a more generic layer.
- Parameters:
base (LayerBase) – used to recover the seq-lens
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.ReshapeLayer(in_dims, out_dims, extra_deps=(), **kwargs)[source]#
Allows to reshape (…, in_dims, …) to (…, out_dims, …) as long as prod(in_dims) == prod(out_dims).
in_dims don’t need to be directly behind each other or in that order – internally it will permute it such that it is in the right order. out_dims should be defined.
This can be used for clever indexing, slicing, padding tricks. It can also be used as an alternative to SplitDimsLayer or MergeDimsLayer.
- Parameters:
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) – will modify inplace
network (returnn.tf.network.TFNetwork) –
get_layer (((str) -> LayerBase)) – function to get or construct another layer
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.FlattenBatchLayer(axis='T', batch_major=True, **kwargs)[source]#
Merges one axis into the batch axis. If the axis has dynamic lengths, this would use flattening, i.e. recalculate the padding, i.e. the size changes. This basically wraps flatten_with_seq_len_mask() or flatten_with_seq_len_mask_time_major(). See also MergeDimsLayer, which does not do flattening, i.e. the size stays the same.
- Parameters:
axis (str) –
batch_major (bool) – if False, will flatten in time-major manner
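What flattening with dynamic lengths means, and how batch_major changes the order of the frames, can be sketched in pure Python. The helper flatten_batch is hypothetical and only illustrates the idea. It is not the implementation of the wrapped functions.

```python
def flatten_batch(batch, seq_lens, batch_major=True):
    """Merge the time axis into the batch axis and drop all padded frames.

    batch: list over B of padded lists over T
    seq_lens: number of valid frames for each batch entry
    """
    num_seqs = len(batch)
    if batch_major:
        # All frames of sequence 0, then all frames of sequence 1, ...
        return [batch[b][t] for b in range(num_seqs) for t in range(seq_lens[b])]
    # Time-major: frame 0 of all sequences, then frame 1 of all sequences, ...
    return [
        batch[b][t]
        for t in range(max(seq_lens))
        for b in range(num_seqs)
        if t < seq_lens[b]
    ]


batch = [["a1", "a2", "a3"], ["b1", "PAD", "PAD"], ["c1", "c2", "PAD"]]
seq_lens = [3, 1, 2]

flat_batch_major = flatten_batch(batch, seq_lens, batch_major=True)
flat_time_major = flatten_batch(batch, seq_lens, batch_major=False)
```

The flattened size is sum(seq_lens) = 6 instead of B * T = 9, which is the size change mentioned above.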
- classmethod get_out_data_from_opts(sources, name, axis='T', batch_major=True, **kwargs)[source]#
- Parameters:
sources (list[LayerBase]) –
name (str) –
axis (str) –
batch_major (bool) – if False, will flatten in time-major manner
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.UnflattenBatchLayer(**kwargs)[source]#
Inverse of FlattenBatchLayer, so recovers an axis previously merged into the batch axis.
This basically wraps unflatten_with_seq_len_mask().
- Parameters:
in_dim (Dim|None) –
out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –
dropout (float) – 0.0 means to apply no dropout. dropout will only be applied during training
dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see
Data.get_bc_shape()
dropout_on_forward (bool) – apply dropout during inference
mask (str|None) – “dropout” or “unity” or None. this is obsolete and only here for historical reasons
- classmethod get_out_data_from_opts(sources, name, **kwargs)[source]#
- Parameters:
sources (list[LayerBase]) –
name (str) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.UnflattenNdLayer(sizes, num_axes, in_dim='T', out_dims=None, declare_same_sizes_as=None, **kwargs)[source]#
This keeps the batch axis as-is, i.e. the flattening/unflattening did not happen on the batch axis.
Example:
Assumes that the input is of shape (B,T,<Ds>) which represents flattened images, where each image is of size width * height. We additionally provide these image sizes (shape (B,2)), i.e. (width,height) tuples. We return the unflattened images of shape (B,W,H,<Ds>), where W/H are the max width/height.
This basically wraps returnn.tf.util.basic.unflatten_nd().
- Parameters:
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- classmethod get_out_data_from_opts(name, sources, num_axes, in_dim='T', out_dims=None, declare_same_sizes_as=None, **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.ExpandDimsLayer(axis, dim=1, **kwargs)[source]#
Adds some axis.
- Parameters:
axis (str|int) – axis to add, e.g. “F”|”feature” or “spatial”|”time”|”T”. if this is an integer, the input data is first converted into batch-major mode, and then this is counted with batch-dim.
dim (int|Dim) – dimension of new axis (1 by default)
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.RepeatLayer(repetitions, axis='T', out_dim=None, **kwargs)[source]#
A wrapper around tf.repeat, but supports an additional batch axis for the durations. The sum of the repetitions has to be non-zero for each sequence in the batch.
This layer can only be used with Tensorflow 1.15.0 or newer.
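Repetitions with an additional batch axis can be sketched in pure Python, with one scalar per frame. The helper repeat_frames is hypothetical. Padding the shorter results up to the longest one and returning the new lengths is an assumption about how the per-sequence results are combined.

```python
def repeat_frames(batch, repetitions, pad_value=0):
    """Repeat every frame batch[b][t] exactly repetitions[b][t] times.

    Returns the padded result and the new sequence lengths.
    """
    repeated = [
        [frame for frame, count in zip(seq, counts) for _ in range(count)]
        for seq, counts in zip(batch, repetitions)
    ]
    new_lens = [len(seq) for seq in repeated]
    max_len = max(new_lens)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in repeated]
    return padded, new_lens


# Two sequences with per-frame durations. A duration of 0 drops the frame.
padded, new_lens = repeat_frames(
    [[1, 2, 3], [4, 5, 6]],
    [[2, 0, 1], [1, 1, 2]],
)
```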
- Parameters:
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- classmethod get_out_data_from_opts(name, sources, axis, repetitions, out_dim=None, **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.TileLayer(multiples, out_dims=None, **kwargs)[source]#
A wrapper around tf.tile
- Parameters:
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.CastLayer(dtype, output, **kwargs)[source]#
Cast to some other dtype.
- Parameters:
dtype (str) –
output (Data) –
- classmethod get_out_data_from_opts(dtype, **kwargs)[source]#
- Parameters:
dtype (str) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.SwapAxesLayer(axis1, axis2, **kwargs)[source]#
Swaps two axes. Basically a wrapper around returnn.tf.util.basic.swapaxes(). Note that usually, this should not be needed, and its use is not recommended, as this will be unnecessarily inefficient. Normally, all RETURNN layers will automatically transpose the input data into whatever format they need.
All axes always have a special meaning (e.g. feature dim or time dim) or dimension tag (e.g. for time axes, including dyn seq lengths). If you need to change the meaning (and not actually transpose / swap axes), you need to use ReinterpretDataLayer.
See also TransposeLayer for a more generic variant.
See also ReinterpretDataLayer, which does not swap/transpose axes, but allows reinterpreting their meaning / dim tags.
- Parameters:
axis1 (int|str) –
axis2 (int|str) –
- classmethod get_out_data_from_opts(name, sources, axis1, axis2, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
axis1 (int|str) –
axis2 (int|str) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.TransposeLayer(perm: Dict[Dim | str | int, Dim | str] | Sequence[Dim], **kwargs)[source]#
Basically a wrapper around tf.transpose().
Note that usually, this should not be needed, and its use is not recommended, as this will be unnecessarily inefficient. Normally, all RETURNN layers will automatically transpose the input data into whatever format they need.
All axes always have a special meaning (e.g. feature dim or time dim) or dimension tag (e.g. for time axes, including dyn seq lengths). If you need to change the meaning (and not actually transpose / swap axes), you need to use ReinterpretDataLayer.
See also ReinterpretDataLayer, which does not transpose axes, but allows reinterpreting their meaning / dim tags.
One valid use case is to use this for the final output layer, to make sure the output is in the correct format.
- Parameters:
perm – target axis -> source axis
- classmethod transpose(input_data: Tensor, perm: Dict[Dim | str | int, Dim | str] | Sequence[Dim], name: str | None = None) Tensor [source]#
- Parameters:
input_data –
perm –
name –
- Returns:
transposed data
- classmethod get_perm_int(input_data: Tensor, perm: Dict[Dim | str | int, Dim | str] | Sequence[Dim]) List[int] [source]#
- Parameters:
input_data –
perm –
- classmethod get_out_data_from_opts(name, sources, perm, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
perm (dict[str,str]) – target axis -> source axis
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.ReinterpretDataLayer(switch_axes=None, size_base=None, batch_dim_base=None, set_axes=None, set_dim_tags=None, enforce_batch_major=False, enforce_time_major=False, set_sparse=None, set_sparse_dim=<class 'returnn.util.basic.NotSpecified'>, increase_sparse_dim=None, **kwargs)[source]#
Acts like the CopyLayer but reinterprets the role of some axes or data.
- Parameters:
switch_axes (str|list[str]) – e.g. “bt” to switch batch and time axes
size_base (LayerBase|None) – copy the size_placeholder from the given layer
batch_dim_base (LayerBase|None) – copy the batch dim from this layer
set_axes (dict[str,Dim|str|None]) – This can be used to overwrite the special axes like time_dim_axis or feature_dim_axis. For that, use keys “B”, “T” or “F”, and a value via Data.get_axis_from_description().
set_dim_tags (dict[str|Dim,Dim]|Sequence[Tuple[Dim,Dim]]|None) – axis -> new dim tag. assigns new dim tags. If the passed dim tag is yet undefined, this will not use same_dim_tags_as (declare_same_as) but create a new dim tag. This option is useful for generalized self attention (https://github.com/rwth-i6/returnn/issues/391).
enforce_batch_major (bool) –
enforce_time_major (bool) –
set_sparse (bool|None) – if bool, set sparse value to this
set_sparse_dim (Dim|int|None|NotSpecified) – set sparse dim to this. assumes that it is sparse
increase_sparse_dim (int|None) – add this to the dim. assumes that it is sparse
- output_before_activation: Optional[OutputWithActivation][source]#
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- classmethod get_out_data_from_opts(name, sources, switch_axes=None, size_base=None, batch_dim_base=None, set_axes=None, set_dim_tags=None, enforce_batch_major=False, enforce_time_major=False, set_sparse=None, set_sparse_dim=<class 'returnn.util.basic.NotSpecified'>, increase_sparse_dim=None, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
switch_axes (str|list[str]) – e.g. “bt” to switch batch and time axes
size_base (LayerBase|None) – similar to size_target
batch_dim_base (LayerBase|None) –
set_axes (dict[str,Dim|str|None]) –
set_dim_tags (dict[str|Dim,Dim]|Sequence[Tuple[Dim,Dim]]|None) –
enforce_batch_major (bool) –
enforce_time_major (bool) –
set_sparse (bool|None) – if bool, set sparse value to this
set_sparse_dim (Dim|int|None|NotSpecified) – set sparse dim to this. assumes that it is sparse
increase_sparse_dim (int|None) – add this to the dim. assumes that it is sparse
- search_choices: Optional[SearchChoices][source]#
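A minimal config sketch of how these options might appear in a network dict. The layer class string "reinterpret_data", the layer names, and the sources "data" and "data:classes" are assumptions for illustration and are not stated in this section; only the option names come from the parameter list above.

```python
# Hypothetical network fragment; layer names and sources are placeholders.
network = {
    "labels_plus_one": {
        "class": "reinterpret_data",  # assumed registered name of ReinterpretDataLayer
        "from": "data:classes",       # assumed to be a sparse input
        "increase_sparse_dim": 1,     # add 1 to the sparse dim
    },
    "batch_major": {
        "class": "reinterpret_data",
        "from": "data",
        "enforce_batch_major": True,
    },
}
```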
- class returnn.tf.layers.basic.ConvLayer(filter_size, padding, strides=1, dilation_rate=1, groups=1, input_expand_dims=0, input_add_feature_dim=False, input_split_feature_dim=None, in_dim=None, in_spatial_dims=None, n_out=None, out_dim=None, out_spatial_dims=None, auto_use_channel_first=<class 'returnn.util.basic.NotSpecified'>, with_bias=<class 'returnn.util.basic.NotSpecified'>, activation=None, forward_weights_init='glorot_uniform', bias_init=0.0, filter=None, filter_perm=None, bias=None, use_time_mask=False, pad_seq_len_to_power=None, **kwargs)[source]#
A generic convolution layer which supports 1D, 2D and 3D convolution. Pooling can be done in the separate “pool” layer.
- Parameters:
filter_size (Sequence[Dim]|Sequence[int]) – (width,), (height,width) or (depth,height,width) for 1D/2D/3D conv. The input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. It will automatically swap the batch-dim to the first axis of the input data.
padding (str) – “same”, “valid” or “same_static”. “same_static” is calculated differently depending on whether an axis is static or dynamic. For static axes, “same_static” padding is the same as “same” padding, i.e. filter_size - 1 - (T + strides - 1) % strides. For dynamic axes, “same_static” calculates the total padding size as filter_size - 1, i.e. it is independent of the length T of the axis and the striding. For dynamic axes, to avoid skipping any frames on the right, we set left_padding = (filter_size - strides) // 2.
strides (int|Sequence[int]) – strides for the spatial dims, i.e. length of this tuple should be the same as filter_size, or a single int.
dilation_rate (int|Sequence[int]) – dilation for the spatial dims
groups (int) – grouped convolution
in_dim (Dim|None) –
in_spatial_dims (Sequence[Dim|str]|None) –
n_out (int|None) – number of outgoing features
out_dim (Dim|None) –
out_spatial_dims (Sequence[Dim]|None) –
input_expand_dims (int) – number of spatial dims to add to the input
input_add_feature_dim (bool) – will add a dim at the end and use input-feature-dim == 1, and use the original input feature-dim as a spatial dim.
input_split_feature_dim (None|int) – if set, like input_add_feature_dim it will add a new feature dim which is of value input_split_feature_dim, and the original input feature dim will be divided by input_split_feature_dim, thus it must be a multiple of that value.
auto_use_channel_first (bool|NotSpecified) – convert the input to NCHW or not
with_bias (bool|NotSpecified) – if True, will add a bias to the output features. True by default since behavior version 10.
activation (None|str) – if set, will apply this function at the end
filter (LayerBase|None) – if given, will not create an own parameter, but use this as the filter
filter_perm (dict[str,str]|None) – transposes the filter (input filter as layer)
bias (LayerBase|None) – if given, will not create an own parameter, but use this as the bias
use_time_mask (bool) –
pad_seq_len_to_power (Optional[float]) – pad sequence length to power of given number to reduce number of different sequence lengths. See https://github.com/rwth-i6/returnn/issues/1450 and https://github.com/tensorflow/tensorflow/issues/62441.
- output_before_activation: Optional[OutputWithActivation][source]#
- classmethod set_output_dim_tags(output, num_batch_dims, in_spatial_dims, out_spatial_dims, filter_size, strides, dilation_rate, padding)[source]#
- classmethod transform_input(input_data, network, in_dim=None, in_spatial_dims=None, input_expand_dims=0, input_split_feature_dim=None, input_add_feature_dim=False, use_time_mask=False)[source]#
- Parameters:
input_data (Data) –
network (returnn.tf.network.TFNetwork) –
in_dim (Dim|None) –
in_spatial_dims (list[Dim|str]|None) –
input_expand_dims (int) – number of spatial dims to add to the input
input_split_feature_dim (None|int) – if set, like input_add_feature_dim it will add a new feature dim which is of value input_split_feature_dim, and the original input feature dim will be divided by input_split_feature_dim, thus it must be a multiple of that value.
input_add_feature_dim (bool) – will add a dim at the end and use input-feature-dim == 1, and use the original input feature-dim as a spatial dim.
use_time_mask (bool) –
- Returns:
(transformed input, num batch dims). all batch dims are at the front
- Return type:
(Data, int)
- classmethod get_input_placeholder_with_same_static_padding(input_data: Tensor, num_batch_dims: int, filter_size: Sequence[int], strides: Sequence[int], out_batch_feature_major: bool) Tensor [source]#
Returns the placeholder of input_data with same_static padding applied to it.
- Parameters:
input_data –
num_batch_dims –
filter_size –
strides –
out_batch_feature_major –
- classmethod get_out_data_from_opts(name, sources, network, filter_size, padding, strides=1, dilation_rate=1, input_expand_dims=0, input_add_feature_dim=False, input_split_feature_dim=None, in_dim=None, in_spatial_dims=None, n_out=None, out_dim=None, out_spatial_dims=None, auto_use_channel_first=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]#
- Parameters:
name (str) –
sources (Sequence[LayerBase]) –
network (returnn.tf.network.TFNetwork) –
filter_size (Sequence[int|Dim]) –
padding (str) –
strides (int|Sequence[int]) –
dilation_rate (int|Sequence[int]) –
input_expand_dims (int) – number of spatial dims to add to the input
input_add_feature_dim (bool) –
input_split_feature_dim (None|int) –
in_dim (Dim|None) –
in_spatial_dims (Sequence[Dim|str]|None) –
n_out (int|None) – number of outgoing features
out_dim (Dim|None) –
out_spatial_dims (Sequence[Dim]|None) –
auto_use_channel_first (bool|NotSpecified) –
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- search_choices: Optional[SearchChoices][source]#
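The padding arithmetic described for the `padding` option of ConvLayer can be written out per axis as follows. This is a sketch of the formulas as stated in the parameter description, not code taken from RETURNN. The function names are hypothetical, and the right padding for dynamic axes is assumed to be the remainder of the total after the left padding.

```python
def same_padding_static(seq_len, filter_size, stride):
    """Total "same" padding for a static axis of length seq_len."""
    return filter_size - 1 - (seq_len + stride - 1) % stride


def same_static_padding_dynamic(filter_size, stride):
    """(left, right) padding for "same_static" on a dynamic axis.

    The total is independent of the axis length and of the striding.
    Assumes filter_size >= stride; right = total - left is an assumption.
    """
    total = filter_size - 1
    left = (filter_size - stride) // 2
    right = total - left
    return left, right


static_total = same_padding_static(seq_len=10, filter_size=3, stride=2)
dynamic_pad = same_static_padding_dynamic(filter_size=5, stride=1)
```

For `filter_size=3`, `stride=2`, the static total depends on whether the axis length is even or odd (1 for length 10, 2 for length 11), whereas the dynamic-axis total is always 2.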
- class returnn.tf.layers.basic.PoolLayer(mode, pool_size, padding='VALID', dilation_rate=1, strides=None, in_dim=None, in_spatial_dims=None, out_dim=None, out_spatial_dims=None, use_channel_first=<class 'returnn.util.basic.NotSpecified'>, use_time_mask=False, **kwargs)[source]#
A generic N-D pooling layer. This would usually be done after a convolution for down-sampling.
- Parameters:
mode (str) – “max” or “avg”
pool_size (tuple[int]) – shape of the window of each reduce
padding (str) – “same”, “valid” or “same_static”. “same_static” is calculated differently depending on whether an axis is static or dynamic. For static axes, “same_static” padding is the same as “same” padding, i.e. filter_size - 1 - (T + strides - 1) % strides. For dynamic axes, “same_static” calculates the total padding size as filter_size - 1, i.e. it is independent of the length T of the axis and the striding. For dynamic axes, to avoid skipping any frames on the right, we set left_padding = (filter_size - strides) // 2.
dilation_rate (tuple[int]|int) –
strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size
in_dim (Dim|None) –
in_spatial_dims (list[Dim|str]|None) –
out_dim (Dim|None) –
out_spatial_dims (list[Dim]|None) –
use_channel_first (bool|NotSpecified) – if set, will transform input to NCHW format
use_time_mask (bool) –
- classmethod get_out_data_from_opts(name, sources, network, pool_size, strides=None, dilation_rate=1, padding='VALID', in_dim=None, in_spatial_dims=None, out_dim=None, out_spatial_dims=None, use_channel_first=<class 'returnn.util.basic.NotSpecified'>, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
network (returnn.tf.network.TFNetwork) –
pool_size (tuple[int]|list[int]) –
strides (tuple[int]|list[int]|int) –
dilation_rate (int|tuple[int]|list[int]) –
padding (str) –
in_dim (Dim|None) –
in_spatial_dims (list[Dim|str]|None) –
out_dim (Dim|None) –
out_spatial_dims (list[Dim]|None) –
use_channel_first (bool|NotSpecified) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
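A hedged config sketch combining a convolution with the separate "pool" layer. The class string "conv", the activation name "relu", the layer names, and the source "data" are assumptions for illustration; the option names are taken from the parameter lists above.

```python
# Hypothetical network fragment: 1D convolution followed by max pooling.
network = {
    "conv0": {
        "class": "conv",       # assumed registered name of ConvLayer
        "from": "data",
        "filter_size": (3,),   # 1D convolution
        "padding": "same",
        "n_out": 32,
        "activation": "relu",  # assumed activation name
        "with_bias": True,
    },
    "pool0": {
        "class": "pool",
        "from": "conv0",
        "mode": "max",
        "pool_size": (2,),
        "padding": "valid",
        # strides is omitted on purpose: it then defaults to pool_size.
    },
}
```

Because `strides` defaults to `pool_size`, the pooling windows in this sketch do not overlap.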
- class returnn.tf.layers.basic.DctLayer(type=2, n=None, norm=None, **kwargs)[source]#
Layer to perform DCT. Wraps tf.signal.dct(). For further documentation on the input arguments, refer to https://www.tensorflow.org/api_docs/python/tf/signal/dct
- Parameters:
type (int) – DCT type to perform. Must be 1, 2, 3, or 4
n (int|None) – length of the transform
norm (str|None) – normalization to apply. Must be None or “ortho”
- classmethod get_out_data_from_opts(name, sources, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
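For orientation, a plain-Python reference of the type-2 DCT with and without "ortho" normalization. It assumes the `scipy.fftpack.dct` convention (unnormalized output carries a factor of 2), which `tf.signal.dct` is understood to follow; the function name `dct2` is hypothetical, and the layer itself calls TensorFlow rather than anything like this loop.

```python
import math


def dct2(x, norm=None):
    """Type-2 DCT of a list of floats (scipy.fftpack.dct convention, assumed)."""
    n = len(x)
    out = []
    for k in range(n):
        s = 2.0 * sum(
            x[i] * math.cos(math.pi * k * (2 * i + 1) / (2.0 * n)) for i in range(n)
        )
        if norm == "ortho":
            # Orthonormal scaling: the k == 0 term is scaled differently.
            s *= math.sqrt(1.0 / (4 * n)) if k == 0 else math.sqrt(1.0 / (2 * n))
        out.append(s)
    return out


plain = dct2([1.0, 1.0, 1.0, 1.0])
ortho = dct2([1.0, 1.0, 1.0, 1.0], norm="ortho")
```

A constant input has all its energy in the first coefficient, which makes the effect of the normalization easy to see.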
- class returnn.tf.layers.basic.TransposedConvLayer(filter_size, strides=None, padding='same', remove_padding=0, output_padding=None, in_dim=None, in_spatial_dims=None, out_dim=None, out_spatial_dims=None, with_bias=True, activation=None, forward_weights_init='glorot_uniform', bias_init=0.0, filter=None, filter_perm=None, bias=None, use_time_mask=False, **kwargs)[source]#
Transposed convolution, sometimes also called deconvolution. See tf.nn.conv2d_transpose() (currently we support 1D/2D).
- Parameters:
filter_size (list[int]) –
strides (list[int]|None) – specifies the upscaling. by default, same as filter_size
padding (str) – “same” or “valid”
remove_padding (list[int]|int) –
output_padding (list[int|None]|int|None) –
in_dim (Dim|None) –
in_spatial_dims (list[Dim|str]|None) –
out_dim (Dim|None) –
out_spatial_dims (list[Dim]|None) –
with_bias (bool) – whether to add a bias. enabled by default.
activation (str|None) –
forward_weights_init –
bias_init –
filter (LayerBase|None) – if given, will not create an own parameter, but use this as the filter
filter_perm (dict[str,str]|None) – transposes the filter (input filter as layer)
bias (LayerBase|None) – if given, will not create an own parameter, but use this as the bias
use_time_mask (bool) –
- output_before_activation: Optional[OutputWithActivation][source]#
- static deconv_output_length(input_length, filter_size, padding, output_padding=None, stride=0, dilation=1, out_dim=None)[source]#
Determines output length of a transposed convolution given input length. Copied from conv_utils.deconv_output_length, adapted with simplification.
Also see ConvLayer.calc_out_dim().
- Parameters:
- Returns:
The output length (integer)
- Return type:
T
- classmethod get_out_data_from_opts(name, sources, network, filter_size, strides=None, padding='same', remove_padding=0, output_padding=None, n_out=None, out_dim=None, out_spatial_dims=None, in_dim=None, in_spatial_dims=None, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
network (returnn.tf.network.TFNetwork) –
filter_size (list[int]) –
strides (list[int]|None) –
padding (str) –
remove_padding (list[int]|int) –
output_padding (list[int|None]|int|None) –
n_out (int|None) – number of outgoing features
out_dim (Dim|None) –
out_spatial_dims (list[Dim]|None) –
in_dim (Dim|None) –
in_spatial_dims (list[Dim|str]|None) –
- Return type:
Data
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- search_choices: Optional[SearchChoices][source]#
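The output-length rule behind `deconv_output_length` can be sketched after the Keras `conv_utils.deconv_output_length` helper that the method says it was copied from. This is an assumption-laden sketch, not RETURNN's adapted version: it covers only "same" and "valid", ignores `remove_padding` and `out_dim`, and the function name is hypothetical.

```python
def deconv_output_length_sketch(
    input_length, filter_size, padding, output_padding=None, stride=1, dilation=1
):
    """Output length of a transposed convolution along one axis."""
    # Effective filter size after dilation.
    filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if output_padding is None:
        if padding == "valid":
            return input_length * stride + max(filter_size - stride, 0)
        if padding == "same":
            return input_length * stride
        raise ValueError("unsupported padding %r" % (padding,))
    if padding == "same":
        pad = filter_size // 2
    elif padding == "valid":
        pad = 0
    else:
        raise ValueError("unsupported padding %r" % (padding,))
    return (input_length - 1) * stride + filter_size - 2 * pad + output_padding


valid_len = deconv_output_length_sketch(10, 3, "valid", stride=2)
same_len = deconv_output_length_sketch(10, 3, "same", stride=2)
```

With "same" padding and no `output_padding`, the length is simply scaled by the stride, which matches the description of `strides` as specifying the upscaling.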
- class returnn.tf.layers.basic.ReduceLayer(mode, axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, use_time_mask=None, **kwargs)[source]#
This reduces some axis by using e.g. “sum” or “max”. It’s basically a wrapper around tf.reduce_sum or tf.reduce_max.
- Parameters:
mode (str) – “sum”, “max”, “argmin”, “min”, “argmax”, “mean” or “logsumexp”
axes (Sequence[Dim|str]) – One axis or multiple axes to reduce. It accepts the special tokens “B”|“batch”, “spatial”, “spatial_except_time”, or “F”|“feature”, and it is strongly recommended to use some of these symbolic names. See Data.get_axes_from_description().
axis (Dim|str) – for compatibility, can be used instead of axes
keep_dims (bool) – if dimensions should be kept (will be 1)
enforce_batch_dim_axis (int|None) – will swap the batch-dim-axis of the input with the given axis. e.g. 0: will convert the input into batch-major format if not already like that. Note that this is still not enough in some cases, e.g. when the other axes are also not as expected. The strong recommendation is to use a symbolic axis description.
use_time_mask (bool) – if we reduce over the time-dim axis, use the seq len info. By default, in that case, it will be True.
- classmethod reduce(input_data, mode, axes=None, keep_dims=False, enforce_batch_dim_axis=None, use_time_mask=None)[source]#
- Parameters:
input_data (Data) –
mode (str) – “sum”, “max”, “argmin”, “min”, “argmax”, “mean” or “logsumexp”
axes (int|list[int]|str) – One axis or multiple axes to reduce. It accepts the special tokens “B”|“batch”, “spatial”, “spatial_except_time”, or “F”|“feature”, and it is strongly recommended to use some of these symbolic names. See Data.get_axes_from_description().
keep_dims (bool) – if dimensions should be kept (will be 1)
enforce_batch_dim_axis (int) – will swap the batch-dim-axis of the input with the given axis. e.g. 0: will convert the input into batch-major format if not already like that. Note that this is still not enough in some cases, e.g. when the other axes are also not as expected. The strong recommendation is to use a symbolic axis description.
use_time_mask (bool) – if we reduce over the time-dim axis, use the seq len info. By default, in that case, it will be True.
- Return type:
tf.Tensor
- classmethod need_enforce_batch_dim_axis(axes)[source]#
- Parameters:
axes (int|list[int]|str|Dim) –
- Returns:
if any integer is in axes, thus we should have a fixed dimension layout
- Return type:
bool
- classmethod get_axes(axis, input_data)[source]#
- Parameters:
axis – see self.__init__()
input_data (Data) –
- Returns:
list of axes
- Return type:
list[int]
- classmethod get_out_data_from_opts(name, sources, mode='', axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
mode (str) – (default here “” because other code uses this function)
axes (str|list[str]|None) –
axis (str|None) –
keep_dims (bool) –
enforce_batch_dim_axis (int|None) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
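Why `use_time_mask` matters when reducing over the time axis can be shown with a small plain-Python sketch. It is an illustration under stated assumptions, not the layer's implementation: the function name `reduce_time` is hypothetical, the batch is a list of zero-padded sequences, and masking is modeled by simply cutting each sequence at its length.

```python
def reduce_time(batch, seq_lens, mode="mean", use_time_mask=True):
    """Reduce each padded sequence over time, optionally ignoring padded frames."""
    results = []
    for seq, length in zip(batch, seq_lens):
        frames = seq[:length] if use_time_mask else seq
        if mode == "sum":
            results.append(sum(frames))
        elif mode == "max":
            results.append(max(frames))
        elif mode == "mean":
            results.append(sum(frames) / len(frames))
        else:
            raise ValueError("mode %r not covered by this sketch" % (mode,))
    return results


batch = [[-1.0, -2.0], [-3.0, 0.0]]  # second sequence has one padded frame
seq_lens = [2, 1]
masked_max = reduce_time(batch, seq_lens, mode="max")
unmasked_max = reduce_time(batch, seq_lens, mode="max", use_time_mask=False)
```

Without the mask, the padded zero in the second sequence wins the max; a mean would similarly be divided by the padded length rather than the true one.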
- class returnn.tf.layers.basic.ReduceOutLayer(mode, num_pieces, out_dim=None, **kwargs)[source]#
Combination of SplitDimsLayer applied to the feature dim and ReduceLayer applied to the resulting feature dim. This can e.g. be used to do maxout.
- Parameters:
mode (str) – “sum” or “max” or “mean”
num_pieces (int) – how many elements to reduce. The output dimension will be input.dim // num_pieces.
out_dim (Dim|None) –
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
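The split-then-reduce idea can be sketched on a single feature vector in plain Python. The function name `reduce_out` is hypothetical, and the sketch assumes that consecutive features form one group of `num_pieces` elements; the actual grouping used by the layer is not stated in this section.

```python
def reduce_out(features, num_pieces, mode="max"):
    """Split a feature vector into groups of num_pieces and reduce each group."""
    if len(features) % num_pieces != 0:
        raise ValueError("feature dim must be a multiple of num_pieces")
    reducers = {
        "max": max,
        "sum": sum,
        "mean": lambda group: sum(group) / len(group),
    }
    reduce_fn = reducers[mode]
    return [
        reduce_fn(features[i : i + num_pieces])
        for i in range(0, len(features), num_pieces)
    ]


maxout = reduce_out([1, 5, 2, 3, 9, 0], num_pieces=2)  # 6 features -> 3 outputs
```

The output has `len(features) // num_pieces` entries, in line with the stated output dimension `input.dim // num_pieces`.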
- class returnn.tf.layers.basic.SqueezeLayer(axis, enforce_batch_dim_axis=None, allow_no_op=False, **kwargs)[source]#
Removes an axis with dimension 1. This is basically a wrapper around tf.squeeze.
- Parameters:
axis (Dim|int|list[int]|str) – one axis or multiple axes to squeeze. This is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). It also accepts the special tokens “B”|“batch”, “spatial”, “spatial_except_time”, or “F”|“feature”
enforce_batch_dim_axis (int|None) –
allow_no_op (bool) –
- classmethod get_out_data_from_opts(axis, enforce_batch_dim_axis=None, allow_no_op=False, sources=(), **kwargs)[source]#
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
- class returnn.tf.layers.basic.StackLayer(axis=None, out_spatial_dim=None, **kwargs)[source]#
Stacks multiple inputs together using tf.stack(). This creates a new dimension for the stack.
For concatenation (in feature dimension), see CopyLayer.
- Parameters:
axis (int|None) – new axis. If not given, will use Data.get_default_new_axis_for_dim_tag(<spatial>), i.e. some reasonable default for a new spatial axis.
out_spatial_dim (Dim|None) –