returnn.tf.layers.segmental_model#

returnn.tf.layers.segmental_model.batch_sizes_after_windowing(sizes, window)[source]#
Parameters:
  • sizes (tf.Tensor) – (batch_sizes)

  • window (int) – size of the applied window

Returns:

sizes for each of the new batches after applying a window to each batch

Return type:

tf.Tensor
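
For intuition, here is a minimal NumPy sketch of the semantics described above, assuming one window starts at every frame and is clipped at the sequence end (an assumption based on the description, not the actual implementation):

    import numpy as np

    def batch_sizes_after_windowing_sketch(sizes, window):
        # Each sequence of length T yields one window per start frame t,
        # clipped at the sequence end: new size = min(window, T - t).
        return np.concatenate([np.minimum(window, n - np.arange(n)) for n in sizes])

    print(batch_sizes_after_windowing_sketch(np.array([5, 3]), 3))
    # -> [3 3 3 2 1 3 2 1]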

returnn.tf.layers.segmental_model.batch_indices_after_windowing(sizes, window)[source]#

Computes the start and end times for each of the new batches when applying a window.

Parameters:
  • sizes (tf.Tensor) – (batch_sizes)

  • window (int) – size of the applied window

Returns:

tensor of shape (?, 3), containing batch index, start-frame and end-frame for each batch after applying a window

Return type:

tf.Tensor
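
Under the same assumption as above (one window per start frame, clipped at the sequence end, end-frame exclusive), a hedged NumPy sketch of the (batch, start, end) triples:

    import numpy as np

    def batch_indices_after_windowing_sketch(sizes, window):
        # One (batch_index, start_frame, end_frame) row per window.
        return np.array([(b, t, min(t + window, n))
                         for b, n in enumerate(sizes)
                         for t in range(n)])

    print(batch_indices_after_windowing_sketch([4], 3))
    # -> [[0 0 3] [0 1 4] [0 2 4] [0 3 4]]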

class returnn.tf.layers.segmental_model.SegmentInputLayer(window=15, **kwargs)[source]#

This layer takes the input data, applies a window, and outputs each window as a new batch. This is more efficient than adding the window as a new dimension when sequences have varying lengths.
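
As a hypothetical usage sketch (the layer name and the "data" source are illustrative, not part of this module), the layer is selected via its layer_class string in a network dict:

    network = {
        # fold each window of the input into the batch dim
        "segments": {"class": "segment_input", "window": 15, "from": "data"},
    }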

Parameters:
  • in_dim (Dim|None) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

  • dropout (float) – 0.0 means to apply no dropout. Dropout will only be applied during training

  • dropout_axis (Dim|str|list[Dim|str]|None) –

  • dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()

  • dropout_on_forward (bool) – apply dropout during inference

  • mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons

layer_class: Optional[str] = 'segment_input'[source]#
classmethod get_out_data_from_opts(name, sources, window, **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.ClassesToSegmentsLayer(num_classes, window=15, **kwargs)[source]#

This layer takes a sequence of classes (=> sparse input) and applies a window (same as SegmentInput) to it. For each position t in the window, it computes the relative frequencies of the classes up to and including that position t.
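
A minimal NumPy sketch of that per-window computation, assuming the classes of one window are given as integer indices (illustrative only, not the layer's code):

    import numpy as np

    def window_class_frequencies_sketch(classes, num_classes):
        # At each position t: relative frequency of every class among
        # the frames 0..t (inclusive) of the window.
        one_hot = np.eye(num_classes)[classes]  # (T, num_classes)
        counts = np.cumsum(one_hot, axis=0)
        return counts / np.arange(1, len(classes) + 1)[:, None]

    print(window_class_frequencies_sketch(np.array([0, 0, 1]), 2))
    # -> [[1, 0], [1, 0], [2/3, 1/3]]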

Parameters:
  • in_dim (Dim|None) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

  • dropout (float) – 0.0 means to apply no dropout. Dropout will only be applied during training

  • dropout_axis (Dim|str|list[Dim|str]|None) –

  • dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()

  • dropout_on_forward (bool) – apply dropout during inference

  • mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons

layer_class: Optional[str] = 'classes_to_segments'[source]#
classmethod get_out_data_from_opts(name, sources, num_classes, window, **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.ClassesToLengthDistributionLayer(window=15, scale=1.0, **kwargs)[source]#
Parameters:
  • in_dim (Dim|None) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

  • dropout (float) – 0.0 means to apply no dropout. Dropout will only be applied during training

  • dropout_axis (Dim|str|list[Dim|str]|None) –

  • dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()

  • dropout_on_forward (bool) – apply dropout during inference

  • mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons

layer_class: Optional[str] = 'classes_to_length_distribution'[source]#
classmethod get_out_data_from_opts(name, sources, window, **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.ClassesToLengthDistributionGlobalLayer(window=15, weight_falloff=1.0, target_smoothing=None, min_length=1, broadcast_axis='time', **kwargs)[source]#
Parameters:
  • in_dim (Dim|None) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

  • dropout (float) – 0.0 means to apply no dropout. Dropout will only be applied during training

  • dropout_axis (Dim|str|list[Dim|str]|None) –

  • dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()

  • dropout_on_forward (bool) – apply dropout during inference

  • mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons

layer_class: Optional[str] = 'classes_to_length_distribution_global'[source]#
classmethod get_out_data_from_opts(name, sources, window, broadcast_axis='time', **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.SegmentAlignmentLayer(num_classes, window=15, **kwargs)[source]#
Parameters:
  • in_dim (Dim|None) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

  • dropout (float) – 0.0 means to apply no dropout. Dropout will only be applied during training

  • dropout_axis (Dim|str|list[Dim|str]|None) –

  • dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()

  • dropout_on_forward (bool) – apply dropout during inference

  • mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons

layer_class: Optional[str] = 'segment_alignment'[source]#
classmethod get_out_data_from_opts(name, sources, num_classes, window, **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.UnsegmentInputLayer(**kwargs)[source]#

Takes the output of SegmentInput (sequences windowed over time and folded into the batch dim) and restores the original batch dimension. The feature dimension then contains window * original_features entries. The entries at time t all correspond to windows ending at time t: the window that started in the same frame comes first, then the window that started one frame earlier, and so on. This is also the format used for the segmental decoder in RASR.
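
A hypothetical round trip through segmentation and unsegmentation (the layer names, the "data" source and the linear layer in between are illustrative):

    network = {
        "segments": {"class": "segment_input", "window": 15, "from": "data"},
        "scores": {"class": "linear", "activation": "relu", "n_out": 128, "from": "segments"},
        # back to the original batch dim; features become window * 128 entries
        "unsegmented": {"class": "unsegment_input", "from": "scores"},
    }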

Parameters:
  • in_dim (Dim|None) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

  • dropout (float) – 0.0 means to apply no dropout. Dropout will only be applied during training

  • dropout_axis (Dim|str|list[Dim|str]|None) –

  • dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()

  • dropout_on_forward (bool) – apply dropout during inference

  • mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons

layer_class: Optional[str] = 'unsegment_input'[source]#
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.FillUnusedMemoryLayer(fill_value=0.0, **kwargs)[source]#

Fills all unused entries in the time/batch/feature tensor with a constant value.
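
A minimal NumPy sketch of the assumed behavior, masking entries beyond each sequence's length (the real layer operates on the RETURNN Data object; names here are illustrative):

    import numpy as np

    def fill_unused_sketch(x, seq_lens, fill_value=0.0):
        # x: (batch, time, feature); entries at t >= seq_lens[b] count as unused.
        mask = np.arange(x.shape[1])[None, :] < np.asarray(seq_lens)[:, None]
        return np.where(mask[:, :, None], x, fill_value)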

Parameters:
  • in_dim (Dim|None) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

  • dropout (float) – 0.0 means to apply no dropout. Dropout will only be applied during training

  • dropout_axis (Dim|str|list[Dim|str]|None) –

  • dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()

  • dropout_on_forward (bool) – apply dropout during inference

  • mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons

layer_class: Optional[str] = 'fill_unused'[source]#
classmethod get_out_data_from_opts(name, sources=(), **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.SwapTimeFeatureLayer(**kwargs)[source]#
Parameters:
  • in_dim (Dim|None) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

  • dropout (float) – 0.0 means to apply no dropout. Dropout will only be applied during training

  • dropout_axis (Dim|str|list[Dim|str]|None) –

  • dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()

  • dropout_on_forward (bool) – apply dropout during inference

  • mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons

layer_class: Optional[str] = 'swap_time_feature'[source]#
classmethod get_out_data_from_opts(name, sources=(), **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.FlattenTimeLayer(**kwargs)[source]#
Parameters:
  • in_dim (Dim|None) –

  • out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –

  • dropout (float) – 0.0 means to apply no dropout. Dropout will only be applied during training

  • dropout_axis (Dim|str|list[Dim|str]|None) –

  • dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()

  • dropout_on_forward (bool) – apply dropout during inference

  • mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons

layer_class: Optional[str] = 'flatten_time'[source]#
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.ApplyLengthDistributionLayer(length_model_scale=1.0, **kwargs)[source]#

Usually the arguments, when specified in the network dict, go through transform_config_dict() before they are passed here. See TFNetwork.construct_from_dict().

Parameters:
  • name (str) –

  • network (returnn.tf.network.TFNetwork) –

  • output (Data) – Set a specific output instead of using get_out_data_from_opts()

  • n_out (NotSpecified|None|int) – output dim

  • out_dim (returnn.tensor.Dim|None) – output feature dim tag

  • out_type (dict[str]) – kwargs for Data class. more explicit than n_out.

  • out_shape (set[returnn.tensor.Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) – verifies the output shape (dim tags). See Data.verify_out_shape().

  • sources (list[LayerBase]) – via self.transform_config_dict()

  • in_dim (returnn.tensor.Dim|None) – input feature dim tag

  • target (str|list[str]|None) – if some loss is set, this is the target data-key, i.e. network.extern_data.get_data(target). Alternatively, this can also be a layer name.

  • _target_layers (dict[str,LayerBase]|None) – if target.startswith(“layer:”), then this is target -> layer

  • size_target (str|None) – like target but this is only used to set our output size in case of training

  • loss (Loss|None) – via transform_config_dict(). Every layer can have one loss (of type Loss), or no loss. In the net dict, it is specified as a string. In TFNetwork, all losses from all layers will be collected. That is what TFUpdater.Updater will use for training.

  • reuse_params (ReuseParams|None) – if given, will optionally reuse the params. See self.var_creation_scope(). See also the name_scope option as an alternative.

  • name_scope (str|None) – If set, uses this custom (relative) name scope. If it starts with a “/”, it will be the absolute name scope. It should not end with a “/”. It can be empty, in which case it will not consume a new name scope. This can also be used for parameter sharing. The default is the layer name in most cases, but this logic is in get_absolute_name_scope_prefix() and TFNetwork.layer_creation_scope().

  • param_device (str|None) – e.g. “CPU”, etc. any valid name for tf.device. see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/device_name_utils.h

  • L2 (float|None) – for constraints

  • darc1 (float|None) – for constraints. see Generalization in Deep Learning, https://arxiv.org/abs/1710.05468

  • spatial_smoothing (float|None) – see returnn.tf.util.basic.spatial_smoothing_energy()

  • param_variational_noise (float|None) – adds variational noise to the params during training

  • param_dropout (float|None) – dropout on params (weight dropout) during training

  • param_dropout_min_ndim (int|None) – if param dropout is enabled, only use it for params whose ndim >= this. E.g. it might make sense to disable it for bias params or scalars, so set param_dropout_min_ndim=2.

  • updater_opts (dict[str]|None) – accepts similar opts as TFUpdater, e.g. “optimizer”, “learning_rate”, …

  • is_output_layer (bool|None) – triggers the construction of this layer in the root net. Inside a RecLayer, it triggers the explicit accumulation of all frames. Also see the need_last option.

  • only_on_eval (bool) – if True, this layer will only be calculated in eval

  • only_on_search (bool) – if True, this layer will only be calculated when search is done

  • copy_output_loss_from_source_idx (int|None) – if set, will copy output_loss from this source

  • batch_norm (bool|dict) – see self.batch_norm()

  • initial_output (str|float) – used for recurrent layer, see self.get_rec_initial_output()

  • state – explicitly defines the rec state. initial_state would define the initial state (in the first frame)

  • need_last (bool) – Inside RecLayer, make sure that we can access the last frame. Similar to is_output_layer, but this is specifically about the last frame, i.e. it does not trigger accumulation.

  • rec_previous_layer (LayerBase|None) – via the recurrent layer, layer (template) which represents the past of us. You would not explicitly set this in a config. This is automatically, internally, via RecLayer.

  • encapsulate (bool) –

    mostly relevant for SubnetworkLayer and similar: If True, all sub layers will be created and covered in functions like get_rec_initial_extra_outputs(), and the logic in cls_get_sub_network() will not be used. If False, the logic in cls_get_sub_network() will be used.

  • collocate_with (list[str]|None) – in the rec layer, collocate with the specified other layers

  • trainable (bool) – whether the parameters of this layer will be trained. Default is True. However, if this is inside a subnetwork, all the parent layers must be set to trainable; otherwise, the parameters will not be trainable.

  • custom_param_importer (str|callable|None) – used by set_param_values_by_dict()

  • register_as_extern_data (str|None) – registers output in network.extern_data

  • control_dependencies_on_output (None|((LayerBase)->list[tf.Operation])) – This is mostly to perform some checks after the layer output has been computed, before the layer output is used anywhere else. There is also the IdentityLayer with the option control_dependencies.

  • debug_print_layer_output (None|bool|dict[str]) – same as global config option but per layer

  • _name (str) – just for internal construction, should be the same as name

  • _network (returnn.tf.network.TFNetwork) – just for internal construction, should be the same as network

  • _src_common_search_choices (None|SearchChoices) – set via SearchChoices.translate_to_common_search_beam()

layer_class: Optional[str] = 'apply_length_distribution'[source]#
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#
class returnn.tf.layers.segmental_model.NormalizeLengthScoresLayer(**kwargs)[source]#

Usually the arguments, when specified in the network dict, go through transform_config_dict() before they are passed here. See TFNetwork.construct_from_dict().

Parameters:
  • name (str) –

  • network (returnn.tf.network.TFNetwork) –

  • output (Data) – Set a specific output instead of using get_out_data_from_opts()

  • n_out (NotSpecified|None|int) – output dim

  • out_dim (returnn.tensor.Dim|None) – output feature dim tag

  • out_type (dict[str]) – kwargs for Data class. more explicit than n_out.

  • out_shape (set[returnn.tensor.Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) – verifies the output shape (dim tags). See Data.verify_out_shape().

  • sources (list[LayerBase]) – via self.transform_config_dict()

  • in_dim (returnn.tensor.Dim|None) – input feature dim tag

  • target (str|list[str]|None) – if some loss is set, this is the target data-key, i.e. network.extern_data.get_data(target). Alternatively, this can also be a layer name.

  • _target_layers (dict[str,LayerBase]|None) – if target.startswith(“layer:”), then this is target -> layer

  • size_target (str|None) – like target but this is only used to set our output size in case of training

  • loss (Loss|None) – via transform_config_dict(). Every layer can have one loss (of type Loss), or no loss. In the net dict, it is specified as a string. In TFNetwork, all losses from all layers will be collected. That is what TFUpdater.Updater will use for training.

  • reuse_params (ReuseParams|None) – if given, will optionally reuse the params. See self.var_creation_scope(). See also the name_scope option as an alternative.

  • name_scope (str|None) – If set, uses this custom (relative) name scope. If it starts with a “/”, it will be the absolute name scope. It should not end with a “/”. It can be empty, in which case it will not consume a new name scope. This can also be used for parameter sharing. The default is the layer name in most cases, but this logic is in get_absolute_name_scope_prefix() and TFNetwork.layer_creation_scope().

  • param_device (str|None) – e.g. “CPU”, etc. any valid name for tf.device. see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/device_name_utils.h

  • L2 (float|None) – for constraints

  • darc1 (float|None) – for constraints. see Generalization in Deep Learning, https://arxiv.org/abs/1710.05468

  • spatial_smoothing (float|None) – see returnn.tf.util.basic.spatial_smoothing_energy()

  • param_variational_noise (float|None) – adds variational noise to the params during training

  • param_dropout (float|None) – dropout on params (weight dropout) during training

  • param_dropout_min_ndim (int|None) – if param dropout is enabled, only use it for params whose ndim >= this. E.g. it might make sense to disable it for bias params or scalars, so set param_dropout_min_ndim=2.

  • updater_opts (dict[str]|None) – accepts similar opts as TFUpdater, e.g. “optimizer”, “learning_rate”, …

  • is_output_layer (bool|None) – triggers the construction of this layer in the root net. Inside a RecLayer, it triggers the explicit accumulation of all frames. Also see the need_last option.

  • only_on_eval (bool) – if True, this layer will only be calculated in eval

  • only_on_search (bool) – if True, this layer will only be calculated when search is done

  • copy_output_loss_from_source_idx (int|None) – if set, will copy output_loss from this source

  • batch_norm (bool|dict) – see self.batch_norm()

  • initial_output (str|float) – used for recurrent layer, see self.get_rec_initial_output()

  • state – explicitly defines the rec state. initial_state would define the initial state (in the first frame)

  • need_last (bool) – Inside RecLayer, make sure that we can access the last frame. Similar to is_output_layer, but this is specifically about the last frame, i.e. it does not trigger accumulation.

  • rec_previous_layer (LayerBase|None) – via the recurrent layer, layer (template) which represents the past of us. You would not explicitly set this in a config. This is automatically, internally, via RecLayer.

  • encapsulate (bool) –

    mostly relevant for SubnetworkLayer and similar: If True, all sub layers will be created and covered in functions like get_rec_initial_extra_outputs(), and the logic in cls_get_sub_network() will not be used. If False, the logic in cls_get_sub_network() will be used.

  • collocate_with (list[str]|None) – in the rec layer, collocate with the specified other layers

  • trainable (bool) – whether the parameters of this layer will be trained. Default is True. However, if this is inside a subnetwork, all the parent layers must be set to trainable; otherwise, the parameters will not be trainable.

  • custom_param_importer (str|callable|None) – used by set_param_values_by_dict()

  • register_as_extern_data (str|None) – registers output in network.extern_data

  • control_dependencies_on_output (None|((LayerBase)->list[tf.Operation])) – This is mostly to perform some checks after the layer output has been computed, before the layer output is used anywhere else. There is also the IdentityLayer with the option control_dependencies.

  • debug_print_layer_output (None|bool|dict[str]) – same as global config option but per layer

  • _name (str) – just for internal construction, should be the same as name

  • _network (returnn.tf.network.TFNetwork) – just for internal construction, should be the same as network

  • _src_common_search_choices (None|SearchChoices) – set via SearchChoices.translate_to_common_search_beam()

layer_class: Optional[str] = 'normalize_length_scores'[source]#
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]#

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

kwargs: Optional[Dict[str]][source]#
output_before_activation: Optional[OutputWithActivation][source]#
output_loss: Optional[tf.Tensor][source]#
rec_vars_outputs: Dict[str, tf.Tensor][source]#
search_choices: Optional[SearchChoices][source]#
params: Dict[str, tf.Variable][source]#
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]#
stats: Dict[str, tf.Tensor][source]#