returnn.tf.layers.base

This module contains the layer base class LayerBase.

class returnn.tf.layers.base.LayerBase(name, network, output, n_out=<class 'returnn.util.basic.NotSpecified'>, out_dim=None, out_type=None, out_shape=None, sources=(), in_dim=None, target=None, _target_layers=None, loss=None, size_target=None, reuse_params=None, name_scope=None, param_device=None, is_output_layer=None, only_on_eval=False, only_on_search=False, copy_output_loss_from_source_idx=None, batch_norm=False, L2=None, darc1=None, spatial_smoothing=0.0, param_variational_noise=None, param_dropout=None, param_dropout_min_ndim=None, updater_opts=None, initial_output=None, state=None, need_last=False, rec_previous_layer=None, encapsulate=False, collocate_with=None, trainable=True, custom_param_importer=None, register_as_extern_data=None, control_dependencies_on_output=None, debug_print_layer_output=None, _network=None, _name=None, _src_common_search_choices=None)[source]

This is the base class for all layers. Every layer by default has a list of source layers (sources) and defines self.output, which is of type Data. The base class provides common functionality shared by all layers, such as explicitly defining the output format, some parameter regularization, and more.

If you want to implement your own layer:

class YourOwnLayer(_ConcatInputLayer):  # e.g. either _ConcatInputLayer or LayerBase as a base
    " some docstring "
    layer_class = "your_layer_name"

    def __init__(self, your_kwarg1, your_opt_kwarg2=None, **kwargs):
        " docstring, document the args! "
        super(YourOwnLayer, self).__init__(**kwargs)
        # Now we need to set self.output, which must be of type :class:`Data`.
        # It is set at this point to whatever we got from `self.get_out_data_from_opts()`,
        # so it is enough if we set self.output.placeholder and self.output.size_placeholder,
        # but we could also reset self.output.
        self.output.placeholder = self.input_data.placeholder + 42  # whatever you want to do
        # If you don't modify the sizes (e.g. sequence-length), just copy the input sizes.
        self.output.size_placeholder = self.input_data.size_placeholder.copy()

    @classmethod
    def get_out_data_from_opts(cls, **kwargs):
        " This is supposed to return a :class:`Data` instance as a template, given the arguments. "
        # example, just the same as the input:
        return get_concat_sources_data_template(kwargs["sources"], name="%s_output" % kwargs["name"])

Usually the arguments, when specified in the network dict, go through transform_config_dict() before they are passed here. See TFNetwork.construct_from_dict().
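
For example, a network dict entry using the layer above might look like this (a minimal sketch; the layer name "my_layer" and the data/target keys are hypothetical). The "from" argument is resolved by transform_config_dict() into the sources kwarg (a list of LayerBase):

network = {
    "my_layer": {"class": "your_layer_name", "from": "data", "your_kwarg1": 42},
    "output": {"class": "softmax", "from": "my_layer", "loss": "ce", "target": "classes"},
}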

Parameters:
  • name (str)

  • network (returnn.tf.network.TFNetwork)

  • output (Data) – Set a specific output instead of using get_out_data_from_opts()

  • n_out (NotSpecified|None|int) – output dim

  • out_dim (returnn.tensor.Dim|None) – output feature dim tag

  • out_type (dict[str]) – kwargs for Data class. more explicit than n_out.

  • out_shape (set[returnn.tensor.Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) – verifies the output shape (dim tags). See Data.verify_out_shape().

  • sources (list[LayerBase]) – via self.transform_config_dict()

  • in_dim (returnn.tensor.Dim|None) – input feature dim tag

  • target (str|list[str]|None) – if some loss is set, this is the target data-key, i.e. network.extern_data.get_data(target). Alternatively, this can also be a layer name.

  • _target_layers (dict[str,LayerBase]|None) – if target.startswith(“layer:”), then this is target -> layer

  • size_target (str|None) – like target but this is only used to set our output size in case of training

  • loss (Loss|None) – via transform_config_dict(). Every layer can have one loss (of type Loss), or no loss. In the net dict, it is specified as a string. In TFNetwork, all losses from all layers will be collected; that is what TFUpdater.Updater will use for training. See also the example after this parameter list.

  • reuse_params (ReuseParams|None) – if given, will optionally reuse the params of another layer. See self.var_creation_scope(). See also the name_scope option as an alternative.

  • name_scope (str|None) – If set, uses this custom (relative) name scope. If it starts with a “/”, it will be the absolute name scope. It should not end with a “/”. It can be empty, in which case it will not consume a new name scope. This can also be used for parameter sharing. The default is the layer name in most cases, but this logic is in get_absolute_name_scope_prefix() and TFNetwork.layer_creation_scope().

  • param_device (str|None) – e.g. “CPU”, etc. any valid name for tf.device. see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/device_name_utils.h

  • L2 (float|None) – for constraints

  • darc1 (float|None) – for constraints. see Generalization in Deep Learning, https://arxiv.org/abs/1710.05468

  • spatial_smoothing (float|None) – see returnn.tf.util.basic.spatial_smoothing_energy()

  • param_variational_noise (float|None) – adds variational noise to the params during training

  • param_dropout (float|None) – dropout on params (weight dropout) during training

  • param_dropout_min_ndim (int|None) – if param dropout is enabled, only use it for params whose ndim >= this. E.g. it might make sense to disable it for bias params or scalars, so set param_dropout_min_ndim=2.

  • updater_opts (dict[str]|None) – accepts similar opts as TFUpdater, e.g. “optimizer”, “learning_rate”, …

  • is_output_layer (bool|None) – triggers the construction of this layer in the root net. Inside a RecLayer, it triggers the explicit accumulation of all frames. Also see the need_last option.

  • only_on_eval (bool) – if True, this layer will only be calculated in eval

  • only_on_search (bool) – if True, this layer will only be calculated when search is done

  • copy_output_loss_from_source_idx (int|None) – if set, will copy output_loss from this source

  • batch_norm (bool|dict) – see self.batch_norm()

  • initial_output (str|float) – used for recurrent layer, see self.get_rec_initial_output()

  • state – explicitly defines the rec state. initial_state would define the initial state (in the first frame)

  • need_last (bool) – Inside RecLayer, make sure that we can access the last frame. Similar to is_output_layer, but this is specifically about the last frame, i.e. it does not trigger accumulation.

  • rec_previous_layer (LayerBase|None) – via the recurrent layer: the layer (template) which represents us in the previous frame. You would not explicitly set this in a config; it is set automatically and internally by RecLayer.

  • encapsulate (bool) – mostly relevant for SubnetworkLayer and similar: If True, all sub layers will be created and covered in functions like get_rec_initial_extra_outputs(), and the logic in cls_get_sub_network() will not be used. If False, the logic in cls_get_sub_network() will be used.

  • collocate_with (list[str]|None) – in the rec layer, collocate with the specified other layers

  • trainable (bool) – whether the parameters of this layer will be trained. Default is True. However, if this is inside a subnetwork, all the parent layers must be set to trainable, otherwise the parameters will not be trainable.

  • custom_param_importer (str|callable|None) – used by set_param_values_by_dict()

  • register_as_extern_data (str|None) – registers output in network.extern_data

  • control_dependencies_on_output (None|((LayerBase)->list[tf.Operation])) – This is mostly to perform some checks after the layer output has been computed, before the layer output is used anywhere else. There is also the IdentityLayer with the option control_dependencies.

  • debug_print_layer_output (None|bool|dict[str]) – same as global config option but per layer

  • _name (str) – just for internal construction, should be the same as name

  • _network (returnn.tf.network.TFNetwork) – just for internal construction, should be the same as network

  • _src_common_search_choices (None|SearchChoices) – set via SearchChoices.translate_to_common_search_beam()
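
To illustrate some of the options above, a layer dict in the network config might look like this (a minimal sketch with hypothetical layer names and dims):

network = {
    "output": {
        "class": "softmax", "from": "encoder", "n_out": 1000,
        "target": "classes", "loss": "ce",          # target data-key and loss, as described above
        "L2": 1e-4,                                 # L2 constraint on this layer's params
        "register_as_extern_data": "output_probs",  # register output in network.extern_data
    },
}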

layer_class: Optional[str] = None[source]
allow_inf_in_output = False[source]
recurrent = False[source]
post_init(layer_desc)[source]

This gets called right after self.__init__().

Parameters:

layer_desc (dict[str]) – kwargs as they are passed to self.__init__

classmethod get_out_data_from_opts(**kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:

kwargs – all the same kwargs as for self.__init__()

Returns:

Data template (placeholder not set)

Return type:

Data

classmethod fixup_out_data(output, network, out_shape=None, **kwargs)[source]

This is called after get_out_data_from_opts, to fixup incomplete information. E.g. we can patch batch or beam information here but maybe also other things.

Other layer classes might overwrite this but then should call this super method. Usually this should not be needed though.

Parameters:
  • output (Data)

  • network (returnn.tf.network.TFNetwork)

  • out_shape (set[Dim|_MarkedDim]|tuple|list|None) – verifies the output shape (dim tags). See Data.verify_out_shape().

Return type:

Data

classmethod get_global_layer_list()[source]
Return type:

list[LayerBase]

classmethod get_recent_layer()[source]
Return type:

LayerBase

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace

  • network (returnn.tf.network.TFNetwork)

  • get_layer (returnn.tf.network.GetLayer|((str)->LayerBase)) – function to get or construct another layer. The name get_layer might be misleading, as this should return an existing layer, or construct it if it does not exist yet; network.get_layer would just return an existing layer.

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.
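
As a sketch, a subclass with a hypothetical option "base" that refers to another layer could resolve it like this (continuing the YourOwnLayer example above):

class YourOwnLayer(_ConcatInputLayer):
    ...
    @classmethod
    def transform_config_dict(cls, d, network, get_layer):
        super(YourOwnLayer, cls).transform_config_dict(d, network=network, get_layer=get_layer)
        if d.get("base"):
            # resolve the layer name (str) to a LayerBase instance
            d["base"] = get_layer(d["base"])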

get_full_ctx_name()[source]
Returns:

name w.r.t. root ctx network

classmethod cls_get_tf_scope_name(name)[source]
Parameters:

name (str) – layer name

Returns:

valid scope name, might be just name. see tf._VALID_SCOPE_NAME_REGEX and tf._VALID_OP_NAME_REGEX

Return type:

str

classmethod cls_setup_scope(name, name_scope=None, **_kwargs)[source]
Parameters:
  • name (str)

  • name_scope (str|None)

  • _kwargs – other layer kwargs after being transformed

property tf_scope_name[source]
Return type:

str

Returns:

normally just self.name, but made into a valid TF scope name. This is meant mostly to extend TF names; see get_base_absolute_name_scope_prefix() otherwise.

get_base_absolute_name_scope_prefix()[source]
Returns:

e.g. “output/”, always with “/” at end, or “”. this is for the TF name scope or variable scope

Return type:

str

get_absolute_name_scope_prefix()[source]
Returns:

e.g. “output/”, always with “/” at end, or “”. this is for the TF name scope or variable scope. This is the same as get_base_absolute_name_scope_prefix() in most cases, but some layers like RecLayer extend this by an additional postfix.

Return type:

str

get_absolute_name()[source]
Returns:

e.g. “output” or “subnet/output”. This is mostly for representation. See also get_absolute_name_scope_prefix().

Return type:

str

is_output_layer()[source]

Some code differs between an output layer and other layers. It is a bit arbitrary what we define as output layer. This should be consistent with TFNetwork.construct_from_dict().

Return type:

bool

get_dep_layers()[source]
Returns:

list of layers this layer depends on. normally this is just self.sources but e.g. the attention layer in addition has a base, etc.

Return type:

list[LayerBase]

classmethod cls_get_sub_network(name, network, layer_desc)[source]

A layer class can override this to return a custom Subnetwork, which just sets another namespace (and possibly variable sharing) for contained layers but otherwise shares the same construction logic via root network TFNetwork.construct_layer().

When not overriding this, a layer still can have sub layers via LayerBase.get_sub_layer(), but they belong to the root layer (collocated) and can not be decoupled.

Parameters:
  • name (str) – layer name

  • network (returnn.tf.network.TFNetwork)

  • layer_desc (dict[str])

Return type:

returnn.tf.network.Subnetwork|None

get_sub_layer(layer_name)[source]

The default behavior for any layer is to return None. Returned layers belong to the root layer (self).

Also see LayerBase.cls_get_sub_network().

Also see get_available_sub_layer_names().

Parameters:

layer_name (str) – name of the sub_layer (right part of ‘/’ separated path)

Returns:

the sub_layer addressed in layer_name or None if no sub_layer exists

Return type:

LayerBase|None
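
For example, the SubnetworkLayer exposes its internal layers as sub layers, addressable via a '/'-separated path (a minimal sketch with hypothetical layer names):

network = {
    "subnet": {"class": "subnetwork", "from": "data", "subnetwork": {
        "hidden": {"class": "linear", "activation": "tanh", "n_out": 128, "from": "data"},
        "output": {"class": "copy", "from": "hidden"},
    }},
    "output": {"class": "copy", "from": "subnet/hidden"},  # resolved via get_sub_layer("hidden")
}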

classmethod get_available_sub_layer_names(parent_layer_kwargs)[source]
Parameters:

parent_layer_kwargs (dict[str]) – kwargs for the parent layer (as kwargs in cls.get_out_data_from_opts())

Returns:

list of layer names which can be accessed via get_sub_layer()

Return type:

list[str]

classmethod get_sub_layer_out_data_from_opts(layer_name, parent_layer_kwargs)[source]

Called by _TemplateLayer.get_sub_layer(). Gets a Data template for the sub-layer with name ‘layer_name’. Also returns the network the sub-layer is in and the class type of the sub-layer. There is no good default behaviour here, as this heavily depends on how the current layer uses sub-layers.

Parameters:
  • layer_name (str) – name of the sub_layer (right part of ‘/’ separated path)

  • parent_layer_kwargs (dict[str]) – kwargs for the parent layer (as kwargs in cls.get_out_data_from_opts())

Returns:

Data template, class type of sub-layer, layer opts (transformed)

Return type:

(Data, type, dict[str])|None

get_sub_networks()[source]
Returns:

All subnetworks, including those which might be in a different ctx. If this returns a non-empty list, we expect that all layers via get_sub_layers can be reached via the subnetworks.

Return type:

list[returnn.tf.network.TFNetwork]

get_sub_layers()[source]
Returns:

All (direct) (non-temporary) sub layers, including those which might be in a different ctx. This is mostly intended to collect params.

Return type:

list[LayerBase]

get_search_choices()[source]
Return type:

SearchChoices|None

get_search_beam_size()[source]
Returns:

beam size if there was a choice layer and we do search

Return type:

int|None

get_normalized_layer()[source]
Returns:

e.g. if this is the prev layer inside a RecLayer, return the current layer

Return type:

LayerBase

get_batch_dim()[source]

The batch dim of this layer, not taken from our output placeholder but calculated. Normally it is self.network.get_batch_dim(), but if we do search and there was a choice layer, it is multiplied by the beam size.

Returns:

batch dim * beam size

Return type:

tf.Tensor|int

get_batch_info()[source]
Return type:

returnn.tf.util.data.BatchInfo

var_creation_scope(**kwargs)[source]

This takes care of setting up a scope where variables can be created. This handles multiple things:

  • the param sharing logic, to reuse existing variables from elsewhere

  • variational noise and param weight dropout

  • Note: default_control_flow_ctx() is not needed for tf.get_variable. But it might be needed for other code which uses custom inits and tf.Variable, e.g. tf.random.Generator. However, always using this could be a problem if we use other input tensors inside this scope, so we do not enable this here.

Parameters:

kwargs – passed to variable_scope

Returns:

yields the variable_scope

add_param(param, custom_update=None, trainable=None, saveable=None, axes_split_info=None, non_critical_for_restore=False)[source]
Parameters:
  • param (tf.Variable|tf.Tensor)

  • custom_update (None|CustomUpdate) – will be applied in training, instead of taking the gradient

  • trainable (bool|None)

  • saveable (bool|None)

  • axes_split_info (list[list[int]]|None) – e.g. [[n],[n]*4] for LSTM matrices

  • non_critical_for_restore (bool) – if True, and it cannot be found in a checkpoint, it will not be an error

Returns:

param

Return type:

tf.Variable
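
A minimal sketch of typical usage inside a layer's __init__(), assuming n_in and n_out are known ints and the input placeholder has shape [..., n_in]:

with self.var_creation_scope():
    # create the variable inside the layer's variable scope and register it as a layer param
    weights = self.add_param(tf.compat.v1.get_variable(
        name="W", shape=(n_in, n_out),
        initializer=tf.compat.v1.glorot_uniform_initializer()))
self.output.placeholder = tf.matmul(self.input_data.placeholder, weights)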

set_param_values_by_dict(values_dict, session, ignore_wrong_shape=False, copy_param_mode=None)[source]
Parameters:
  • values_dict (dict[str,numpy.ndarray])

  • ignore_wrong_shape (bool)

  • copy_param_mode (str|None)

  • session (tf.compat.v1.Session)

get_param_values_dict(session) → Dict[str, ndarray][source]
Parameters:

session (tf.compat.v1.Session)

Returns:

dict name -> values

get_saveable_params_dict()[source]
Returns:

params and saveable_param_replace resolved

Return type:

dict[str,tf.Variable|tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject]

classmethod get_losses(name, network, output, loss=None, reduce_func=None, layer=None, **kwargs)[source]

Losses will get constructed here. This gets called inside a loss name scope of the layer. When overriding this, make sure that it works both with layer set and unset.

Parameters:
  • name (str) – layer name

  • network (returnn.tf.network.TFNetwork)

  • loss (Loss|None) – argument just as for __init__

  • output (Data) – the output (template) for the layer

  • layer (LayerBase|None) – The real layer instance, if it exists at the current point. If not given, init() must be called at a later point.

  • reduce_func (((tf.Tensor)->tf.Tensor)|None) – if given, will overwrite the reduce func for the loss. By default, every loss_value and error_value is a scalar (sum or average over the batches, and over the frames for frame-wise losses). However, if you provide reduce_func = returnn.tf.util.basic.identity, you can get the unreduced tensor.

  • kwargs – all the remaining __init__ args

Returns:

the losses defined by this layer

Return type:

list[returnn.tf.network.LossHolder]

get_losses_initialized(reduce_func=None)[source]

As self.get_losses, but here we return them all initialized (i.e. the layer is set). You should not override this method but rather get_losses().

Parameters:

reduce_func (((tf.Tensor)->tf.Tensor)|None) – as in get_losses

Returns:

the losses defined by this layer

Return type:

list[returnn.tf.network.LossHolder]

get_params_l2_norm()[source]
Returns:

scalar

Return type:

tf.Tensor

get_output_spatial_smoothing_energy()[source]
Returns:

scalar. see returnn.tf.util.basic.spatial_smoothing_energy()

Return type:

tf.Tensor

get_darc1()[source]

DARC1, simplified Directly Approximately Regularizing Complexity (DARC), via Generalization in Deep Learning, https://arxiv.org/abs/1710.05468

Returns:

scalar

Return type:

tf.Tensor

get_constraints_value()[source]
Returns:

None or scalar

Return type:

tf.Tensor|None

batch_norm(data, use_shift=True, use_std=True, use_sample=0.0, force_sample=False, momentum=<class 'returnn.util.basic.NotSpecified'>, epsilon=0.001, update_sample_only_in_training=<class 'returnn.util.basic.NotSpecified'>, delay_sample_update=<class 'returnn.util.basic.NotSpecified'>, param_version=<class 'returnn.util.basic.NotSpecified'>, gamma_init=1.0, beta_init=0.0, masked_time=<class 'returnn.util.basic.NotSpecified'>)[source]
Parameters:
  • data (Data)

  • use_shift (bool)

  • use_std (bool)

  • use_sample (float) – defaults to 0.0 which is used in training

  • force_sample (bool) – even in eval, use the use_sample factor

  • momentum (float) – for the running average of sample_mean and sample_std

  • update_sample_only_in_training (bool)

  • delay_sample_update (bool)

  • param_version (int) – 0 or 1 or 2

  • epsilon (float)

  • gamma_init (str|float) – see returnn.tf.util.basic.get_initializer(), for the scale

  • beta_init (str|float) – see returnn.tf.util.basic.get_initializer(), for the mean

  • masked_time (bool) – flatten and mask input tensor

Return type:

tf.Tensor

https://arxiv.org/abs/1502.03167

With our default settings:

  • In training: use_sample=0, i.e. not using running average, using current batch mean/var.

  • Not in training (e.g. eval): use_sample=1, i.e. using running average, not using current batch mean/var.

  • The running average includes the statistics of the current batch.

  • The running average is also updated when not training.

Also see:

tf.nn.batch_normalization()

https://github.com/deepmind/sonnet/blob/master/sonnet/python/modules/batch_norm.py
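
In a network config, batch normalization is usually enabled per layer via the batch_norm layer option (see above); the kwargs of this method can be passed as a dict (a minimal sketch with a hypothetical layer):

network = {
    "ff": {"class": "linear", "activation": "relu", "n_out": 512, "from": "data",
           "batch_norm": {"momentum": 0.99, "epsilon": 1e-3, "masked_time": True}},
}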

get_hidden_state()[source]

If this is a recurrent layer, this would return the hidden state. This is used e.g. for the RnnCellLayer class.

Return type:

tf.Tensor | list[tf.Tensor] | None

Returns:

optional tensor(s) with shape (time, batch, dim)

get_last_hidden_state(key)[source]

If this is a recurrent layer, this would return the last hidden state. Otherwise, we return None.

Parameters:

key (int|str|None) – also the special key “*”

Return type:

tf.Tensor | None

Returns:

optional tensor with shape (batch, dim)

post_process_final_rec_vars_outputs(rec_vars_outputs, seq_len)[source]
Parameters:
  • rec_vars_outputs (dict[str,tf.Tensor])

  • seq_len (tf.Tensor) – shape (batch,)

Return type:

dict[str,tf.Tensor]

classmethod get_rec_initial_output(batch_dim, name, output, rec_layer, initial_output=None, **kwargs)[source]

If this layer is used inside a recurrent layer, this function specifies the output of frame t=-1, if it is needed. As arguments, we get the usual layer arguments. batch_dim is added because it might be special because of beam search.

Note: This could maybe share code with RnnCellLayer.get_rec_initial_state().

Parameters:
  • batch_dim (tf.Tensor) – including beam size in beam search

  • name (str) – layer name

  • output (Data) – template

  • rec_layer (returnn.tf.layers.rec.RecLayer)

  • initial_output (str|float|int|tf.Tensor|None)

Return type:

tf.Tensor
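
A minimal sketch (hypothetical layer names) of a rec unit where a layer refers to its previous frame, so that frame t=-1 is provided via initial_output:

network = {
    "rec": {"class": "rec", "from": "data", "unit": {
        "accum": {"class": "combine", "kind": "add",
                  "from": ["prev:accum", "data:source"], "initial_output": 0},
        "output": {"class": "copy", "from": "accum"},
    }},
}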

classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, **kwargs)[source]
Parameters:
  • batch_dim (tf.Tensor) – including beam size in beam search

  • rec_layer (returnn.tf.layers.rec.RecLayer)

Return type:

dict[str,tf.Tensor]

classmethod get_rec_initial_extra_outputs_shape_invariants(rec_layer, **kwargs)[source]
Parameters:

rec_layer (returnn.tf.layers.rec.RecLayer|LayerBase|None) – for the scope

Returns:

optional shapes for the tensors by get_rec_initial_extra_outputs

Return type:

dict[str,tf.TensorShape]

class returnn.tf.layers.base.InternalLayer(output: Tensor, debug_type_name: str | None = None, **kwargs)[source]

This is not supposed to be used by the user. It is used by some code to construct a wrapper layer or so.

Parameters:
  • output (Tensor)

  • debug_type_name (str|None) – just for repr

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace

  • network (returnn.tf.network.TFNetwork)

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

class returnn.tf.layers.base.DataNotAvailableLayer(layer_class, layer_desc, **kwargs)[source]

This is a dummy layer that is created when the output template is flagged “not available for inference”. The output template should be passed to the constructor to correctly forward the information in case any dependent output is exported with “register_as_extern_data”.

See returnn.tf.network._create_layer()

Parameters:
  • layer_class (type[LayerBase])

  • layer_desc (dict[str])

get_sub_layer(layer_name)[source]
Parameters:

layer_name (str) – name of the sub_layer (right part of ‘/’ separated path)

Return type:

LayerBase|None

class returnn.tf.layers.base.WrappedInternalLayer(base_layer, sources=None, **kwargs)[source]

This is not supposed to be used by the user. Like InternalLayer, only intended for internal usage. This layer is supposed to logically wrap another layer.

Parameters:
  • base_layer (LayerBase) – the layer which we are wrapping

  • sources (list[LayerBase]|None) – by default [base_layer]. overwrite to explicitly specify the layer deps

get_base_absolute_name_scope_prefix()[source]
Return type:

str

get_absolute_name_scope_prefix()[source]
Return type:

str

class returnn.tf.layers.base.ReuseParams(reuse_layer=None, map=None, custom=None, auto_create_missing=False, layer_output=None, shape=None)[source]

This is for parameter sharing, i.e. reusing existing tf.Variable objects in a new layer, instead of creating new variables. ReuseParams.from_config_dict() will be called via LayerBase.transform_config_dict().

Parameters:
classmethod from_config_dict(opts, network, get_layer)[source]

This will be called via LayerBase.transform_config_dict() on the layer option “reuse_params”.

Parameters:
  • opts (str|dict[str]|None) –

    If None, we will return None. If str, it is interpreted as a layer name. If dict, you can specify:

    “reuse_layer”: layer name.

    “map”: dict where the keys are parameter names, and the values can be: a str, interpreted as a layer name; None, interpreted as the option auto_create_missing; or a dict, specifying ReuseParams.__init__() options (within such a dict, reuse_layer would again be specified as a str and represents a layer name).

  • network (returnn.tf.network.TFNetwork)

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

Return type:

ReuseParams|None
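
A minimal sketch (hypothetical layer names) of the simplest form, where reuse_params is just a layer name, so the second layer reuses the first layer's parameters:

network = {
    "enc": {"class": "linear", "activation": None, "n_out": 512, "from": "data"},
    "dec": {"class": "linear", "activation": None, "n_out": 512, "from": "data",
            "reuse_params": "enc"},  # str is interpreted as a layer name
}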

class LazyLayerResolver(layer_name, network, get_layer)[source]

Unfortunately this is a bit tricky and difficult to do right. We want to support it because it can happen that e.g. in training, this layer resolving is not needed, and then in search, it is needed, due to different dependencies. See test_reuse_params_map_custom_dep_loop() for an example. The params depend on a layer which is not constructed yet and cannot be constructed yet because of a dependency loop. Thus, here we again try to create it, and if we still get the dependency loop, we create the reused-params-layer based on dummy inputs, such that the variables/parameters get created and can be used now. Then, later, we are going to recreate the reused-params-layer.

Parameters:
  • layer_name (str)

  • network (returnn.tf.network.TFNetwork)

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

get_layer()[source]
Return type:

LayerBase

create_dummy_layer()[source]
Return type:

LayerBase

property reuse_layer[source]
Return type:

LayerBase|None

get_variable_scope(base_layer, **kwargs)[source]
Parameters:
  • base_layer (LayerBase)

  • kwargs – passed to tf.compat.v1.variable_scope

Return type:

tf.compat.v1.VariableScope

variable_custom_getter(base_layer, name, shape, dtype, getter, **kwargs)[source]

By TF docs, from _VariableStore.get_variable(): Callable that takes as a first argument the true getter, and allows overwriting the internal get_variable method. The signature of custom_getter should match that of this method, but the most future-proof version will allow for changes: def custom_getter(getter, *args, **kwargs). Direct access to all get_variable parameters is also allowed: def custom_getter(getter, name, *args, **kwargs). A simple identity custom getter that simply creates variables with modified names is:

def custom_getter(getter, name, *args, **kwargs):
    return getter(name + '_suffix', *args, **kwargs)

In addition, we get the argument base_scope_name, via self.get_variable_scope().

Parameters:
  • base_layer (LayerBase) – we expect that this is the prefix of name

  • name (str) – absolute param name

  • shape (tuple[int]|list[int])

  • dtype (tensorflow.DType)

  • getter ((...)->tf.Variable)

Return type:

tf.Variable|tf.Tensor

class returnn.tf.layers.base.SearchChoices(owner, beam_size, is_decided=False, keep_raw=False)[source]

In beam search, after expanding the beam and then selecting the N best (beam) (see ChoiceLayer), when doing this multiple times, we need to keep a reference to where each beam came from, what the current score is, etc. Also, we could have multiple different such expansions & prunes via different ChoiceLayer instances. This is what we keep track of here.

Parameters:
  • owner (LayerBase)

  • beam_size (int)

  • is_decided (bool) – by DecideLayer

  • keep_raw (bool) – by DecideKeepBeamLayer

property src_layer[source]
Returns:

The layer where we had the last search choices.

Return type:

LayerBase

set_beam_from_own_rec()[source]

Assumes we have set self.owner, and uses those rec vars to set the beam scores.

set_beam_from_rec(rev_vars_outputs)[source]
Parameters:

rev_vars_outputs (dict[str,tf.Tensor]) – e.g. via ChoiceLayer

set_src_beams(src_beam_idxs)[source]
Parameters:

src_beam_idxs (tf.Tensor) – source beam index, (batch, beam)

set_beam_scores(scores)[source]
Parameters:

scores (tf.Tensor) – (batch, beam) -> log score

get_src_choices_seq()[source]
Returns:

all SearchChoices we depend on up to the root, including and starting with self

Return type:

list[SearchChoices]

get_beam_info()[source]
Return type:

returnn.tf.util.data.SearchBeam|None

static compare(self, other)[source]

Also see TFNetwork.get_search_choices.compare_layer(), which is basically the same.

Parameters:
Returns:

0 if equal, -1 if we are smaller, else 1

Return type:

int

translate_to_this_search_beam(sources)[source]
Parameters:

sources (LayerBase|list[LayerBase]|dict[str,LayerBase|object]|tuple[LayerBase|object]|T)

Returns:

sources but all layers transformed when needed

Return type:

T

classmethod translate_to_common_search_beam(layer_desc)[source]
Parameters:

layer_desc (list[LayerBase]|dict[str,LayerBase|object])

Returns:

sources but all layers transformed when needed

Return type:

list[LayerBase]|dict[str,LayerBase|object]

class returnn.tf.layers.base.Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, custom_inv_norm_factor=None, scale=1.0, _check_output_before_softmax=None)[source]

Base class for all losses.

Parameters:
  • base_network (returnn.tf.network.TFNetwork)

  • use_flatten_frames (bool) – will use returnn.tf.util.basic.flatten_with_seq_len_mask()

  • use_normalized_loss (bool) – the loss used in optimization will be normalized

  • custom_norm_factor (float|function|None) – The standard norm factor is 1/sum(target_seq_len) if the target has a time-axis, or 1/sum(output_seq_len) if there is no target and the output has a time-axis, or 1 otherwise. (See Loss.init() for details.) This is used for proper normalization of accumulated loss/error per epoch and also proper normalization per batch for reporting, no matter if use_normalized_loss is True or False. If you want to change this norm factor, you can set this. As a function, it takes (self=self, output=output, layer=layer) and returns a float scalar.

  • custom_inv_norm_factor (LayerBase|None) – inverse of custom_norm_factor. Here we allow passing a layer. We also allow any shape, and it will automatically be reduced via sum. So you could simply pass target_seq_len directly here. Basically, for all reporting, it uses sum(loss) * sum(custom_inv_norm_factor).

  • scale (float) – additional scale factor for the loss

  • _check_output_before_softmax (bool|None)
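
In a network config, a loss is attached per layer, and the Loss __init__ options above can be passed via loss_opts (a minimal sketch with hypothetical layer/data keys):

network = {
    "output": {"class": "softmax", "from": "encoder", "target": "classes",
               "loss": "ce",
               "loss_opts": {"use_normalized_loss": True, "scale": 0.5}},
}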

class_name: str = None[source]
recurrent = False[source]
need_target = True[source]
reduce_func(loss)[source]

Reduces over the frames. Currently this is the sum, and we do averaging later. We might change this logic at some point. Also, some code overwrites this function externally, e.g. with returnn.tf.util.basic.identity, to not do the reduction.

Parameters:

loss (tf.Tensor) – e.g. (batch*time,), or (time_flat,), or (batch*time,dim), etc

Returns:

by default just a scalar. but this can be overwritten, to not reduce

Return type:

tf.Tensor

reduce_to_batch(loss, normalize)[source]
Parameters:
  • loss (tf.Tensor) – e.g. (batch*time,), or (time_flat,), or (batch*time,dim), etc

  • normalize (bool) – reduce mean instead of reduce sum

Returns:

(batch,)

Return type:

tf.Tensor

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace, the loss_opts

  • network (returnn.tf.network.TFNetwork)

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

Will modify d such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by LayerBase.transform_config_dict.

init_by_layer(layer, layer_output_template=None)[source]
Parameters:
  • layer (LayerBase|None)

  • layer_output_template (Data|None) – maybe alternative template

init(output, output_with_activation=None, target=None, layer=None)[source]
Parameters:
  • output (Data) – generated output

  • output_with_activation (OutputWithActivation|None)

  • target (Data) – reference target from dataset

  • layer (LayerBase|None)

get_error()[source]
Returns:

frame error rate as a scalar value with the default self.reduce_func (see also self.get_value)

Return type:

tf.Tensor

get_value()[source]
Returns:

self.reduce_func(loss), which is usually a scalar (by default reduce_func does tf.reduce_sum); a float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().

Return type:

tf.Tensor|None

get_normalization_factor()[source]
Returns:

factor as a float scalar, usually 1.0 / num_frames. see self.reduce_func.

Return type:

tf.Tensor

classmethod get_auto_output_layer_dim(target_dim)[source]
Parameters:

target_dim (returnn.tensor.Dim)

Returns:

normally just the same as target_dim. e.g. for CTC, we would add 1 for the blank label

Return type:

returnn.tensor.Dim

classmethod get_default_target(extern_data)[source]
Parameters:

extern_data (TFNetwork.ExternData)

Returns:

default target name, or None if this loss does not have a target

Return type:

str|None