returnn.tf.layers.base
This module contains the layer base class LayerBase.
- class returnn.tf.layers.base.LayerBase(name, network, output, n_out=<class 'returnn.util.basic.NotSpecified'>, out_dim=None, out_type=None, out_shape=None, sources=(), in_dim=None, target=None, _target_layers=None, loss=None, size_target=None, reuse_params=None, name_scope=None, param_device=None, is_output_layer=None, only_on_eval=False, only_on_search=False, copy_output_loss_from_source_idx=None, batch_norm=False, L2=None, darc1=None, spatial_smoothing=0.0, param_variational_noise=None, param_dropout=None, param_dropout_min_ndim=None, updater_opts=None, initial_output=None, state=None, need_last=False, rec_previous_layer=None, encapsulate=False, collocate_with=None, trainable=True, custom_param_importer=None, register_as_extern_data=None, control_dependencies_on_output=None, debug_print_layer_output=None, _network=None, _name=None, _src_common_search_choices=None)[source]¶
This is the base class for all layers. Every layer by default has a list of source layers sources and defines self.output, which is of type Data. It shares some common functionality across all layers, such as explicitly defining the output format, some parameter regularization, and more.
If you want to implement your own layer:
    class YourOwnLayer(_ConcatInputLayer):  # e.g. either _ConcatInputLayer or LayerBase as a base
        """some docstring"""
        layer_class = "your_layer_name"

        def __init__(self, your_kwarg1, your_opt_kwarg2=None, **kwargs):
            """docstring, document the args!"""
            super(YourOwnLayer, self).__init__(**kwargs)
            # Now we need to set self.output, which must be of type :class:`Data`.
            # It is set at this point to whatever we got from `self.get_out_data_from_opts()`,
            # so it is enough if we set self.output.placeholder and self.output.size_placeholder,
            # but we could also reset self.output.
            self.output.placeholder = self.input_data.placeholder + 42  # whatever you want to do
            # If you don't modify the sizes (e.g. sequence-length), just copy the input sizes.
            self.output.size_placeholder = self.input_data.size_placeholder.copy()

        @classmethod
        def get_out_data_from_opts(cls, **kwargs):
            """This is supposed to return a :class:`Data` instance as a template, given the arguments."""
            # example, just the same as the input:
            return get_concat_sources_data_template(kwargs["sources"], name="%s_output" % kwargs["name"])
Usually the arguments, when specified in the network dict, go through transform_config_dict() before they are passed here. See TFNetwork.construct_from_dict(). (A small config sketch showing some of these options follows after the parameter list below.)
- Parameters:
name (str)
network (returnn.tf.network.TFNetwork)
output (Data) – Set a specific output instead of using get_out_data_from_opts()
n_out (NotSpecified|None|int) – output dim
out_dim (returnn.tensor.Dim|None) – output feature dim tag
out_type (dict[str]) – kwargs for Data class. more explicit than n_out.
out_shape (set[returnn.tensor.Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) – verifies the output shape (dim tags). See Data.verify_out_shape().
sources (list[LayerBase]) – via self.transform_config_dict()
in_dim (returnn.tensor.Dim|None) – input feature dim tag
target (str|list[str]|None) – if some loss is set, this is the target data-key, i.e. network.extern_data.get_data(target). alternatively, this also can be a layer name.
_target_layers (dict[str,LayerBase]|None) – if target.startswith(“layer:”), then this is target -> layer
size_target (str|None) – like target but this is only used to set our output size in case of training
loss (Loss|None) – via transform_config_dict(). Every layer can have one loss (of type Loss), or no loss. In the net dict, it is specified as a string. In TFNetwork, all losses from all layers will be collected; that is what TFUpdater.Updater will use for training.
reuse_params (ReuseParams|None) – if given, will optionally reuse the params. See self.var_creation_scope(). See also the name_scope option as an alternative.
name_scope (str|None) – If set, uses this custom (relative) name scope. If it starts with a “/”, it will be the absolute name scope. It should not end with a “/”. It can be empty, in which case it will not consume a new name scope. This can also be used for parameter sharing. The default is the layer name in most cases, but this logic is in get_absolute_name_scope_prefix() and TFNetwork.layer_creation_scope().
param_device (str|None) – e.g. “CPU”, etc. Any valid name for tf.device. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/device_name_utils.h
L2 (float|None) – for constraints
darc1 (float|None) – for constraints. see Generalization in Deep Learning, https://arxiv.org/abs/1710.05468
spatial_smoothing (float|None) – see returnn.tf.util.basic.spatial_smoothing_energy()
param_variational_noise (float|None) – adds variational noise to the params during training
param_dropout (float|None) – dropout on params (weight dropout) during training
param_dropout_min_ndim (int|None) – if param dropout is enabled, only use it for params whose ndim >= this. E.g. it might make sense to disable it for bias params or scalars, so set param_dropout_min_ndim=2.
updater_opts (dict[str]|None) – accepts similar opts as TFUpdater, e.g. “optimizer”, “learning_rate”, …
is_output_layer (bool|None) – triggers the construction of this layer in the root net. Inside a RecLayer, it triggers the explicit accumulation of all frames. Also see the need_last option.
only_on_eval (bool) – if True, this layer will only be calculated in eval
only_on_search (bool) – if True, this layer will only be calculated when search is done
copy_output_loss_from_source_idx (int|None) – if set, will copy output_loss from this source
batch_norm (bool|dict) – see self.batch_norm()
initial_output (str|float) – used for recurrent layer, see self.get_rec_initial_output()
state – explicitly defines the rec state. initial_state would define the initial state (in the first frame)
need_last (bool) – Inside RecLayer, make sure that we can access the last frame. Similar to is_output_layer, but this is specifically about the last frame, i.e. it does not trigger accumulation.
rec_previous_layer (LayerBase|None) – via the recurrent layer, the layer (template) which represents our past. You would not explicitly set this in a config. This is set automatically, internally, via RecLayer.
encapsulate (bool) –
mostly relevant for SubnetworkLayer and similar: If True, all sub layers will be created, and covered in functions like get_rec_initial_extra_outputs(), and the logic in cls_get_sub_network() will not be used. If False, the logic in cls_get_sub_network() will be used.
collocate_with (list[str]|None) – in the rec layer, collocate with the specified other layers
trainable (bool) – whether the parameters of this layer will be trained. Default is True. However, if this is inside a subnetwork, all the parent layers must be set to trainable, otherwise the parameters will not be trainable.
custom_param_importer (str|callable|None) – used by set_param_values_by_dict()
register_as_extern_data (str|None) – registers output in network.extern_data
control_dependencies_on_output (None|((LayerBase)->list[tf.Operation])) – This is mostly to perform some checks after the layer output has been computed, before the layer output is used anywhere else. There is also the IdentityLayer with the option control_dependencies.
debug_print_layer_output (None|bool|dict[str]) – same as the global config option, but per layer
_name (str) – just for internal construction, should be the same as name
_network (returnn.tf.network.TFNetwork) – just for internal construction, should be the same as network
_src_common_search_choices (None|SearchChoices) – set via SearchChoices.translate_to_common_search_beam()
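For illustration, a minimal network-dict sketch showing how several of these common options are typically set per layer. The layer names, dims and values here are hypothetical, not part of this module:

    network = {
        "encoder": {
            "class": "linear", "activation": "relu", "n_out": 512, "from": "data",
            "L2": 1e-4,         # L2 param regularization for this layer
            "trainable": True,  # default; set False to freeze this layer's params
        },
        "output": {
            "class": "softmax", "from": "encoder",
            "target": "classes",                   # data key from network.extern_data, used by the loss
            "loss": "ce",                          # per-layer loss, collected by TFNetwork for training
            "register_as_extern_data": "enc_out",  # optionally expose this output as extern data
        },
    }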
- post_init(layer_desc)[source]¶
This gets called right after self.__init__().
- Parameters:
layer_desc (dict[str]) – kwargs as they are passed to self.__init__
- classmethod get_out_data_from_opts(**kwargs)[source]¶
Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.
- Parameters:
kwargs – all the same kwargs as for self.__init__()
- Returns:
Data template (placeholder not set)
- Return type:
Data
- classmethod fixup_out_data(output, network, out_shape=None, **kwargs)[source]¶
This is called after get_out_data_from_opts, to fixup incomplete information. E.g. we can patch batch or beam information here but maybe also other things.
Other layer classes might overwrite this but then should call this super method. Usually this should not be needed though.
- Parameters:
output (Data)
network (returnn.tf.network.TFNetwork)
out_shape (set[Dim|_MarkedDim]|tuple|list|None) – verifies the output shape (dim tags). See
Data.verify_out_shape()
.
- Return type:
Data
- classmethod transform_config_dict(d, network, get_layer)[source]¶
- Parameters:
d (dict[str]) – will modify inplace
network (returnn.tf.network.TFNetwork)
get_layer (returnn.tf.network.GetLayer|((str)->LayerBase)) – function to get or construct another layer. The name get_layer might be misleading, as this should return an existing layer, or construct it if it does not exist yet. network.get_layer would just return an existing layer.
Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument, which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved, as in the sketch below.
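As a sketch (the layer class and its "base" option are hypothetical, not part of this module), a subclass would typically override transform_config_dict() only to resolve options which refer to other layers:

    class MyCustomLayer(LayerBase):
        layer_class = "my_custom"

        @classmethod
        def transform_config_dict(cls, d, network, get_layer):
            # Let the base class resolve "from" -> "sources", the loss, etc.
            super(MyCustomLayer, cls).transform_config_dict(d, network=network, get_layer=get_layer)
            # Resolve our own (hypothetical) layer-reference option to a LayerBase instance.
            if d.get("base"):
                d["base"] = get_layer(d["base"])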
- classmethod cls_get_tf_scope_name(name)[source]¶
- Parameters:
name (str) – layer name
- Returns:
valid scope name, might be just name. see tf._VALID_SCOPE_NAME_REGEX and tf._VALID_OP_NAME_REGEX
- Return type:
str
- classmethod cls_setup_scope(name, name_scope=None, **_kwargs)[source]¶
- Parameters:
name (str)
name_scope (str|None)
_kwargs – other layer kwargs after being transformed
- property tf_scope_name[source]¶
- Return type:
str
- Returns:
normally just self.name, but made a valid TF scope name. This is meant mostly to extend TF names. See get_base_absolute_name_scope_prefix() otherwise.
- get_base_absolute_name_scope_prefix()[source]¶
- Returns:
e.g. “output/”, always with “/” at end, or “”. this is for the TF name scope or variable scope
- Return type:
str
- get_absolute_name_scope_prefix()[source]¶
- Returns:
e.g. “output/”, always with “/” at end, or “”. This is for the TF name scope or variable scope. This is the same as get_base_absolute_name_scope_prefix() in most cases, but some layers like RecLayer extend this by an additional postfix.
- Return type:
str
- get_absolute_name()[source]¶
- Returns:
e.g. “output” or “subnet/output”. This is mostly for representation. See also get_absolute_name_scope_prefix().
- Return type:
str
- is_output_layer()[source]¶
Some code differs between an output layer and other layers. It is a bit arbitrary what we define as an output layer. This should be consistent with TFNetwork.construct_from_dict().
- Return type:
bool
- get_dep_layers()[source]¶
- Returns:
list of layers this layer depends on. normally this is just self.sources but e.g. the attention layer in addition has a base, etc.
- Return type:
list[LayerBase]
- classmethod cls_get_sub_network(name, network, layer_desc)[source]¶
A layer class can override this to return a custom Subnetwork, which just sets another namespace (and possibly variable sharing) for contained layers, but otherwise shares the same construction logic via the root network's TFNetwork.construct_layer().
When not overriding this, a layer still can have sub layers via LayerBase.get_sub_layer(), but they belong to the root layer (collocated) and cannot be decoupled.
- Parameters:
name (str)
network (returnn.tf.network.TFNetwork)
layer_desc (dict[str])
- Return type:
Subnetwork|None
- get_sub_layer(layer_name)[source]¶
The default behavior for any layer is to return None. Returned layers belong to the root layer (self).
Also see LayerBase.cls_get_sub_network() and get_available_sub_layer_names().
- Parameters:
layer_name (str) – name of the sub_layer (right part of ‘/’ separated path)
- Returns:
the sub_layer addressed in layer_name or None if no sub_layer exists
- Return type:
LayerBase|None
- classmethod get_available_sub_layer_names(parent_layer_kwargs)[source]¶
- Parameters:
parent_layer_kwargs (dict[str]) – kwargs for the parent layer (as kwargs in cls.get_out_data_from_opts())
- Returns:
list of layer names which can be accessed via get_sub_layer()
- Return type:
list[str]
- classmethod get_sub_layer_out_data_from_opts(layer_name, parent_layer_kwargs)[source]¶
Called by _TemplateLayer.get_sub_layer(). Gets a Data template for the sub-layer with name ‘layer_name’. Also returns the network the sub-layer is in and the class type of the sub-layer. There is no good default behaviour here, as this heavily depends on how the current layer uses sub-layers.
- Parameters:
layer_name (str) – name of the sub_layer (right part of ‘/’ separated path)
parent_layer_kwargs (dict[str]) – kwargs for the parent layer (as kwargs in cls.get_out_data_from_opts())
- Returns:
Data template, class type of sub-layer, layer opts (transformed)
- Return type:
(Data, type, dict[str])|None
- get_sub_networks()[source]¶
- Returns:
All subnetworks, including those which might be in a different ctx. If this returns a non-empty list, we expect that all layers via get_sub_layers can be reached via the subnetworks.
- Return type:
list[Subnetwork]
- get_sub_layers()[source]¶
- Returns:
All (direct) (non-temporary) sub layers, including those which might be in a different ctx. This is mostly intended to collect params.
- Return type:
list[LayerBase]
- get_search_choices()[source]¶
- Return type:
SearchChoices|None
- get_search_beam_size()[source]¶
- Returns:
beam size if there was a choice layer and we do search
- Return type:
int|None
- get_normalized_layer()[source]¶
- Returns:
e.g. if this is the prev layer in a RecLayer, return the current layer
- Return type:
LayerBase
- get_batch_dim()[source]¶
The batch dim of this layer, not taken from our output placeholder but calculated. Normally it is self.network.get_batch_dim(), but if we do search and there was a choice layer, it is multiplied by the beam size.
- Returns:
batch dim * beam size
- Return type:
tf.Tensor|int
- var_creation_scope(**kwargs)[source]¶
This takes care of setting up a scope where variables can be created. This handles multiple things:
the param sharing logic, to reuse existing variables from elsewhere
variational noise and param weight dropout
Note: default_control_flow_ctx() is not needed for tf.get_variable. But it might be needed for other code which uses custom inits and tf.Variable, e.g. tf.random.Generator. However, always using this could be a problem if we use other input tensors inside this scope, so we do not enable this here.
- Parameters:
kwargs – passed to variable_scope
- Returns:
yields the variable_scope
- add_param(param, custom_update=None, trainable=None, saveable=None, axes_split_info=None, non_critical_for_restore=False)[source]¶
- Parameters:
param (tf.Variable|tf.Tensor)
custom_update (None|CustomUpdate) – will be applied in training, instead of taking the gradient
trainable (bool|None)
saveable (bool|None)
axes_split_info (list[list[int]]|None) – e.g. [[n],[n]*4] for LSTM matrices
non_critical_for_restore (bool) – if True, and it cannot be found in a checkpoint, it will not be an error
- Returns:
param
- Return type:
tf.Variable
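A sketch of how a custom layer's __init__ would typically create its own parameters, combining var_creation_scope() and add_param(). The layer, names and shapes are hypothetical, and get_out_data_from_opts() is omitted:

    import tensorflow as tf

    class MyLinearLayer(_ConcatInputLayer):
        layer_class = "my_linear"

        def __init__(self, **kwargs):
            super(MyLinearLayer, self).__init__(**kwargs)
            n_in, n_out = self.input_data.dim, self.output.dim
            with self.var_creation_scope():
                # Use tf.compat.v1.get_variable (not tf.Variable) so that reuse_params,
                # param_variational_noise and param_dropout from var_creation_scope() apply.
                weights = self.add_param(tf.compat.v1.get_variable(
                    "W", shape=(n_in, n_out),
                    initializer=tf.compat.v1.glorot_uniform_initializer()))
                bias = self.add_param(tf.compat.v1.get_variable(
                    "b", shape=(n_out,), initializer=tf.compat.v1.zeros_initializer()))
            x = self.input_data.placeholder  # assuming the feature dim is the last axis
            self.output.placeholder = tf.tensordot(x, weights, axes=1) + bias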
- set_param_values_by_dict(values_dict, session, ignore_wrong_shape=False, copy_param_mode=None)[source]¶
- Parameters:
values_dict (dict[str,numpy.ndarray])
ignore_wrong_shape (bool)
copy_param_mode (str|None)
session (tf.compat.v1.Session)
- get_param_values_dict(session) Dict[str, ndarray] [source]¶
- Parameters:
session (tf.compat.v1.Session)
- Returns:
dict name -> values
- get_saveable_params_dict()[source]¶
- Returns:
params and saveable_param_replace resolved
- Return type:
dict[str,tf.Variable|tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject]
- classmethod get_losses(name, network, output, loss=None, reduce_func=None, layer=None, **kwargs)[source]¶
Losses will get constructed here. This gets called inside a loss name scope of the layer. When overriding this, make sure that it works both with layer set and unset.
- Parameters:
name (str) – layer name
network (returnn.tf.network.TFNetwork)
loss (Loss|None) – argument just as for __init__
output (Data) – the output (template) for the layer
layer (LayerBase|None) – The real layer instance, if it exists at the current point. If not given, init() must be called at a later point.
reduce_func (((tf.Tensor)->tf.Tensor)|None) – if given, will overwrite the reduce func for the loss. By default, every loss_value and error_value is a scalar (sum or average over the batches, and over the frames for frame-wise losses). However, if you provide reduce_func = returnn.tf.util.basic.identity, you can get the unreduced tensor.
kwargs – all the remaining __init__ args
- Returns:
the losses defined by this layer
- Return type:
- get_losses_initialized(reduce_func=None)[source]¶
As self.get_losses, but here we return them all initialized (i.e. the layer is set). You should not override this method but rather get_losses().
- Parameters:
reduce_func (((tf.Tensor)->tf.Tensor)|None) – as in get_losses
- Returns:
the losses defined by this layer
- Return type:
- get_output_spatial_smoothing_energy()[source]¶
- Returns:
scalar. See returnn.tf.util.basic.spatial_smoothing_energy()
- Return type:
tf.Tensor
- get_darc1()[source]¶
DARC1, simplified Directly Approximately Regularizing Complexity (DARC), via Generalization in Deep Learning, https://arxiv.org/abs/1710.05468
- Returns:
scalar
- Return type:
tf.Tensor
- batch_norm(data, use_shift=True, use_std=True, use_sample=0.0, force_sample=False, momentum=<class 'returnn.util.basic.NotSpecified'>, epsilon=0.001, update_sample_only_in_training=<class 'returnn.util.basic.NotSpecified'>, delay_sample_update=<class 'returnn.util.basic.NotSpecified'>, param_version=<class 'returnn.util.basic.NotSpecified'>, gamma_init=1.0, beta_init=0.0, masked_time=<class 'returnn.util.basic.NotSpecified'>)[source]¶
- Parameters:
data (Data)
use_shift (bool)
use_std (bool)
use_sample (float) – defaults to 0.0 which is used in training
force_sample (bool) – even in eval, use the use_sample factor
momentum (float) – for the running average of sample_mean and sample_std
update_sample_only_in_training (bool)
delay_sample_update (bool)
param_version (int) – 0 or 1 or 2
epsilon (float)
gamma_init (str|float) – see returnn.tf.util.basic.get_initializer(), for the scale
beta_init (str|float) – see returnn.tf.util.basic.get_initializer(), for the mean
masked_time (bool) – flatten and mask input tensor
- Return type:
tf.Tensor
https://arxiv.org/abs/1502.03167
With our default settings:
In training: use_sample=0, i.e. not using running average, using current batch mean/var.
Not in training (e.g. eval): use_sample=1, i.e. using running average, not using current batch mean/var.
The running average includes the statistics of the current batch.
The running average is also updated when not training.
- Also see:
tf.nn.batch_normalization() https://github.com/deepmind/sonnet/blob/master/sonnet/python/modules/batch_norm.py
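For reference, a sketch of how batch norm is typically enabled per layer in the network dict: the batch_norm layer option can be True or a dict whose entries are passed through to this method. The layer name and option values here are illustrative only:

    network = {
        "ff1": {
            "class": "linear", "activation": "relu", "n_out": 512, "from": "data",
            "batch_norm": {
                "momentum": 0.1,      # running-average momentum for sample_mean / sample_std
                "epsilon": 1e-3,
                "masked_time": True,  # flatten and mask the time axis before computing statistics
            },
        },
    }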
- get_hidden_state()[source]¶
If this is a recurrent layer, this would return the hidden state. This is used e.g. for the RnnCellLayer class.
- Return type:
tf.Tensor | list[tf.Tensor] | None
- Returns:
optional tensor(s) with shape (time, batch, dim)
- get_last_hidden_state(key)[source]¶
If this is a recurrent layer, this would return the last hidden state. Otherwise, we return None.
- Parameters:
key (int|str|None) – also the special key “*”
- Return type:
tf.Tensor | None
- Returns:
optional tensor with shape (batch, dim)
- post_process_final_rec_vars_outputs(rec_vars_outputs, seq_len)[source]¶
- Parameters:
rec_vars_outputs (dict[str,tf.Tensor])
seq_len (tf.Tensor) – shape (batch,)
- Return type:
dict[str,tf.Tensor]
- classmethod get_rec_initial_output(batch_dim, name, output, rec_layer, initial_output=None, **kwargs)[source]¶
If this layer is used inside a recurrent layer, this function specifies the output of frame t=-1, if it is needed. As arguments, we get the usual layer arguments. batch_dim is added because it might be special because of beam search.
Note: This could maybe share code with RnnCellLayer.get_rec_initial_state().
- Parameters:
batch_dim (tf.Tensor) – including beam size in beam search
name (str) – layer name
output (Data) – template
rec_layer (returnn.tf.layers.rec.RecLayer)
initial_output (str|float|int|tf.Tensor|None)
- Return type:
tf.Tensor
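A sketch of where initial_output typically matters: a layer inside a rec unit refers to its own previous frame via "prev:...", so frame t=-1 needs a defined value. Layer names and dims are hypothetical:

    network = {
        "decoder": {
            "class": "rec", "from": "encoder", "target": "classes",
            "unit": {
                # "prev:output" refers to frame t-1, so frame t=-1 uses initial_output:
                "prev_embed": {"class": "linear", "activation": None, "n_out": 128,
                               "from": "prev:output", "initial_output": 0},
                "output": {"class": "softmax", "from": "prev_embed", "target": "classes"},
            },
        },
    }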
- classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, **kwargs)[source]¶
- Parameters:
batch_dim (tf.Tensor) – for this layer, might be with beam
rec_layer (returnn.tf.layers.rec.RecLayer|LayerBase|None) – for the scope
- Return type:
dict[str,tf.Tensor]
- classmethod get_rec_initial_extra_outputs_shape_invariants(rec_layer, **kwargs)[source]¶
- Parameters:
rec_layer (returnn.tf.layers.rec.RecLayer|LayerBase|None) – for the scope
- Returns:
optional shapes for the tensors by get_rec_initial_extra_outputs
- Return type:
dict[str,tf.TensorShape]
- class returnn.tf.layers.base.InternalLayer(output: Tensor, debug_type_name: str | None = None, **kwargs)[source]¶
This is not supposed to be used by the user. It is used by some code to construct a wrapper layer or so.
- Parameters:
output
debug_type_name – just for repr
- classmethod transform_config_dict(d, network, get_layer)[source]¶
- Parameters:
d (dict[str])
network (returnn.tf.network.TFNetwork)
get_layer
- class returnn.tf.layers.base.DataNotAvailableLayer(layer_class, layer_desc, **kwargs)[source]¶
This is a dummy layer that is created when the output template is flagged “not available for inference”. The output template should be passed to the constructor to correctly forward the information in case any dependent output is exported with “register_as_extern_data”.
See returnn.tf.network._create_layer().
- Parameters:
layer_class (type[LayerBase])
layer_desc (dict[str])
- class returnn.tf.layers.base.WrappedInternalLayer(base_layer, sources=None, **kwargs)[source]¶
This is not supposed to be used by the user. Like InternalLayer, it is only intended for internal usage. This layer is supposed to logically wrap another layer.
- Parameters:
base_layer (LayerBase) – the layer which we are wrapping
sources (list[LayerBase]|None)
- class returnn.tf.layers.base.ReuseParams(reuse_layer=None, map=None, custom=None, auto_create_missing=False, layer_output=None, shape=None)[source]¶
This is for parameter sharing, i.e. reusing existing tf.Variable objects in a new layer, instead of creating new variables.
ReuseParams.from_config_dict() will be called via LayerBase.transform_config_dict().
- Parameters:
reuse_layer (LayerBase|ReuseParams.LazyLayerResolver|None)
map (dict[str,ReuseParams]|None)
custom ((**kwargs)->(tf.Tensor|tf.Variable)) – see self.variable_custom_getter()
auto_create_missing (bool)
layer_output (LayerBase|None)
shape (tuple[Dim]|None)
- classmethod from_config_dict(opts, network, get_layer)[source]¶
This will be called via LayerBase.transform_config_dict() on the layer option “reuse_params”. (A config sketch follows below.)
- Parameters:
opts (str|dict[str]|None) –
If None, we will return None. If a str, it will be interpreted as a layer name. If a dict, you can specify:
“reuse_layer”: a layer name.
“map”: a dict where the keys are parameter names, and the values can be: a str, which would be interpreted as a layer name; None, which would be interpreted as the option auto_create_missing; or a dict, which would specify ReuseParams.__init__() options.
The option reuse_layer would be specified as a str, and represents a layer name.
network (returnn.tf.network.TFNetwork)
get_layer (((str) -> LayerBase)) – function to get or construct another layer
- Return type:
ReuseParams|None
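A config sketch of both forms; all layer and parameter names here are hypothetical. Either pass a single layer name to share all params, or a dict with a "map" per parameter name:

    import tensorflow as tf

    network = {
        # Share all params with another layer:
        "dec_proj": {"class": "linear", "n_out": 512, "from": "dec",
                     "reuse_params": "enc_proj"},
        # Share only selected params; "b": None means auto_create_missing:
        "output_prob": {
            "class": "linear", "n_out": 1000, "from": "dec",
            "reuse_params": {
                "map": {
                    "W": {"reuse_layer": "target_embed",
                          "custom": lambda reuse_layer, **kwargs: tf.transpose(reuse_layer.params["W"])},
                    "b": None,
                },
            },
        },
    }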
- class LazyLayerResolver(layer_name, network, get_layer)[source]¶
Unfortunately this is a bit tricky and difficult to do right. We want to support it because it can happen that e.g. in training, this layer resolving is not needed, while in search it is needed, due to different dependencies. See test_reuse_params_map_custom_dep_loop() for an example. The params depend on a layer which is not constructed yet and cannot be constructed yet because of a dependency loop. Thus, here we again try to create it, and if we still get the dependency loop, we create the reused-params-layer based on dummy inputs, such that the variables/parameters get created and can be used now. Then, later, we recreate the reused-params-layer.
- Parameters:
layer_name (str)
network (returnn.tf.network.TFNetwork)
get_layer (((str) -> LayerBase))
- get_variable_scope(base_layer, **kwargs)[source]¶
- Parameters:
base_layer (LayerBase)
kwargs – passed to tf.compat.v1.variable_scope
- Return type:
tf.compat.v1.VariableScope
- variable_custom_getter(base_layer, name, shape, dtype, getter, **kwargs)[source]¶
By TF docs, from _VariableStore.get_variable(): Callable that takes as a first argument the true getter, and allows overwriting the internal get_variable method. The signature of custom_getter should match that of this method, but the most future-proof version will allow for changes: def custom_getter(getter, *args, **kwargs). Direct access to all get_variable parameters is also allowed: def custom_getter(getter, name, *args, **kwargs). A simple identity custom getter that simply creates variables with modified names is:

    def custom_getter(getter, name, *args, **kwargs):
        return getter(name + '_suffix', *args, **kwargs)

In addition, we get the argument base_scope_name, via self.get_variable_scope().
- Parameters:
base_layer (LayerBase) – we expect that this is the prefix of name
name (str) – absolute param name
shape (tuple[int]|list[int])
dtype (tensorflow.DType)
getter ((...)->tf.Variable)
- Return type:
tf.Variable|tf.Tensor
- class returnn.tf.layers.base.SearchChoices(owner, beam_size, is_decided=False, keep_raw=False)[source]¶
In beam search, after expanding the beam and then selecting the N best (beam) (see ChoiceLayer), when doing this multiple times, we need to keep a reference to where each beam came from, what the current score is, etc. Also, we could have multiple different such expansions & prunes via different ChoiceLayer instances. This is what we keep track of here.
- Parameters:
owner (LayerBase)
beam_size (int)
is_decided (bool) – by DecideLayer
keep_raw (bool) – by DecideKeepBeamLayer
- set_beam_from_own_rec()[source]¶
Assumes we have set self.owner, and uses those rec vars to set the beam scores.
- set_beam_from_rec(rev_vars_outputs)[source]¶
- Parameters:
rev_vars_outputs (dict[str,tf.Tensor]) – e.g. via ChoiceLayer
- set_src_beams(src_beam_idxs)[source]¶
- Parameters:
src_beam_idxs (tf.Tensor) – source beam index, (batch, beam)
- get_src_choices_seq()[source]¶
- Returns:
all SearchChoices we depend on up to the root, including and starting with self
- Return type:
list[SearchChoices]
- static compare(self, other)[source]¶
Also see TFNetwork.get_search_choices.compare_layer(), which is basically the same.
- Parameters:
self (SearchChoices|None)
other (SearchChoices|None)
- Returns:
0 if equal, -1 if we are smaller, else 1
- Return type:
int
- class returnn.tf.layers.base.Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, custom_inv_norm_factor=None, scale=1.0, _check_output_before_softmax=None)[source]¶
Base class for all losses.
- Parameters:
base_network (returnn.tf.network.TFNetwork)
use_flatten_frames (bool) – will use returnn.tf.util.basic.flatten_with_seq_len_mask()
use_normalized_loss (bool) – the loss used in optimization will be normalized
custom_norm_factor (float|function|None) – The standard norm factor is 1/sum(target_seq_len) if the target has a time-axis, or 1/sum(output_seq_len) if there is no target and the output has a time-axis, or 1 otherwise. (See Loss.init() for details.) This is used for proper normalization of accumulated loss/error per epoch, and also proper normalization per batch for reporting, no matter if use_normalized_loss is True or False. If you want to change this norm factor, you can set this. As a function, it takes (self=self, output=output, layer=layer) and returns a float scalar.
custom_inv_norm_factor (LayerBase|None) – inverse of custom_norm_factor. Here we allow to pass a layer. Here we also allow to pass any shape, and it will automatically be reduced via sum. So you could simply pass target_seq_len directly here. Basically, for all reporting, it uses sum(loss) * sum(custom_inv_norm_factor). (A config sketch follows after this parameter list.)
scale (float) – additional scale factor for the loss
_check_output_before_softmax (bool|None)
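As a sketch, loss-related options usually appear in the network dict like this; loss_opts are passed (via Loss.transform_config_dict()) as kwargs to this __init__. Values here are illustrative only:

    network = {
        "output": {
            "class": "softmax", "from": "encoder", "target": "classes",
            "loss": "ce",
            "loss_scale": 1.0,                # -> scale
            "loss_opts": {
                "use_normalized_loss": True,  # normalize the loss used for optimization
                "use_flatten_frames": False,  # keep (batch, time) instead of flattening frames
            },
        },
    }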
- reduce_func(loss)[source]¶
Reduces the frames. Currently this is the sum, and we do averaging later. We might change this logic at some point. Also, some code overwrites this function externally, e.g. with returnn.tf.util.basic.identity, to not do reducing.
- Parameters:
loss (tf.Tensor) – e.g. (batch*time,), or (time_flat,), or (batch*time,dim), etc
- Returns:
by default just a scalar. but this can be overwritten, to not reduce
- Return type:
tf.Tensor
- reduce_to_batch(loss, normalize)[source]¶
- Parameters:
loss (tf.Tensor) – e.g. (batch*time,), or (time_flat,), or (batch*time,dim), etc
normalize (bool) – reduce mean instead of reduce sum
- Returns:
(batch,)
- Return type:
tf.Tensor
- classmethod transform_config_dict(d, network, get_layer)[source]¶
- Parameters:
d (dict[str]) – will modify inplace, the loss_opts
network (returnn.tf.network.TFNetwork)
get_layer (((str) -> LayerBase)) – function to get or construct another layer
Will modify d such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by LayerBase.transform_config_dict.
- init_by_layer(layer, layer_output_template=None)[source]¶
- Parameters:
layer (LayerBase|None)
layer_output_template (Data|None) – maybe alternative template
- init(output, output_with_activation=None, target=None, layer=None)[source]¶
- Parameters:
output (Data) – generated output
output_with_activation (OutputWithActivation|None)
target (Data) – reference target from dataset
layer (LayerBase|None)
- get_error()[source]¶
- Returns:
frame error rate as a scalar value with the default self.reduce_func (see also self.get_value)
- Return type:
tf.Tensor
- get_value()[source]¶
- Returns:
self.reduce_func(loss), which is usually a scalar with the default, as it does tf.reduce_sum; a float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
- Return type:
tf.Tensor|None
- get_normalization_factor()[source]¶
- Returns:
factor as a float scalar, usually 1.0 / num_frames. see self.reduce_func.
- Return type:
tf.Tensor