# TFNetworkLayer¶

class TFNetworkLayer.LayerBase(name, network, output=None, n_out=None, out_type=None, sources=(), target=None, loss=None, loss_scale=1.0, size_target=None, reuse_params=None, L2=None, darc1=None, is_output_layer=None, only_on_eval=False, only_on_search=False, copy_output_loss_from_source_idx=None, batch_norm=False, spatial_smoothing=0.0, initial_output=None, rec_previous_layer=None, trainable=True, custom_param_importer=None, register_as_extern_data=None)[source]

This is the base class for all layers. Every layer by default has a list of source layers (sources) and defines self.output, which is of type Data. This base class provides functionality shared across all layers, such as explicitly defining the output format, some parameter regularization, and more.

If you want to implement your own layer:

```python
class YourOwnLayer(_ConcatInputLayer):  # e.g. either _ConcatInputLayer or LayerBase as a base
    """Some docstring."""
    layer_class = "your_layer_name"

    def __init__(self, your_kwarg1, your_opt_kwarg2=None, **kwargs):
        """Docstring, document the args!"""
        super(YourOwnLayer, self).__init__(**kwargs)
        # Now we need to set self.output, which must be of type :class:`Data`.
        # It is set at this point to whatever we got from self.get_out_data_from_opts(),
        # so it is enough if we set self.output.placeholder and self.output.size_placeholder,
        # but we could also reset self.output.
        self.output.placeholder = self.input_data.placeholder + 42  # whatever you want to do
        # If you don't modify the sizes (e.g. sequence-length), just copy the input sizes.
        self.output.size_placeholder = self.input_data.size_placeholder.copy()

    @classmethod
    def get_out_data_from_opts(cls, **kwargs):
        """This is supposed to return a :class:`Data` instance as a template, given the arguments."""
        # example, just the same as the input:
        return get_concat_sources_data_template(kwargs["sources"], name="%s_output" % kwargs["name"])
```


Usually the arguments, when specified in the network dict, are going through transform_config_dict(), before they are passed to here. See TFNetwork.construct_from_dict().
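For illustration, a layer is typically specified in the network dict like this (a minimal hypothetical sketch; the layer names and dims are made up):

```python
# Hypothetical network dict. "from" names other layers (or "data" for the input);
# transform_config_dict() resolves it into the "sources" kwarg of __init__().
network = {
    "hidden": {"class": "linear", "activation": "tanh", "n_out": 128, "from": ["data"]},
    "output": {"class": "softmax", "loss": "ce", "target": "classes", "from": ["hidden"]},
}
```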

Parameters:
- name (str) –
- network (TFNetwork.TFNetwork) –
- output (Data) –
- n_out (None|int) – output dim
- out_type (dict[str]) – kwargs for Data class. more explicit than n_out.
- sources (list[LayerBase]) – via self.transform_config_dict()
- target (str|None) – if some loss is set, this is the target data-key, i.e. network.extern_data.get_data(target). Alternatively, this can also be a layer name.
- size_target (str|None) – like target, but only used to set our output size in case of training
- loss (Loss|None) – via transform_config_dict(). Every layer can have one loss (of type Loss), or no loss. In the net dict, it is specified as a string. In TFNetwork, all losses from all layers will be collected; that is what TFUpdater.Updater will use for training.
- loss_scale (float) – scale factor for loss (1.0 by default). DEPRECATED: use loss.scale instead.
- reuse_params (ReuseParams|None) – if given, will optionally reuse the params. see self.var_creation_scope()
- L2 (float|None) – for constraints
- darc1 (float|None) – for constraints. see Generalization in Deep Learning, https://arxiv.org/abs/1710.05468
- is_output_layer (bool|None) –
- only_on_eval (bool) – if True, this layer will only be calculated in eval
- only_on_search (bool) – if True, this layer will only be calculated when search is done
- copy_output_loss_from_source_idx (int|None) – if set, will copy output_loss from this source
- batch_norm (bool|dict) – see self.batch_norm()
- spatial_smoothing (float) – see self.get_output_spatial_smoothing_energy()
- initial_output (str|float) – used for recurrent layers, see self.get_rec_initial_output()
- rec_previous_layer (LayerBase|None) – via the recurrent layer; a layer (template) which represents our own past
- trainable (bool) – whether the parameters of this layer will be trained
- custom_param_importer (str|callable|None) – used by set_param_values_by_dict()
- register_as_extern_data (str|None) –
layer_class = None[source]
recurrent = False[source]
saveable_param_replace = None[source]
Type: dict[tf.Variable,tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject|None]
post_init(layer_desc)[source]

This gets called right after self.__init__().

Parameters: layer_desc (dict[str]) – kwargs as they are passed to self.__init__
classmethod get_out_data_from_opts(**kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
classmethod cls_get_tf_scope_name(name)[source]
Parameters: name (str) – layer name
Returns: valid scope name, might be just name. see tf._VALID_SCOPE_NAME_REGEX and tf._VALID_OP_NAME_REGEX
Return type: str
classmethod cls_layer_scope(name)[source]

Setup scope for the layer. This can also be used when the layer does not yet exist. This is supposed to cover variable creations as well. Currently vars might be created when used within the rec-layer, but they are caught in a more generic way there, so we have not yet implemented any special logic here.

Parameters: name (str) – layer name
Returns: context manager object
classmethod get_global_layer_list()[source]
Return type: list[LayerBase]
classmethod get_recent_layer()[source]
Return type: LayerBase
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str)->LayerBase) – function to get or construct another layer

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.
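The “from” resolution described above can be sketched as follows (a minimal sketch, not the actual implementation; resolve_sources is a hypothetical name, and get_layer is assumed to return a constructed layer):

```python
def resolve_sources(d, get_layer):
    # Replace the "from" entry (a layer name or list of names) with "sources",
    # a list of layer objects, similar to what transform_config_dict() does.
    src_names = d.pop("from", ["data"])
    if isinstance(src_names, str):
        src_names = [src_names]
    d["sources"] = [get_layer(name) for name in src_names]
    return d

# Stub get_layer for demonstration; in reality it returns LayerBase instances.
d = resolve_sources({"class": "linear", "from": ["enc"]}, get_layer=lambda name: "<layer %s>" % name)
```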

tf_scope_name[source]
get_base_absolute_name_scope_prefix()[source]
Returns: e.g. “output/”, always with “/” at end
Return type: str
get_absolute_name_scope_prefix()[source]
Returns: e.g. “output/”, always with “/” at end
Return type: str
is_output_layer()[source]

Some code differs between an output layer and other layers. It is a bit arbitrary what we define as an output layer.
Return type: bool

get_dep_layers()[source]
Returns: list of layers this layer depends on. Normally this is just self.sources, but e.g. the attention layer in addition has a base, etc.
Return type: list[LayerBase]
get_search_choices()[source]
Return type: SearchChoices|None
get_search_beam_size()[source]
Returns: beam size if there was a choice layer and we do search
Return type: int|None
get_batch_dim()[source]

The batch dim of this layer, calculated rather than taken from our output. Normally it is self.network.get_batch_dim(), but if we do search and there was a choice layer, it is multiplied by the beam size.
Returns: batch dim * beam size
Return type: tf.Tensor

var_creation_scope(**kwargs)[source]

This takes care of setting up a scope where variables can be created.

Parameters: kwargs – passed to variable_scope
Yields: the variable_scope
add_param(param, custom_update=None, trainable=None, saveable=None, axes_split_info=None)[source]
Parameters: param (tf.Variable|tf.Tensor) – custom_update (None|CustomUpdate) – will be applied in training, instead of taking the gradient trainable (bool|None) – saveable (bool|None) – axes_split_info (list[list[int]]|None) – e.g. [[n],[n]*4] for LSTM matrices
Returns: param
Return type: tf.Variable

set_param_values_by_dict(values_dict, session, ignore_wrong_shape=False, copy_param_mode=None)[source]
Parameters: values_dict (dict[str,numpy.ndarray]) – session (tf.Session) – ignore_wrong_shape (bool) – copy_param_mode (str|None) –
get_param_values_dict(session)[source]
Parameters: session (tf.Session) –
Returns: dict name -> values
Return type: dict[str,numpy.ndarray]
get_saveable_params_dict()[source]
Returns: params and saveable_param_replace resolved
Return type: dict[str,tf.Variable|tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject]
classmethod get_losses(name, network, output, loss=None, reduce_func=None, layer=None, **kwargs)[source]

Losses will get constructed here. This gets called inside a loss name scope of the layer. When overriding this, make sure that it works both with layer set and unset.

Parameters:
- name (str) – layer name
- network (TFNetwork.TFNetwork) –
- loss (Loss|None) – argument just as for __init__
- output (Data) – the output (template) for the layer
- layer (LayerBase|None) – the real layer instance, if it exists at the current point. If not given, init() must be called at a later point.
- reduce_func (((tf.Tensor)->tf.Tensor)|None) – if given, will overwrite the reduce func for the loss. By default, every loss_value and error_value is a scalar (sum or average over the batches, and over the frames for frame-wise losses). However, if you provide reduce_func = TFUtil.identity, you can get the unreduced tensor.
- kwargs – all the remaining __init__ args
Returns: the losses defined by this layer
Return type: list[TFNetwork.LossHolder]
get_losses_initialized(reduce_func=None)[source]

As self.get_losses(), but here we return them all initialized. You should not override this method; override get_losses() instead.

Parameters: reduce_func (((tf.Tensor)->tf.Tensor)|None) – as in get_losses
Returns: the losses defined by this layer
Return type: list[TFNetwork.LossHolder]
get_params_l2_norm()[source]
Returns: scalar
Return type: tf.Tensor
get_output_spatial_smoothing_energy()[source]
Returns: scalar. see TFUtil.spatial_smoothing_energy()
Return type: tf.Tensor
get_darc1()[source]

DARC1, simplified Directly Approximately Regularizing Complexity (DARC), via Generalization in Deep Learning, https://arxiv.org/abs/1710.05468

Returns: scalar
Return type: tf.Tensor
get_constraints_value()[source]
Returns: None or scalar
Return type: tf.Tensor|None
batch_norm(data, use_shift=True, use_std=True, use_sample=0.0, force_sample=False, momentum=0.99, epsilon=0.001, sample_mean=None, sample_variance=None, gamma=None, beta=None)[source]
Parameters:
- data (Data) –
- use_shift (bool) –
- use_std (bool) –
- use_sample (float) – defaults to 0.0 which is used in training
- force_sample (bool) – even in eval, use the use_sample factor
- momentum (float) – for the running average of sample_mean and sample_std
- epsilon (float) –
- sample_mean (tf.Tensor) –
- sample_variance (tf.Tensor) –
- gamma (tf.Tensor) –
- beta (tf.Tensor) –
Return type: tf.Tensor

http://arxiv.org/abs/1502.03167

Also see:
- tf.nn.batch_normalization()
- https://github.com/deepmind/sonnet/blob/master/sonnet/python/modules/batch_norm.py
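As a reference for the math only, a NumPy sketch (not this method's implementation; the running averages, use_sample mixing, and the use_shift/use_std flags are omitted):

```python
import numpy as np

def batch_norm_sketch(x, gamma, beta, epsilon=0.001):
    # Normalize over the batch axis (axis 0), then scale (gamma) and shift (beta),
    # following http://arxiv.org/abs/1502.03167.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    return gamma * x_hat + beta
```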
get_hidden_state()[source]

If this is a recurrent layer, this would return the hidden state. This is used e.g. for the RnnCellLayer class.
Returns: optional tensor(s) with shape (time, batch, dim)
Return type: tf.Tensor | list[tf.Tensor] | None

get_last_hidden_state(key)[source]

If this is a recurrent layer, this would return the last hidden state. Otherwise, we return None.
Parameters: key (int|str|None) – also the special key “*”
Returns: optional tensor with shape (batch, dim)
Return type: tf.Tensor | None

classmethod get_rec_initial_output(batch_dim, name, output, rec_layer, initial_output=None, **kwargs)[source]

If this layer is used inside a recurrent layer, this function specifies the output of frame t=-1, if it is needed. As arguments, we get the usual layer arguments. batch_dim is added because it might be special because of beam search.

Note: This could maybe share code with RnnCellLayer.get_rec_initial_state(). We could also add support to make the initial output be the output of another layer.

Parameters: batch_dim (tf.Tensor) – including beam size in beam search name (str) – layer name output (Data) – template rec_layer (TFNetworkRecLayer.RecLayer) – initial_output (str|float|int|tf.Tensor|None) –
Return type: tf.Tensor
classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, **kwargs)[source]
Parameters: batch_dim (tf.Tensor) – for this layer, might be with beam rec_layer (TFNetworkRecLayer.RecLayer) –
Return type: dict[str,tf.Tensor]
classmethod get_rec_initial_extra_outputs_shape_invariants(**kwargs)[source]
Returns: optional shapes for the tensors by get_rec_initial_extra_outputs
Return type: dict[str,tf.TensorShape]
class TFNetworkLayer.ReuseParams(reuse_layer=None, map=None, custom=None, auto_create_missing=False)[source]
Parameters: reuse_layer (LayerBase|()->LayerBase|None) – map (dict[str,ReuseParams]|None) – custom ((**kwargs)->(tf.Tensor|tf.Variable)) – see self.variable_custom_getter() auto_create_missing (bool) –
classmethod from_config_dict(opts, network, get_layer)[source]
Parameters: opts (str|dict|None) – network (TFNetwork.TFNetwork) – get_layer ((str)->LayerBase) – function to get or construct another layer
Return type: ReuseParams|None
class LazyLayerResolver(layer_name, network, get_layer)[source]

Unfortunately this is a bit tricky and difficult to do right. We want to support it because it can happen that e.g. in training, this layer resolving is not needed, but then in search it is needed, due to different dependencies. See test_reuse_params_map_custom_dep_loop() for an example. The params depend on a layer which is not constructed yet and cannot be constructed yet because of a dependency loop. Thus, here we try again to create it, and if we still get the dependency loop, we create the reused-params-layer based on dummy inputs, such that the variables/parameters get created and can be used now. Then, later, we are going to recreate the reused-params-layer.

Parameters: layer_name (str) – network (TFNetwork.TFNetwork) – get_layer ((str)->LayerBase) –
get_layer()[source]
create_dummy_layer(dep_loop_exception)[source]
Parameters: dep_loop_exception (TFNetwork.NetworkConstructionDependencyLoopException) –
Return type: LayerBase
reuse_layer[source]
Return type: LayerBase|None
get_base_absolute_name_scope_prefix(base_layer, param)[source]
Parameters: base_layer (LayerBase) – param (tf.Variable) – e.g. “base_layer/rec/W”
Returns: e.g. “base_layer/” (not “base_layer/rec/”), always with “/” at end
Return type: str
get_variable_scope(base_layer, **kwargs)[source]
Parameters: base_layer (LayerBase) – kwargs – passed to tf.variable_scope
Return type: tf.VariableScope
variable_custom_getter(getter, name, base_layer, **kwargs)[source]

By the TF docs, from _VariableStore.get_variable(): Callable that takes as a first argument the true getter, and allows overwriting the internal get_variable method. The signature of custom_getter should match that of this method, but the most future-proof version will allow for changes: def custom_getter(getter, *args, **kwargs). Direct access to all get_variable parameters is also allowed: def custom_getter(getter, name, *args, **kwargs). A simple identity custom getter that simply creates variables with modified names is:

```python
def custom_getter(getter, name, *args, **kwargs):
    return getter(name + '_suffix', *args, **kwargs)
```

In addition, we get the argument base_scope_name, via self.get_variable_scope().

Parameters: getter ((..)->tf.Variable) – name (str) – absolute name base_layer (LayerBase) – we expect that this is the prefix of name
Return type: tf.Variable|tf.Tensor
class TFNetworkLayer.SearchChoices(owner, src_beams=None, beam_size=None, is_decided=False)[source]
Parameters: owner (LayerBase) – src_beams (tf.Tensor|None) – (batch, beam) -> src beam index beam_size (int|None) – is_decided (bool) – by decide layer
src_layer[source]
Returns: the layer where we had the last search choices
Return type: LayerBase
set_beam_scores_from_own_rec()[source]
set_beam_scores_from_rec(rev_vars_outputs)[source]
Parameters: rev_vars_outputs (dict[str,tf.Tensor]) –
set_beam_scores(scores)[source]
Parameters: scores (tf.Tensor) – (batch, beam) -> log score
get_all_src_choices()[source]
Returns: all SearchChoices we depend on up to the root, including self
Return type: list[SearchChoices]
static compare(self, other)[source]
Parameters: self (SearchChoices|None) – other (SearchChoices|None) –
Returns: 0 if equal, -1 if we are smaller, else 1
Return type: int
translate_to_this_search_beam(sources)[source]
Parameters: sources (LayerBase|list[LayerBase]|dict[str,LayerBase|object]|tuple[LayerBase|object]|T) –
Returns: sources, but all layers transformed when needed
Return type: T
classmethod translate_to_common_search_beam(sources)[source]
Parameters: sources (list[LayerBase]|dict[str,LayerBase|object]) –
Returns: sources, but all layers transformed when needed
Return type: list[LayerBase]|dict[str,LayerBase|object]
class TFNetworkLayer.SourceLayer(network, data_key=None, sources=(), **kwargs)[source]
Parameters: network (TFNetwork.TFNetwork) – data_key (str|None) – sources (tuple) –
layer_class = 'source'[source]
classmethod get_out_data_from_opts(network, data_key=None, **kwargs)[source]
Parameters: network (TFNetwork.TFNetwork) – data_key (str|None) –
Return type: Data
TFNetworkLayer.concat_sources(src_layers)[source]
Parameters: src_layers (list[LayerBase]) –
Returns: data with placeholders set
Return type: Data
TFNetworkLayer.get_concat_sources_data_template(src_layers, name=None)[source]

This just creates a template Data instance, without creating any real TF tensors. concat_sources() (and related) are the equivalent functions which would create a Data together with the tensor.

Parameters: src_layers (list[LayerBase]) – name (str|None) – name of the Data
Returns: data with no placeholders set. It is always a copy or new instance, so it is safe to manipulate.
Return type: Data
TFNetworkLayer.concat_sources_with_opt_dropout(src_layers, dropout=0, dropout_noise_shape=None)[source]
Parameters: src_layers (list[LayerBase]) – dropout (float) – will be applied if train_flag is set dropout_noise_shape (tuple|list|dict|None) –
Returns: data with placeholders set
Return type: Data
class TFNetworkLayer.CopyLayer(**kwargs)[source]

This layer does nothing, it copies its input. If multiple sources are provided, they are concatenated in the feature-dim.

layer_class = 'copy'[source]
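For example, concatenating two encoder directions in the feature dim could look like this (a hypothetical net dict fragment; the layer names are made up):

```python
network_fragment = {
    # "copy" with two sources concatenates them in the feature dim,
    # so the dim of "encoder" is the sum of both source dims.
    "encoder": {"class": "copy", "from": ["lstm_fwd", "lstm_bwd"]},
}
```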
classmethod get_out_data_from_opts(name, sources=(), out_type=None, n_out=None, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.DropoutLayer(**kwargs)[source]

Just the same as CopyLayer, because that one already supports dropout.

layer_class = 'dropout'[source]
class TFNetworkLayer.InternalLayer(name, network, output=None, n_out=None, out_type=None, sources=(), target=None, loss=None, loss_scale=1.0, size_target=None, reuse_params=None, L2=None, darc1=None, is_output_layer=None, only_on_eval=False, only_on_search=False, copy_output_loss_from_source_idx=None, batch_norm=False, spatial_smoothing=0.0, initial_output=None, rec_previous_layer=None, trainable=True, custom_param_importer=None, register_as_extern_data=None)[source]

This is not supposed to be used by the user. It is used by some code to construct a wrapper layer or so.

Usually the arguments, when specified in the network dict, are going through transform_config_dict(), before they are passed to here. See TFNetwork.construct_from_dict().

Parameters:
- name (str) –
- network (TFNetwork.TFNetwork) –
- output (Data) –
- n_out (None|int) – output dim
- out_type (dict[str]) – kwargs for Data class. more explicit than n_out.
- sources (list[LayerBase]) – via self.transform_config_dict()
- target (str|None) – if some loss is set, this is the target data-key, i.e. network.extern_data.get_data(target). Alternatively, this can also be a layer name.
- size_target (str|None) – like target, but only used to set our output size in case of training
- loss (Loss|None) – via transform_config_dict(). Every layer can have one loss (of type Loss), or no loss. In the net dict, it is specified as a string. In TFNetwork, all losses from all layers will be collected; that is what TFUpdater.Updater will use for training.
- loss_scale (float) – scale factor for loss (1.0 by default). DEPRECATED: use loss.scale instead.
- reuse_params (ReuseParams|None) – if given, will optionally reuse the params. see self.var_creation_scope()
- L2 (float|None) – for constraints
- darc1 (float|None) – for constraints. see Generalization in Deep Learning, https://arxiv.org/abs/1710.05468
- is_output_layer (bool|None) –
- only_on_eval (bool) – if True, this layer will only be calculated in eval
- only_on_search (bool) – if True, this layer will only be calculated when search is done
- copy_output_loss_from_source_idx (int|None) – if set, will copy output_loss from this source
- batch_norm (bool|dict) – see self.batch_norm()
- spatial_smoothing (float) – see self.get_output_spatial_smoothing_energy()
- initial_output (str|float) – used for recurrent layers, see self.get_rec_initial_output()
- rec_previous_layer (LayerBase|None) – via the recurrent layer; a layer (template) which represents our own past
- trainable (bool) – whether the parameters of this layer will be trained
- custom_param_importer (str|callable|None) – used by set_param_values_by_dict()
- register_as_extern_data (str|None) –
class TFNetworkLayer.SelectSearchSourcesLayer(search_choices, **kwargs)[source]

Selects the corresponding search beams from the source, given current search choices (determined by a layer). Like InternalLayer, only for internal purpose at the moment.

Parameters: search_choices (LayerBase) –
get_dep_layers()[source]
Returns: list of layers this layer depends on. Normally this is just self.sources, but e.g. the attention layer in addition has a base, etc.
Return type: list[LayerBase]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str)->LayerBase) – function to get or construct another layer

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.

classmethod get_out_data_from_opts(name, sources, search_choices, **kwargs)[source]
Parameters: name (str) – sources (list[LayerBase]) – search_choices (LayerBase) –
Return type: Data
class TFNetworkLayer.ActivationLayer(activation, **kwargs)[source]

This layer just applies an activation function. See TFUtil.get_activation_function() about supported functions. Also see EvalLayer and CombineLayer for similar layers.

Parameters: activation (str) – e.g. “relu”, “tanh”, etc
layer_class = 'activation'[source]
class TFNetworkLayer.BatchNormLayer(**kwargs)[source]

Implements batch-normalization (http://arxiv.org/abs/1502.03167) as a separate layer.

All kwargs which are present in our base class are passed to our base class. All remaining kwargs are used for self.batch_norm().

layer_class = 'batch_norm'[source]
class TFNetworkLayer.LayerNormLayer(epsilon=1e-06, **kwargs)[source]

Applies layer-normalization.

layer_class = 'layer_norm'[source]
classmethod get_out_data_from_opts(sources, name, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.SliceLayer(axis, slice_start=None, slice_end=None, slice_step=None, **kwargs)[source]

Slicing on the input, i.e. x[start:end:step] in some axis. See also SliceNdLayer.

Parameters: axis (int|str) – axis_kind (str|None) – “T” for time, “B” for batch, “F” for feature slice_start (int|None) – slice_end (int|None) – slice_step (int|None) –
layer_class = 'slice'[source]
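For example, dropping the last frame of the time axis, i.e. x[:, :-1], could look like this (a hypothetical net dict fragment):

```python
network_fragment = {
    # Slice on the time axis: keep everything up to (but excluding) the last frame.
    "shifted": {"class": "slice", "axis": "T", "slice_end": -1, "from": ["data"]},
}
```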
classmethod get_out_data_from_opts(name, axis, sources=(), slice_start=None, slice_end=None, slice_step=None, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.SliceNdLayer(start, size, **kwargs)[source]

This takes out a slice-range from some axis, e.g. x[start:start + size]. In contrast to SliceLayer, this layer allows a different start point for each batch entry, i.e. the start is dynamic (given by another layer).

Parameters: start (LayerBase) – size (int) –
layer_class = 'slice_nd'[source]
classmethod get_out_data_from_opts(name, sources=(), start=None, size=None, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str)->LayerBase) – function to get or construct another layer

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.

class TFNetworkLayer.LinearLayer(activation, with_bias=True, grad_filter=None, forward_weights_init='glorot_uniform', bias_init=0.0, **kwargs)[source]

Linear/forward/fully-connected/1x1-conv layer. Does a linear transformation on the feature-dimension of the input with an optional bias term and an optional activation function. See also DotLayer, ElemwiseProdLayer, WeightedSumLayer.

Parameters: activation (str|None) – e.g. “relu”, or None with_bias (bool) – grad_filter (float|None) – if grad norm is higher than this threshold (before activation), the grad is removed forward_weights_init (str) – see TFUtil.get_initializer() recurrent_weights_init (str) – see TFUtil.get_initializer() bias_init (str|float) – see TFUtil.get_initializer()
layer_class = 'linear'[source]
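The computation on the feature dim can be sketched in NumPy as follows (a sketch of the math only, with assumed shapes; weight initialization, bias handling details, and grad_filter are omitted):

```python
import numpy as np

def linear_sketch(x, weights, bias=None, activation=np.tanh):
    # x: (batch, time, n_in), weights: (n_in, n_out), bias: (n_out,).
    # Linear transformation on the feature dim, optional bias, optional activation.
    y = np.dot(x, weights)
    if bias is not None:
        y = y + bias
    return activation(y) if activation is not None else y
```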
class TFNetworkLayer.SoftmaxLayer(activation='softmax', **kwargs)[source]

Just a LinearLayer with activation=”softmax” by default.

layer_class = 'softmax'[source]
class TFNetworkLayer.LengthLayer(add_time_axis=False, dtype='int32', **kwargs)[source]

Returns the length of sources as (B,), via input size_placeholder.

layer_class = 'length'[source]
classmethod get_out_data_from_opts(name, sources, add_time_axis=False, dtype='int32', **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.SoftmaxOverSpatialLayer(energy_factor=None, window_start=None, window_size=10, **kwargs)[source]

This applies a softmax over spatial axis/axes (currently only the time axis is supported). E.g. when the input is of shape (B,T,dim), the output will also be (B,T,dim). It automatically masks the frames outside the seq defined by the seq-len. In contrast to SoftmaxLayer, this will not do a linear transformation.

Parameters: energy_factor (float|None) – the energy will be scaled by this factor. This is like a temperature for the softmax. In Attention-is-all-you-need, this is set to 1/sqrt(base_ctx.dim). window_start (LayerBase|None) – Tensor of shape (B,) indicating the window start window_size (int) –
layer_class = 'softmax_over_spatial'[source]
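The masking plus softmax over the time axis can be sketched in NumPy as follows (a sketch of the math only; energy_factor, window_start, and window_size are not shown):

```python
import numpy as np

def softmax_over_time_sketch(energy, seq_lens):
    # energy: (batch, time, dim). Frames at or beyond seq_lens[b] are masked out
    # before the softmax, so they get zero weight.
    batch, time, dim = energy.shape
    mask = np.arange(time)[None, :, None] < np.asarray(seq_lens)[:, None, None]
    e = np.where(mask, energy, -np.inf)
    e = e - e.max(axis=1, keepdims=True)  # for numerical stability
    w = np.exp(e)
    return w / w.sum(axis=1, keepdims=True)
```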
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str)->LayerBase) – function to get or construct another layer

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.

class TFNetworkLayer.SeqLenMaskLayer(seq_len_source, axis, mask_value, **kwargs)[source]

Parameters: seq_len_source (LayerBase) – axis (str|int) – mask_value (float) –
layer_class = 'seq_len_mask'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str)->LayerBase) – function to get or construct another layer

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.

classmethod get_out_data_from_opts(name, sources, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.BatchSoftmaxLayer(**kwargs)[source]

Softmax over the spatial and feature axes.

layer_class = 'batch_softmax'[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.ConstantLayer(sources, value=0, dtype=None, **kwargs)[source]

Output is a constant value.

layer_class = 'constant'[source]
classmethod get_out_data_from_opts(name, dtype='float32', **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.GatingLayer(activation, gate_activation='sigmoid', **kwargs)[source]

Splits the output into two equal parts, applies the gate_activation (sigmoid by default) on the one part, some other activation (e.g. tanh) on the other part and then element-wise multiplies them. Thus, the output dimension is input-dimension / 2.

layer_class = 'gating'[source]
classmethod get_out_data_from_opts(name, sources, n_out=None, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
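Roughly, the gating operation looks like the following numpy sketch (which half of the input receives the gate activation is an assumption here, not taken from this documentation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gating(x):
    # Split the feature axis into two equal halves.
    a, b = np.split(x, 2, axis=-1)
    # Apply the main activation (tanh) on one half, the gate
    # activation (sigmoid) on the other, then multiply element-wise.
    return np.tanh(a) * sigmoid(b)

x = np.zeros((2, 3, 8))  # (batch, time, dim)
y = gating(x)
print(y.shape)  # (2, 3, 4): output dim is input dim / 2
```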
class TFNetworkLayer.WindowLayer(window_size, window_left=None, window_right=None, axis='T', padding='same', **kwargs)[source]

Adds a window dimension. By default, uses the time axis and goes over it with a sliding window. The new axis for the window is created right after the time axis. Will always return as batch major mode. E.g. if the input is (batch, time, dim), the output is (batch, time, window_size, dim). If you want to merge the (window_size, dim) together to (window_size * dim,), you can use the MergeDimsLayer, e.g. {“class”: “merge_dims”, “axes”: “except_time”}.

This is not to take out a window from the time-dimension. See SliceLayer or SliceNdLayer.

Parameters: window_size (int) – window_left (int|None) – window_right (int|None) – axis (str|int) – see Data.get_axis_from_description() padding (str) – “same” or “valid” kwargs –
layer_class = 'window'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(window_size, axis='T', sources=(), **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, window_size, axis='T', sources=(), **kwargs)[source]
Parameters: batch_dim (tf.Tensor) – for this layer, might be with beam rec_layer (TFNetworkRecLayer.RecLayer) – Return type: dict[str,tf.Tensor]
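The sliding-window behavior can be sketched in numpy as follows (the exact left/right split used for padding="same" is an assumption):

```python
import numpy as np

def window(x, window_size, padding="same"):
    # x: (batch, time, dim) -> (batch, time', window_size, dim)
    if padding == "same":
        left = window_size // 2
        x = np.pad(x, ((0, 0), (left, window_size - left - 1), (0, 0)))
    out_time = x.shape[1] - window_size + 1
    # Stack one window per output time step; the new window axis sits
    # right after the time axis, as described above.
    return np.stack([x[:, t:t + window_size] for t in range(out_time)], axis=1)

x = np.arange(2 * 5 * 3, dtype=float).reshape(2, 5, 3)
print(window(x, 3).shape)  # (2, 5, 3, 3): (batch, time, window_size, dim)
```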
class TFNetworkLayer.CumsumLayer(axis='T', additional_left_summand_per_element=None, **kwargs)[source]

Basically wraps tf.cumsum. Also supports use inside the RecLayer.

Parameters: axis (str) – see Data.get_axis_from_description() additional_left_summand_per_element (str|int|float|None) – the order matters for tf.string
layer_class = 'cumsum'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, sources, axis='T', **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, axis='T', sources=(), **kwargs)[source]
Parameters: batch_dim (tf.Tensor) – for this layer, might be with beam rec_layer (TFNetworkRecLayer.RecLayer) – Return type: dict[str,tf.Tensor]
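The underlying operation is a plain cumulative sum over the chosen axis, e.g.:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])
# Cumulative sum over the time axis (axis 1 here), as tf.cumsum does:
y = np.cumsum(x, axis=1)
print(y)  # [[ 1  3  6]
           #  [ 4  9 15]]
```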
class TFNetworkLayer.PadLayer(axes, padding, value=None, mode='constant', **kwargs)[source]

Parameters: axes (str|list[str]) – e.g. “F” etc. see Dataset.get_axes_from_description(). padding (list[(int,int)]|(int,int)|int) – how much to pad left/right in each axis value (int|float) – what constant value to pad, with mode==”constant” mode (str) – “constant”, “reflect” or “symmetric”
layer_class = 'pad'[source]
classmethod get_out_data_from_opts(name, axes, padding, sources=(), **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
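As a numpy sketch of what padding one axis does to the shape:

```python
import numpy as np

x = np.ones((2, 5, 3))  # (batch, time, feature)
# axes="F", padding=(1, 2), mode="constant", value=0:
# pad the feature axis with 1 element on the left and 2 on the right.
y = np.pad(x, ((0, 0), (0, 0), (1, 2)), mode="constant", constant_values=0)
print(y.shape)  # (2, 5, 6)
```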
class TFNetworkLayer.MergeDimsLayer(axes, n_out=None, **kwargs)[source]

Merges a list of axes into a single one. E.g. input is (batch, width, height, dim) and axes=(1,2), then we get (batch, width*height, dim). Or input is (batch, time, height, dim) and axes=”except_time”, then we get (batch, time, height*dim). See also CombineDimsLayer. When batch and time got merged, SplitBatchTimeLayer can undo this.

Parameters: axes (str|list[str]|list[int]) – see Data.get_axes_from_description(), e.g. “except_time” n_out (int|None) –
layer_class = 'merge_dims'[source]
classmethod get_out_data_from_opts(name, axes, sources=(), n_out=None, out_type=None, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
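Shape-wise, merging axes is a reshape that multiplies their sizes, as in this numpy sketch of the first example above:

```python
import numpy as np

x = np.zeros((4, 10, 8, 16))  # (batch, width, height, dim)
# axes=(1, 2): merge width and height into a single axis.
y = x.reshape(x.shape[0], x.shape[1] * x.shape[2], x.shape[3])
print(y.shape)  # (4, 80, 16)
```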
class TFNetworkLayer.SplitDimsLayer(axis, dims, **kwargs)[source]

Splits one axis into multiple axes. E.g. if you know that your feature-dim is composed by a window, i.e. the input is (batch, time, window * feature), you can set axis=”F”, dims=(window, -1), and you will get the output (batch, time, window, feature). Also see SplitBatchTimeLayer.

Parameters: axis (str) – e.g. “F” dims (tuple[int]) – what the axis should be split into. e.g. (window, -1)
layer_class = 'split_dims'[source]
classmethod get_out_data_from_opts(name, axis, dims, sources=(), **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
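The example from the description, as a numpy reshape:

```python
import numpy as np

window, feature = 3, 5
x = np.zeros((2, 7, window * feature))  # (batch, time, window * feature)
# axis="F", dims=(window, -1): split the feature axis into two axes,
# where -1 lets the remaining dimension be inferred.
y = x.reshape(x.shape[0], x.shape[1], window, -1)
print(y.shape)  # (2, 7, 3, 5)
```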
class TFNetworkLayer.SplitBatchTimeLayer(base, **kwargs)[source]

A very specific layer which expects to get input of shape (batch * time, …) and converts it into (batch, time, …), where it recovers the seq-lens from some other layer. See SplitDimsLayer for a more generic layer.

Parameters: base (LayerBase) – used to recover the seq-lens
layer_class = 'split_batch_time'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.

classmethod get_out_data_from_opts(name, base, sources=(), **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.ExpandDimsLayer(axis, dim=1, **kwargs)[source]

Parameters: axis (str|int) – axis to add, e.g. “F”|”feature” or “spatial”. if this is an integer, the input data is first converted into batch-major mode, and then this is counted with batch-dim. dim (int) – dimension of new axis (1 by default)
layer_class = 'expand_dims'[source]
classmethod get_out_data_from_opts(name, axis, dim=1, sources=(), **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.SwapAxesLayer(axis1, axis2, **kwargs)[source]

Swaps two axes. Basically a wrapper around TFUtil.swapaxes(). See also ReinterpretDataLayer.

Parameters: axis1 (int|str) – axis2 (int|str) –
layer_class = 'swap_axes'[source]
classmethod get_out_data_from_opts(name, sources, axis1, axis2, **kwargs)[source]
Parameters: name (str) – sources (list[LayerBase]) – axis1 (int|str) – axis2 (int|str) – Return type: Data
class TFNetworkLayer.ReinterpretDataLayer(switch_axes=None, size_base=None, set_axes=None, enforce_batch_major=False, enforce_time_major=False, increase_sparse_dim=None, **kwargs)[source]

Acts like the CopyLayer but reinterprets the role of some axes or data.

Parameters: switch_axes (str|list[str]) – e.g. “bt” to switch batch and time axes size_base (LayerBase|None) – set_axes (dict[str,int]) – enforce_batch_major (bool) – enforce_time_major (bool) –
layer_class = 'reinterpret_data'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.

classmethod get_out_data_from_opts(name, sources, switch_axes=None, size_base=None, set_axes=None, enforce_batch_major=False, enforce_time_major=False, increase_sparse_dim=None, **kwargs)[source]
Parameters: name (str) – sources (list[LayerBase]) – switch_axes (str|list[str]) – e.g. “bt” to switch batch and time axes size_base (LayerBase|None) – set_axes (dict[str,int]) – enforce_batch_major (bool) – enforce_time_major (bool) – increase_sparse_dim (int|None) – if sparse, add this to the dim
class TFNetworkLayer.ConvLayer(n_out, filter_size, padding, strides=1, dilation_rate=1, input_expand_dims=0, input_add_feature_dim=False, input_split_feature_dim=None, with_bias=False, activation=None, forward_weights_init='glorot_uniform', bias_init=0.0, **kwargs)[source]

A generic convolution layer which supports 1D, 2D and 3D convolution. Pooling can be done in the separate “pool” layer.

Parameters: n_out (int) – number of outgoing features filter_size (tuple[int]) – (width,), (height,width) or (depth,height,width) for 1D/2D/3D conv. the input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. it will automatically swap the batch-dim to the first axis of the input data. padding (str) – “same” or “valid” strides (int|tuple[int]) – strides for the spatial dims, i.e. length of this tuple should be the same as filter_size, or a single int. input_expand_dims (int) – number of dynamic dims to add to the input input_add_feature_dim (bool) – will add a dim at the end and use input-feature-dim == 1, and use the original input feature-dim as a spatial dim. input_split_feature_dim (None|int) – if set, like input_add_feature_dim it will add a new feature dim which is of value input_split_feature_dim, and the original input feature dim will be divided by input_split_feature_dim, thus it must be a multiple of that value. with_bias (bool) – if True, will add a bias to the output features activation (None|str) – if set, will apply this function at the end
layer_class = 'conv'[source]
recurrent = True[source]
classmethod calc_out_dim(in_dim, filter_size, stride, padding, dilation_rate=1)[source]
Parameters: in_dim (int|tf.Tensor|T) – dimension in some axis filter_size (int) – e.g. 2, for the corresponding axis stride (int) – e.g. 1, for the corresponding axis dilation_rate (int) – e.g. 1 padding (str) – “valid” or “same” Returns: the output dimension Return type: T
classmethod get_out_data_from_opts(**kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.PoolLayer(mode, pool_size, padding='VALID', dilation_rate=1, strides=None, **kwargs)[source]

A generic N-D pooling layer. This would usually be done after a convolution for down-sampling.

Parameters: mode (str) – “max” or “avg” pool_size (tuple[int]) – shape of the window of each reduce padding (str) – “valid” or “same” dilation_rate (tuple[int]|int) – strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size
layer_class = 'pool'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, pool_size, strides=None, dilation_rate=1, sources=(), padding='VALID', **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.ReduceLayer(mode, axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, **kwargs)[source]

This reduces some axis by using “sum” or “max”. It’s basically a wrapper around tf.reduce_sum or tf.reduce_max.

Parameters: mode (str) – “sum” or “max” or “mean” axes (int|list[int]|str) – one axis or multiple axes to reduce. this is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). it also accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature” axis (int|list[int]|str) – for compatibility, can be used instead of axes keep_dims (bool) – if dimensions should be kept (will be 1) enforce_batch_dim_axis (int) – will swap the batch-dim-axis of the input with the given axis. e.g. 0: will convert the input into batch-major format if not already like that.
layer_class = 'reduce'[source]
classmethod need_enforce_batch_dim_axis(axes)[source]
Parameters: axes (int|list[int]|str) – Returns: if any integer is in axes, thus we should have a fixed dimension layout Return type: bool
classmethod get_axes(axis, input_data)[source]
Parameters: axis – see self.__init__() input_data (Data) – Returns: list of axes Return type: list[int]
classmethod get_out_data_from_opts(name, sources, axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
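The effect of keep_dims on the output shape, as a numpy sketch:

```python
import numpy as np

x = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # (batch, time, dim)
# mode="mean" over the time axis; keep_dims=False squeezes the axis away:
print(x.mean(axis=1).shape)                  # (2, 4)
# keep_dims=True keeps the reduced axis with size 1:
print(x.mean(axis=1, keepdims=True).shape)   # (2, 1, 4)
```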
class TFNetworkLayer.ReduceOutLayer(mode, num_pieces, **kwargs)[source]

Combination of SplitDimsLayer applied to the feature dim and ReduceLayer applied to the resulting feature dim. This can e.g. be used to do maxout.

Parameters: mode (str) – “sum” or “max” or “mean” num_pieces (int) – how many elements to reduce. The output dimension will be input.dim // num_pieces.
layer_class = 'reduce_out'[source]
classmethod get_out_data_from_opts(num_pieces, sources, name, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
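As a numpy sketch of the maxout case (grouping the feature dim into adjacent blocks of num_pieces is an assumption about the exact layout):

```python
import numpy as np

num_pieces = 2
x = np.array([[1.0, 4.0, 2.0, 3.0, 5.0, 0.0]])  # (batch, dim=6)
# Split the feature dim into groups of num_pieces, then reduce each
# group with "max" -- this is maxout; output dim = 6 // 2 = 3.
y = x.reshape(x.shape[0], -1, num_pieces).max(axis=-1)
print(y)  # [[4. 3. 5.]]
```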
class TFNetworkLayer.SqueezeLayer(axis, enforce_batch_dim_axis=0, **kwargs)[source]

Removes an axis with dimension 1. This is basically a wrapper around tf.squeeze.

Parameters: axis (int|list[int]|str) – one axis or multiple axes to squeeze. this is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). it also accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”
layer_class = 'squeeze'[source]
classmethod get_out_data_from_opts(enforce_batch_dim_axis=0, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.WeightedSumLayer(axes, padding=None, size=None, keep_dims=None, **kwargs)[source]

Calculates a weighted sum, either over a complete axis of fixed dimension, or over some window. Can also do that for multiple axes. The weights are a trainable parameter matrix. Similar would be to use ElemwiseProdLayer and ReduceLayer, or just a DotLayer with a VariableLayer. See also LinearLayer.

Parameters: axes (str|list[str]) – the axes to do the weighted-sum over padding (str) – “valid” or “same”, in case of keep_dims=True size (None|tuple[int]) – the kernel-size. if left away, the axes must be of fixed dimension, and we will use keep_dims=False, padding=”valid” by default. Otherwise, if given, you must also provide padding and keep_dims=True by default. keep_dims (bool) – if False, the axes will be squeezed away. see also size.
layer_class = 'weighted_sum'[source]
classmethod get_out_data_from_opts(name, sources, axes, padding=None, size=None, keep_dims=None, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.ElemwiseProdLayer(axes, size=None, **kwargs)[source]

Element-wise product in some axes. Microsoft calls this “static attention”, in Deep Conv. NN with Layer-wise Context Expansion and Attention (LACE). The matrix/tensor to be used for the product are given as a trainable parameter. See also LinearLayer.

Parameters: axes (str|list[str]) – e.g. “spatial”, but all those axes must be of fixed dimension size (tuple[int]) – for double-checking, you can explicitly provide the size
layer_class = 'elemwise_prod'[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.PrefixInTimeLayer(prefix=0.0, repeat=1, **kwargs)[source]
Parameters: prefix (float|str) – either some constant or another layer repeat (int) – how often to repeat the prefix
layer_class = 'prefix_in_time'[source]
class TFNetworkLayer.DotLayer(red1=-1, red2=-2, var1=-2, var2=-1, add_var2_if_empty=True, debug=False, **kwargs)[source]

This performs a dot-product of two sources. The underlying matmul expects shapes (shared…, I, J) * (shared…, J, K) -> (shared…, I, K). We say that J is the axis to be reduced, I is the var-dim of source 1, and K is the var-dim of source 2. I, J, K can also be multiple axes from the sources. The var-dims don’t need to exist. All other axes (shared…) are expected to match.

Parameters: red1 (str|int|tuple[str|int]) – reduce axes of first source red2 (str|int|tuple[str|int]) – reduce axes of second source var1 (str|int|tuple[str|int]|None) – var axes of first source var2 (str|int|tuple[str|int]|None) – var axes of second source add_var2_if_empty (bool) – if var2=None, add dim=1 at the end debug (bool) – will print debug shapes, etc.
layer_class = 'dot'[source]
classmethod get_out_data_from_opts(name, sources, red1=-1, red2=-2, var1=-2, var2=-1, add_var2_if_empty=True, **kwargs)[source]
Parameters: name (str) – sources (list[LayerBase]) – red1 (str|int|tuple[str|int]) – reduce axes of first source red2 (str|int|tuple[str|int]) – reduce axes of second source var1 (str|int|tuple[str|int]|None) – var axes of first source var2 (str|int|tuple[str|int]|None) – var axes of second source add_var2_if_empty (bool) – Return type: Data
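The shape contract of the underlying matmul, as a numpy sketch:

```python
import numpy as np

# (shared..., I, J) * (shared..., J, K) -> (shared..., I, K):
# the batch axis is shared, J is reduced, I and K are the var-dims.
a = np.zeros((8, 5, 3))  # (batch, I=5, J=3)
b = np.zeros((8, 3, 7))  # (batch, J=3, K=7)
c = np.einsum("bij,bjk->bik", a, b)
print(c.shape)  # (8, 5, 7)
```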
class TFNetworkLayer.ShiftAxisLayer(axis, amount, pad=True, adjust_size_info=True, **kwargs)[source]

Shifts the dimensions in an axis around. This layer may change the axis-dimension.

This name might be confusing. No axis will be shifted here. See SwapAxesLayer for that.

Parameters: axis (str|int) – single axis to shift amount (int) – number of elements to shift (<0 for left-shift, >0 for right-shift) pad (bool) – preserve shape by padding adjust_size_info (bool) – whether to adjust the size_placeholder
layer_class = 'shift_axis'[source]
classmethod get_out_data_from_opts(name, amount, axis, pad, sources=(), **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
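A numpy sketch of a right-shift with pad=True (zero as the pad value is an assumption here):

```python
import numpy as np

x = np.array([[1, 2, 3, 4]])
amount = 1  # > 0: right-shift
y = np.roll(x, amount, axis=1)
y[:, :amount] = 0  # pad=True: preserve the shape, pad the vacated slots
print(y)  # [[0 1 2 3]]
```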
class TFNetworkLayer.ResizeLayer(factor, axis, kind='nn', fill_value=None, fill_dropout=None, **kwargs)[source]

Resizes the input, i.e. upsampling or downsampling. Supports different kinds, such as linear interpolation or nearest-neighbor.

Parameters: factor (int) – axis (str|int) – the axis to resize, counted with batch-dim. can also be “T” for time kind (str) – “linear”, “nn”/”nearest_neighbor”, “cubic”, “fill” fill_value (None|int|float) – if kind==”fill” fill_dropout (float) – if set, will dropout in the same axis
layer_class = 'resize'[source]
classmethod get_out_data_from_opts(factor, axis, sources, name, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.CombineDimsLayer(axes, **kwargs)[source]

Combines multiple dimensions. See also MergeDimsLayer. This is deprecated in favor of MergeDimsLayer.

Parameters: axes (int|list[int]|str) – one axis or multiple axes to combine. this is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). it also accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”
layer_class = 'combine_dims'[source]
classmethod get_out_data_from_opts(axes, sources, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.RemoveLayer(symbol, **kwargs)[source]

Currently, assumes sparse data, and removes a specific symbol from the data.

Parameters: symbol (int) –
layer_class = 'remove'[source]
classmethod get_out_data_from_opts(name, sources=(), **kwargs)[source]
Parameters: name (str) – sources (list[LayerBase]) – Data
class TFNetworkLayer.FsaLayer(**kwargs)[source]
layer_class = 'fsa'[source]
class TFNetworkLayer.CombineLayer(kind, sources, activation=None, with_bias=False, eval=None, eval_locals=None, eval_for_output_loss=False, **kwargs)[source]

Applies some binary operation on all sources, such as addition. Also see ActivationLayer.

Parameters: kind (str) – e.g. “average” or “add”, or “eval” sources (list[LayerBase]) – activation (str|None) – if provided, activation function to apply, e.g. “tanh” or “relu” with_bias (bool) – if given, will add a bias eval (str) – for kind=”eval”, will eval this string. see _op_kind_eval() eval_locals (dict[str]|None) – locals for eval eval_for_output_loss (bool) – will do the same eval on layer.output_loss
layer_class = 'combine'[source]
classmethod get_out_data_from_opts(n_out=None, out_type=None, sources=(), **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
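Hypothetical network-dict fragments showing the two common uses (the source layer names are made up for illustration; source(i) referring to the i-th entry of “from” is assumed from the eval convention):

```python
network = {
    # kind="add": element-wise sum of the two sources.
    "sum": {"class": "combine", "kind": "add",
            "from": ["layer_a", "layer_b"]},
    # kind="eval": evaluate an arbitrary expression over the sources.
    "mix": {"class": "combine", "kind": "eval",
            "from": ["layer_a", "layer_b"],
            "eval": "0.5 * source(0) + 0.5 * source(1)"},
}
```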
class TFNetworkLayer.EvalLayer(eval, **kwargs)[source]

Evaluates some string. The CombineLayer provides this functionality, thus this is just a special case of it. Also see ActivationLayer.

Parameters: eval (str) – will eval this string. see _op_kind_eval()
layer_class = 'eval'[source]
class TFNetworkLayer.CompareLayer(kind='equal', value=None, **kwargs)[source]

Compares (e.g. equality check) all the sources element-wise.

Parameters: kind (str) – e.g. “equal” value (float|int|None) – if specified, will also compare to this
layer_class = 'compare'[source]
classmethod get_out_data_from_opts(n_out=None, out_type=None, sources=(), **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.SwitchLayer(condition, true_from, false_from, **kwargs)[source]

Wrapper around tf.where(). Uses three inputs: condition, true_from and false_from. The output of this layer contains elements of true_from where condition is True, otherwise elements of false_from. condition has to be of dtype bool. true_from and false_from must have the same shape.

Parameters: condition (LayerBase) – true_from (LayerBase) – false_from (LayerBase) –
layer_class = 'switch'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer
classmethod get_out_data_from_opts(true_from, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
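The element-wise selection described above, as a numpy sketch:

```python
import numpy as np

condition = np.array([True, False, True])
true_from = np.array([1, 2, 3])
false_from = np.array([10, 20, 30])
# Equivalent of tf.where(condition, true_from, false_from):
out = np.where(condition, true_from, false_from)
print(out)  # [ 1 20  3]
```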
class TFNetworkLayer.SubnetworkLayer(subnetwork, concat_sources=True, load_on_init=None, dropout=0, dropout_noise_shape=None, **kwargs)[source]

You can define a whole subnetwork as a single layer by this class.

The subnetwork will be specified by a dict[str,dict[str]], just like a normal network is specified in the config.

The "output" layer of the subnetwork will be the output of this subnetwork-layer.

With concat_sources=True (default),
the input to this layer will be represented as the "data:data" or simply "data" in the subnetwork,
otherwise with concat_sources=False,
the input to this layer will be represented as "data:input_layer_name" for each input, in the subnetwork.
Parameters: subnetwork (dict[str,dict]) – subnetwork as dict (JSON content). must have an “output” layer. concat_sources (bool) – if we concatenate all sources into one, like it is standard for most other layers load_on_init (str|None) – if provided, for parameter initialization, we will load the given model file.
layer_class = 'subnetwork'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(subnetwork, concat_sources=True, n_out=None, out_type=None, **kwargs)[source]
Parameters: subnetwork (dict[str,dict[str]]) – n_out (int|None) – out_type (dict[str]|None) – Return type: Data
get_constraints_value()[source]
Returns: None or scalar Return type: tf.Tensor|None
classmethod get_losses(name, network, output, loss=None, reduce_func=None, layer=None, **kwargs)[source]
Parameters: name (str) – layer name network (TFNetwork.TFNetwork) – loss (Loss|None) – argument just as for __init__ output (Data) – the output (template) for the layer layer (LayerBase|None) – reduce_func (((tf.Tensor)->tf.Tensor)|None) – kwargs – other layer kwargs Return type: list[TFNetwork.LossHolder]
get_last_hidden_state(key)[source]

If this is a recurrent layer, this would return the last hidden state. Otherwise, we return None.
Parameters: key (int|str|None) – also the special key “*”
Returns: optional tensor with shape (batch, dim)
Return type: tf.Tensor|None

classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, subnetwork, **kwargs)[source]
Parameters: batch_dim (tf.Tensor) – for this layer, might be with beam rec_layer (TFNetworkRecLayer.RecLayer) – subnetwork (dict[str,dict[str]]) – Return type: dict[str,tf.Tensor]
classmethod get_rec_initial_extra_outputs_shape_invariants(subnetwork, **kwargs)[source]
Parameters: subnetwork (dict[str,dict[str]]) – Returns: optional shapes for the tensors by get_rec_initial_extra_outputs Return type: dict[str,tf.TensorShape]
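A hypothetical subnetwork definition (layer names and sizes are made up for illustration). Inside the subnetwork, "data" refers to this layer's (concatenated) input, and the sub-layer named "output" becomes the output of the whole layer:

```python
network = {
    "subnet": {
        "class": "subnetwork",
        "from": ["data"],
        "subnetwork": {
            "hidden": {"class": "linear", "activation": "relu",
                       "n_out": 128, "from": ["data"]},
            # This sub-layer is the output of the "subnet" layer itself.
            "output": {"class": "linear", "activation": None,
                       "n_out": 64, "from": ["hidden"]},
        },
    },
}
```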
class TFNetworkLayer.VariableLayer(shape, dtype='float32', add_batch_axis=True, add_time_axis=False, trainable=True, init=0, **kwargs)[source]

Represents a variable. Can add batch/time dimension if wanted. Can be trainable. See defaults.

Parameters: shape (tuple[int]|list[int]) – dtype (str) – add_batch_axis (bool) – add_time_axis (bool) – trainable (bool) – init (str|float|int) – see TFUtil.get_initializer()
layer_class = 'variable'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer
classmethod get_out_data_from_opts(name, shape, dtype='float32', add_batch_axis=True, add_time_axis=False, **kwargs)[source]
Parameters: name (str) – shape (tuple[int]|list[int]) – dtype (str) – add_batch_axis (bool) – add_time_axis (bool) –
Return type: Data
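As a sketch, a "variable" layer could appear in a network dict like this; the layer names and the combining "output" layer here are purely illustrative assumptions, not from the docs above:

```python
# Hypothetical RETURNN-style network dict using the "variable" layer.
network = {
    "my_bias": {
        "class": "variable",
        "shape": (128,),           # one variable of dim 128
        "init": "glorot_uniform",  # see TFUtil.get_initializer()
        "add_batch_axis": True,    # output then carries a batch axis
        "trainable": True,
    },
    # Illustrative consumer of the variable (layer options assumed):
    "output": {"class": "combine", "kind": "add", "from": ["data", "my_bias"]},
}
```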
class TFNetworkLayer.AccumulateMeanLayer(exp_average, axes='bt', initial_value=None, is_prob_distribution=None, **kwargs)[source]

Accumulates the mean of the input (in training) (over batch-dim and time-dim by default). It’s similar to ReduceLayer.

Parameters: exp_average (float) – momentum in exponential average calculation axes (int|list[str]|str) – the axes to reduce. must contain batch and time. initial_value (float) – how to initialize the variable which accumulates the mean is_prob_distribution (bool) – if provided, better default for initial_value
layer_class = 'accumulate_mean'[source]
classmethod get_out_data_from_opts(axes='bt', **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
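The accumulation that the exp_average (momentum) option suggests can be sketched in plain numpy; the exact update rule used by the layer is an assumption here, and accumulate_mean is a hypothetical helper:

```python
import numpy as np

def accumulate_mean(batches, exp_average, initial_value=0.0):
    """Exponential moving average over per-batch means (numpy sketch).
    exp_average plays the role of the momentum; this update rule is
    an assumption, not taken from the layer's implementation."""
    acc = initial_value
    for batch in batches:
        batch_mean = np.mean(batch)  # mean over batch and time axes
        acc = (1.0 - exp_average) * acc + exp_average * batch_mean
    return acc
```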
class TFNetworkLayer.FastBaumWelchLayer(align_target, sprint_opts=None, input_type='log_prob', tdp_scale=1.0, am_scale=1.0, min_prob=0.0, staircase_seq_len_source=None, **kwargs)[source]

Calls fast_baum_welch() or fast_baum_welch_by_sprint_automata(). We expect that our input are +log scores, e.g. use log-softmax.

Parameters: align_target (str) – e.g. “sprint” or “staircase” sprint_opts (dict[str]) – input_type (str) – “log_prob” or “prob” tdp_scale (float) – am_scale (float) – min_prob (float) – clips the minimum prob (value in [0,1]) staircase_seq_len_source (LayerBase|None) –
layer_class = 'fast_bw'[source]
recurrent = True[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.

classmethod get_out_data_from_opts(name, sources, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.SyntheticGradientLayer(gradient, **kwargs)[source]

This is a generalized way to replace the true gradient with any kind of predicted gradient. This enables implementing the idea from:

Decoupled Neural Interfaces using Synthetic Gradients, https://arxiv.org/abs/1608.05343
layer_class = 'synthetic_gradient'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer

Will modify d inplace such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict(). It resolves certain arguments, e.g. it resolves the “from” argument which is a list of strings, to make it the “sources” argument in kwargs, with a list of LayerBase instances. Subclasses can extend/overwrite this. Usually the only reason to overwrite this is when some argument might be a reference to a layer which should be resolved.

classmethod get_out_data_from_opts(sources, name, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.AllophoneStateIdxParserLayer(num_phone_classes, num_states=3, context_len=1, **kwargs)[source]

This is very much Sprint/RASR specific. We get allophone state indices and return (center, left_1, right_1, …, state, boundary). The index is defined by NoTyingDense (ClassicStateTying.cc). In the Sprint config, this is via the option --*.state-tying.type=no-tying-dense.

Parameters: sources (list[LayerBase]) – num_phone_classes (int) – number of phonemes + 1, with special 0 phone == no context num_states (int) – number of HMM states context_len (int) – left/right context len
layer_class = 'allophone_state_idx_parser'[source]
NumBoundaryClasses = 4[source]
classmethod get_out_data_from_opts(name, sources, context_len=1, n_out=None, **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.FramewiseStatisticsLayer(sil_label_idx, histogram_num_bins=20, **kwargs)[source]

Collects various statistics (such as FER) on the sources. The tensors will get stored in self.stats, which will be collected by TFEngine.

layer_class = 'framewise_statistics'[source]
classmethod get_out_data_from_opts(**kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.PrintLayer(**kwargs)[source]

Prints the sources to console/log.

layer_class = 'print'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace, the loss_opts network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer
classmethod get_out_data_from_opts(**kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.ImageSummaryLayer(max_outputs=3, **kwargs)[source]

Creates image summaries which can be viewed in TensorBoard. This layer expects the source to be in (T-decoder, T-encoder, B, 1).

Parameters: max_outputs – number of images to generate per step
layer_class = 'image_summary'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace, the loss_opts network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer
classmethod get_out_data_from_opts(**kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.OfficialResNetLayer(num_filters, kernel_size, conv_stride, first_pool_size, first_pool_stride, first_kernel_size=3, block_sizes=[5, 5, 5], block_strides=[1, 2, 2], conv_time_dim=False, bottleneck=False, resnet_version=2, data_format=None, **kwargs)[source]

Wrapper around extern/official_tf_resnet.

This operates on NHWC (batch, height, width, channel) data, and returns (N, D), where D = num_classes. If you have (batch, time, width, channel) as input, you probably want to use WindowLayer to get (batch, time, window, width, channel), and then MergeDimsLayer to get (batch*time, window, width, channel), such that we would interpret window = height here. The output is then (batch*time, D), and you can use SplitBatchTimeLayer to get (batch, time, D). As you get logits, you can then use ActivationLayer with softmax.
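The reshaping pipeline described above could look like the following network-dict sketch; apart from the layer classes, all option names and values are assumptions for illustration:

```python
# Hypothetical RETURNN-style network sketch of the Window -> MergeDims ->
# official_resnet -> SplitBatchTime -> softmax pipeline described above.
network = {
    "window": {"class": "window", "window_size": 32, "from": ["data"]},
    # now: (batch, time, window, width, channel)
    "merge": {"class": "merge_dims", "axes": ["B", "T"], "from": ["window"]},
    # now: (batch*time, window, width, channel); window acts as height
    "resnet": {"class": "official_resnet", "num_filters": 16, "kernel_size": 3,
               "conv_stride": 1, "first_pool_size": 2, "first_pool_stride": 2,
               "from": ["merge"]},
    # now: (batch*time, D) logits
    "split": {"class": "split_batch_time", "base": "window", "from": ["resnet"]},
    # now: (batch, time, D)
    "output": {"class": "activation", "activation": "softmax", "from": ["split"]},
}
```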

layer_class = 'official_resnet'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, bottleneck, conv_time_dim, num_filters, block_sizes=[5, 5, 5], sources=(), **kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkLayer.Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, scale=1.0)[source]

Base class for all losses.

Parameters: base_network (TFNetwork.TFNetwork) – use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask() use_normalized_loss (bool) – the loss used in optimization will be normalized scale (float) – additional scale factor for the loss
class_name = None[source]
recurrent = False[source]
reduce_func(loss)[source]

Reduces over the frames. Currently this is the sum; averaging is done later. We might change this logic at some point. Also, some code overwrites this function externally, e.g. with TFUtil.identity, to skip the reduction.

Parameters: loss (tf.Tensor) – e.g. (batch,time), or (time_flat,), or (batch,time,dim), etc.
Return type: tf.Tensor
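The flatten-and-sum behavior described here (cf. use_flatten_frames and TFUtil.flatten_with_seq_len_mask) can be sketched in numpy; reduce_loss is a hypothetical helper, not part of the API:

```python
import numpy as np

def reduce_loss(loss_bt, seq_lens):
    """Sum per-frame losses over the valid (non-padded) frames only,
    mirroring a flatten-with-seq-len-mask followed by a sum (numpy sketch)."""
    batch, max_time = loss_bt.shape
    # mask[b, t] is True for frames within the sequence length of batch entry b
    mask = np.arange(max_time)[None, :] < np.asarray(seq_lens)[:, None]
    return float(np.sum(loss_bt[mask]))
```

Averaging (normalization over frames) is then applied later, as get_normalization_factor() below suggests.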
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace, the loss_opts network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer

Will modify d such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by LayerBase.transform_config_dict.

init_by_layer(layer)[source]
Parameters: layer (LayerBase|None) –
init(output, output_with_activation=None, target=None, layer=None)[source]
Parameters: output (Data) – generated output output_with_activation (OutputWithActivation|None) – target (Data) – reference target from dataset layer (LayerBase|None) –
get_error()[source]
Returns: frame error rate as a scalar value
Return type: tf.Tensor
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
get_normalization_factor()[source]
Returns: factor as a float scalar, usually 1.0 / num_frames. See self.reduce_func.
Return type: tf.Tensor
classmethod get_auto_output_layer_dim(target_dim)[source]
Parameters: target_dim (int) – normally just the same as target_dim. E.g. for CTC, we would add 1 for the blank label.
Return type: int
class TFNetworkLayer.CrossEntropyLoss(focal_loss_factor=0.0, label_smoothing=0.0, label_smoothing_gaussian=False, debug_dump=False, safe_log_opts=None, use_fused=True, **kwargs)[source]

Cross-Entropy loss. Basically -sum(target * log(output)).

Parameters: focal_loss_factor (float) – see https://arxiv.org/abs/1708.02002. 0 means disabled label_smoothing (float) – 0.1 is a common default. see TFUtil.smoothing_cross_entropy() label_smoothing_gaussian (bool) – see TFUtil.smoothing_cross_entropy() debug_dump (bool) – safe_log_opts (dict[str]) – passed to safe_log() use_fused (bool) – if possible, use fused opts
class_name = 'ce'[source]
get_output_target_scores()[source]
Returns: shape (time_flat,), type float32
Return type: tf.Tensor
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
class TFNetworkLayer.BinaryCrossEntropyLoss(base_network, use_flatten_frames=True, use_normalized_loss=False, scale=1.0)[source]

Binary cross entropy. We expect the output as logits, not in probability space! Per frame: -mean(target * log(sigmoid(output)) + (1 - target) * log(1 - sigmoid(output)))

Parameters: base_network (TFNetwork.TFNetwork) – use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask() use_normalized_loss (bool) – the loss used in optimization will be normalized scale (float) – additional scale factor for the loss
class_name = 'bin_ce'[source]
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
class TFNetworkLayer.GenericCELoss(**kwargs)[source]
class_name = 'generic_ce'[source]
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
class TFNetworkLayer.CtcLoss(target_collapse_repeated=False, auto_clip_target_len=False, output_in_log_space=False, beam_width=100, ctc_opts=None, focal_loss_factor=0.0, **kwargs)[source]

Connectionist Temporal Classification (CTC) loss. Basically a wrapper around tf.nn.ctc_loss.

Parameters: target_collapse_repeated (bool) – like preprocess_collapse_repeated option for CTC. used for sparse_labels(). auto_clip_target_len (bool) – see self._get_target_sparse_labels(). output_in_log_space (bool) – False -> output expected in prob space. see self.get_output_logits beam_width (int) – used in eval ctc_opts (dict[str]|None) – other kwargs used for tf.nn.ctc_loss focal_loss_factor (float) – see https://arxiv.org/abs/1708.02002. 0 means disabled. generalized for CTC
class_name = 'ctc'[source]
recurrent = True[source]
init(**kwargs)[source]
Parameters: output (Data) – generated output output_with_activation (OutputWithActivation|None) – target (Data) – reference target from dataset layer (LayerBase|None) –
get_output_logits()[source]
Returns: outputs in log-space / logits
Return type: tf.Tensor
get_soft_alignment()[source]

Also called the Baum-Welch-alignment. This is basically p_t(s|x_1^T,w_1^N), where s are the output labels (including blank), and w are the real target labels. :return: shape (time, batch, dim) :rtype: tf.Tensor

get_focal_loss_factor()[source]
Returns: shape (time, batch, dim)
Return type: tf.Tensor
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
get_error()[source]
Returns: frame error rate as a scalar value
Return type: tf.Tensor
classmethod get_auto_output_layer_dim(target_dim)[source]
Parameters: target_dim (int) – normally just the same as target_dim. E.g. for CTC, we would add 1 for the blank label.
Return type: int
class TFNetworkLayer.EditDistanceLoss(debug_print=False, label_map=None, ctc_decode=False, output_in_log_space=False, **kwargs)[source]

Note that this loss is not differentiable, thus it’s only for keeping statistics.

Parameters: debug_print (bool) – will tf.Print the sequence label_map (dict[int,int]|None) – before calculating the edit-distance, will apply this map ctc_decode (bool) – True -> expects dense output and does CTC decode, False -> expects sparse labels in output output_in_log_space (bool) – False -> dense output expected in prob space. see self.get_output_logits
class_name = 'edit_distance'[source]
recurrent = True[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters: output (Data) – generated output output_with_activation (OutputWithActivation|None) – target (Data) – reference target from dataset
get_output_logits()[source]
Returns: outputs in log-space / logits
Return type: tf.Tensor
get_error()[source]
Returns: frame error rate as a scalar value
Return type: tf.Tensor
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
class TFNetworkLayer.BleuLoss(**kwargs)[source]

Note that this loss is not differentiable, thus it’s only for keeping statistics. Also, BLEU is a score, i.e. the higher, the better. Thus, to interpret it as a loss or error, we take the negative value.

class_name = 'bleu'[source]
recurrent = True[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters: output (Data) – generated output output_with_activation (OutputWithActivation|None) – target (Data) – reference target from dataset
get_error()[source]
Returns: frame error rate as a scalar value
Return type: tf.Tensor
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
class TFNetworkLayer.ExpectedLoss(loss, loss_kind, norm_scores=True, norm_scores_stop_gradient=True, divide_beam_size=True, subtract_average_loss=True, loss_correction_grad_only=False, **kwargs)[source]

This loss uses another loss’s error or value and, given the search beam scores, calculates the expected loss. Sometimes also called minimum Bayes risk.

Parameters: loss (Loss) – loss_kind (str) – “error” or “value”. whether to use loss.get_error() or loss.get_value() norm_scores (bool) – norm_scores_stop_gradient (bool) – divide_beam_size (bool) – subtract_average_loss (bool) – loss_correction_grad_only (bool) –
class_name = 'expected_loss'[source]
recurrent = True[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace, the loss_opts network (TFNetwork.TFNetwork) – get_layer ((str) -> LayerBase) – function to get or construct another layer

Will modify d such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by LayerBase.transform_config_dict.

init(**kwargs)[source]
Parameters: output (Data) – generated output output_with_activation (OutputWithActivation|None) – target (Data) – reference target from dataset layer (LayerBase|None) –
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
get_error()[source]
Returns: frame error rate as a scalar value
Return type: tf.Tensor
class TFNetworkLayer.DeepClusteringLoss(embedding_dimension, nr_of_sources, **kwargs)[source]

Cost function used for deep clustering as described in [Hershey & Chen+, 2016]: “Deep clustering: discriminative embeddings for segmentation and separation”

Parameters: embedding_dimension (int) – nr_of_sources (int) –
class_name = 'deep_clustering'[source]
get_error()[source]
Returns: frame error rate as a scalar value
Return type: tf.Tensor | None
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
class TFNetworkLayer.L1Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, scale=1.0)[source]

L1-distance loss. Basically sum(|target - output|).

Parameters: base_network (TFNetwork.TFNetwork) – use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask() use_normalized_loss (bool) – the loss used in optimization will be normalized scale (float) – additional scale factor for the loss
class_name = 'l1'[source]
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
class TFNetworkLayer.MeanSquaredError(base_network, use_flatten_frames=True, use_normalized_loss=False, scale=1.0)[source]

The generic mean squared error loss function.

Parameters: base_network (TFNetwork.TFNetwork) – use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask() use_normalized_loss (bool) – the loss used in optimization will be normalized scale (float) – additional scale factor for the loss
class_name = 'mse'[source]
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
class TFNetworkLayer.ExternSprintLoss(sprint_opts, **kwargs)[source]

The loss is calculated by an extern Sprint instance.

Parameters: sprint_opts (dict[str]) –
class_name = 'sprint'[source]
recurrent = True[source]
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
get_error()[source]
Returns: frame error rate as a scalar value
Return type: tf.Tensor
class TFNetworkLayer.FastBaumWelchLoss(sprint_opts, **kwargs)[source]

The loss is calculated via fast_baum_welch(). The automata are created by an extern Sprint instance.

Parameters: sprint_opts (dict[str]) –
class_name = 'fast_bw'[source]
recurrent = True[source]
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
get_error()[source]
Returns: frame error rate as a scalar value
Return type: tf.Tensor
class TFNetworkLayer.ViaLayerLoss(error_signal_layer=None, align_layer=None, loss_wrt_to_act_in=False, **kwargs)[source]

The loss error signal and loss value are defined as the outputs of other layers. That way, you can define any custom loss. This could e.g. be used together with the fast_bw layer.

Parameters: error_signal_layer (LayerBase) – align_layer (LayerBase) – loss_wrt_to_act_in (bool|str) – if True, we expect that the given output_with_activation is set, and the given error signal is w.r.t. the input of the specific activation function. A common example is the input to the softmax function, where the gradient is much more stable to define, e.g. y - z instead of y/z for cross entropy. If you specify a str, e.g. “softmax” or “log_softmax”, there is an additional check that the used activation function is really that one.
class_name = 'via_layer'[source]
recurrent = True[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: d (dict[str]) – will modify inplace, the loss_opts network (TFNetwork.TFNetwork) – -> LayerBase) get_layer (((str)) – function to get or construct another layer
get_value()[source]
Returns: loss as a scalar float32 value. it should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info(). tf.Tensor|None
get_error()[source]
Returns: frame error rate as a scalar value tf.Tensor
class TFNetworkLayer.SampledSoftmaxLoss(num_sampled=None, remove_accidental_hits=True, partition_strategy='mod', sampler='log_uniform', **kwargs)[source]

Sampled Softmax loss. In training, this layer performs sampled_softmax_loss (see https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss for a detailed explanation); in evaluation mode, it performs normal cross-entropy. See cond_on_train(…) for more details on the branching between the train and eval phase.

Parameters: num_sampled (int) – The number of classes to randomly sample per batch. remove_accidental_hits (bool) – True is a common default. Whether to remove “accidental hits” where a sampled class equals one of the target classes. partition_strategy (str) – ‘mod’ is common default. See TensorFlow documentation of sampled_softmax_loss sampler (str) – “log_uniform” is common default. Element of {“uniform”,”log_uniform”,”learned_unigram”}
class_name = 'sampled_softmax'[source]
get_value()[source]
Returns: loss as a scalar float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
TFNetworkLayer.get_loss_class(loss)[source]
Parameters: loss (str) – loss type such as “ce”
Return type: (() -> Loss) | type[Loss] | Loss
TFNetworkLayer.auto_register_layer_classes(vars_values)[source]

Example usage:

from TFNetworkLayer import auto_register_layer_classes
auto_register_layer_classes('extern_private/your_stuff/CoolThingy.py')

Parameters: vars_values (list|types.ModuleType|str) – e.g. use list(globals().values()). str is considered as a module-filename
Returns: nothing
TFNetworkLayer.register_layer_class(layer_class)[source]

Registers a layer class such that it can be used in network construction.

Parameters: layer_class (type[LayerBase]) –
Returns: nothing
TFNetworkLayer.get_layer_class(name)[source]
Parameters: name (str) – matches layer_class
Return type: (() -> LayerBase) | type[LayerBase] | LayerBase
TFNetworkLayer.get_layer_class_name_list()[source]