TFNetworkLayer

class TFNetworkLayer.AccumulateMeanLayer(exp_average, axes='bt', initial_value=None, is_prob_distribution=None, **kwargs)[source]

Accumulates the mean of the input (in training). It is similar to ReduceLayer.

Parameters:
  • exp_average (float) – momentum in exponential average calculation
  • axes (int|list[str]|str) – the axes to reduce. must contain batch and time.
  • initial_value (float) – how to initialize the variable which accumulates the mean
  • is_prob_distribution (bool) – if provided, better default for initial_value
classmethod get_out_data_from_opts(axes='bt', **kwargs)[source]
layer_class = 'accumulate_mean'[source]
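The exponential-average update can be sketched in plain Python (an illustrative sketch only; the actual layer accumulates into a TF variable, and the default initial value used here is an assumption):

```python
def accumulate_mean(batches, exp_average, initial_value=0.0):
    """Running mean via the update v <- v + exp_average * (batch_mean - v)."""
    v = initial_value
    for batch in batches:
        # mean over all (batch, time) entries of this step
        batch_mean = sum(batch) / len(batch)
        v += exp_average * (batch_mean - v)
    return v
```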
class TFNetworkLayer.ActivationLayer(activation, **kwargs)[source]

This layer just applies an activation function. See TFUtil.get_activation_function() about supported functions. Also see EvalLayer and CombineLayer for similar layers.

Parameters:activation (str) – e.g. “relu”, “tanh”, etc
layer_class = 'activation'[source]
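As a usage sketch, an activation layer in a RETURNN-style network dict could look like this (the layer names and sizes are illustrative, not from this document):

```python
# Hypothetical network fragment: a linear layer without activation,
# followed by a separate activation layer.
network = {
    "hidden": {"class": "linear", "activation": None, "n_out": 128, "from": ["data"]},
    "act": {"class": "activation", "activation": "relu", "from": ["hidden"]},
    "output": {"class": "softmax", "n_out": 10, "from": ["act"]},
}
```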
class TFNetworkLayer.AllophoneStateIdxParserLayer(num_phone_classes, num_states=3, context_len=1, **kwargs)[source]

This is very much Sprint/RASR specific. We get allophone state indices and return (center, left_1, right_1, ..., state, boundary). The index is defined by NoTyingDense (ClassicStateTying.cc). In the Sprint config, this is via the option --*.state-tying.type=no-tying-dense.

Parameters:
  • sources (list[LayerBase]) –
  • num_phone_classes (int) – number of phonemes + 1, with special 0 phone == no context
  • num_states (int) – number of HMM states
  • context_len (int) – left/right context len
NumBoundaryClasses = 4[source]
classmethod get_out_data_from_opts(name, sources, context_len=1, n_out=None, **kwargs)[source]
layer_class = 'allophone_state_idx_parser'[source]
class TFNetworkLayer.ApplyLengthDistributionLayer(length_model_scale=1.0, **kwargs)[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
layer_class = 'apply_length_distribution'[source]
class TFNetworkLayer.BatchNormLayer(**kwargs)[source]

Implements batch-normalization (http://arxiv.org/abs/1502.03167) as a separate layer.

All kwargs which are present in our base class are passed to our base class. All remaining kwargs are used for self.batch_norm().

layer_class = 'batch_norm'[source]
class TFNetworkLayer.BinaryCrossEntropy(base_network)[source]

Binary cross entropy. We expect the output as logits, not in probability space! Per frame: mean(-(target * log(sigmoid(output)) + (1 - target) * log(1 - sigmoid(output))))

Parameters:base_network (TFNetwork.TFNetwork) –
class_name = 'bin_ce'[source]
get_value()[source]
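The per-frame value can be checked with a small pure-Python sketch (illustrative only; the layer computes this with TF ops over logits):

```python
import math

def binary_ce_from_logits(logit, target):
    """Per-frame binary cross entropy, with the output given as a logit."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
```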
class TFNetworkLayer.ClassesToLengthDistributionGlobalLayer(window=15, weight_falloff=1.0, **kwargs)[source]
classmethod get_out_data_from_opts(name, sources, window, **kwargs)[source]
layer_class = 'classes_to_length_distribution_global'[source]
class TFNetworkLayer.ClassesToLengthDistributionLayer(window=15, scale=1.0, **kwargs)[source]
classmethod get_out_data_from_opts(name, sources, window, **kwargs)[source]
layer_class = 'classes_to_length_distribution'[source]
class TFNetworkLayer.ClassesToSegmentsLayer(num_classes, window=15, **kwargs)[source]

This layer takes a sequence of classes (=> sparse input) and applies a window (same as SegmentInput) to it. For each position t in the window it computes the relative frequencies of the classes up to and including that position t.

classmethod get_out_data_from_opts(name, sources, num_classes, window, **kwargs)[source]
layer_class = 'classes_to_segments'[source]
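The per-position relative frequencies can be sketched as follows (an illustrative re-implementation for a single window, not the actual layer code):

```python
def window_class_frequencies(classes, num_classes):
    """For each position t in the window, the relative frequency of each
    class among classes[0..t] (inclusive)."""
    counts = [0] * num_classes
    out = []
    for t, c in enumerate(classes):
        counts[c] += 1
        out.append([cnt / (t + 1) for cnt in counts])
    return out
```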
class TFNetworkLayer.CombineDimsLayer(axes, **kwargs)[source]

Combines multiple dimensions. See also MergeDimsLayer.

Parameters:axes (int|list[int]|str) – one axis or multiple axes to reduce. this is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). it also accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”
classmethod get_out_data_from_opts(axes, sources, **kwargs)[source]
layer_class = 'combine_dims'[source]
class TFNetworkLayer.CombineLayer(kind, sources, activation=None, with_bias=False, eval=None, eval_locals=None, eval_for_output_loss=False, **kwargs)[source]

Applies some binary operation on all sources, such as addition. Also see ActivationLayer.

Parameters:
  • kind (str) – e.g. “average” or “add”, or “eval”
  • sources (list[LayerBase]) –
  • activation (str|None) – if provided, activation function to apply, e.g. “tanh” or “relu”
  • with_bias (bool) – if given, will add a bias
  • eval (str) – for kind=”eval”, will eval this string. see _op_kind_eval()
  • eval_locals (dict[str]|None) – locals for eval
  • eval_for_output_loss (bool) – will do the same eval on layer.output_loss
classmethod get_out_data_from_opts(n_out=None, out_type=None, sources=(), **kwargs)[source]
layer_class = 'combine'[source]
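As a usage sketch with kind=”eval” (layer names are illustrative; in RETURNN eval strings, source(i) refers to the i-th layer in “from”):

```python
# Hypothetical network fragment: average two source layers via an eval string.
network_fragment = {
    "combined": {
        "class": "combine", "kind": "eval", "from": ["layer_a", "layer_b"],
        "eval": "source(0) * 0.5 + source(1) * 0.5",
    },
}
```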
class TFNetworkLayer.CompareLayer(kind='equal', value=None, **kwargs)[source]

Compares (e.g. equality check) all the sources element-wise.

Parameters:
  • kind (str) – e.g. “equal”
  • value (float|int|None) – if specified, will also compare to this
classmethod get_out_data_from_opts(n_out=None, out_type=None, sources=(), **kwargs)[source]
layer_class = 'compare'[source]
class TFNetworkLayer.ConstantLayer(sources, value=0, dtype=None, **kwargs)[source]

Output is a constant value.

classmethod get_out_data_from_opts(name, dtype='float32', **kwargs)[source]
layer_class = 'constant'[source]
class TFNetworkLayer.ConvLayer(n_out, filter_size, padding, strides=1, dilation_rate=1, input_expand_dims=0, input_add_feature_dim=False, input_split_feature_dim=None, with_bias=False, activation=None, forward_weights_init='glorot_uniform', bias_init=0.0, **kwargs)[source]

A generic convolution layer which supports 1D, 2D and 3D convolution. Pooling can be done in the separate “pool” layer.

Parameters:
  • n_out (int) – number of outgoing features
  • filter_size (tuple[int]) – (width,), (height,width) or (depth,height,width) for 1D/2D/3D conv. the input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. it will automatically swap the batch-dim to the first axis of the input data.
  • padding (str) – “same” or “valid”
  • strides (int|tuple[int]) – strides for the spatial dims, i.e. length of this tuple should be the same as filter_size, or a single int.
  • input_expand_dims (int) – number of dynamic dims to add to the input
  • input_add_feature_dim (bool) – will add a dim at the end and use input-feature-dim == 1, and use the original input feature-dim as a spatial dim.
  • input_split_feature_dim (None|int) – if set, like input_add_feature_dim it will add a new feature dim which is of value input_split_feature_dim, and the original input feature dim will be divided by input_split_feature_dim, thus it must be a multiple of that value.
  • with_bias (bool) – if True, will add a bias to the output features
  • activation (None|str) – if set, will apply this function at the end
classmethod calc_out_dim(in_dim, filter_size, stride, padding, dilation_rate=1)[source]
Parameters:
  • in_dim (int|tf.Tensor) – dimension in some axis
  • filter_size (int) – e.g. 2, for the corresponding axis
  • stride (int) – e.g. 1, for the corresponding axis
  • dilation_rate (int) – e.g. 1
  • padding (str) – “valid” or “same”
Returns:

the output dimension

Return type:

int

classmethod get_out_data_from_opts(**kwargs)[source]
layer_class = 'conv'[source]
recurrent = True[source]
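The arithmetic behind calc_out_dim can be sketched in plain Python (this is the standard convolution output-size formula, shown for illustration; it is not the actual classmethod):

```python
def calc_conv_out_dim(in_dim, filter_size, stride, padding, dilation_rate=1):
    """Output size along one axis for 'same' or 'valid' padding."""
    padding = padding.lower()
    if padding == "same":
        return -(-in_dim // stride)  # ceil(in_dim / stride)
    elif padding == "valid":
        effective_filter = (filter_size - 1) * dilation_rate + 1
        return -(-(in_dim - effective_filter + 1) // stride)
    raise ValueError("invalid padding %r" % padding)
```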
class TFNetworkLayer.CopyLayer(**kwargs)[source]

This layer does nothing, it copies its input. If multiple sources are provided, they are concatenated in the feature-dim.

classmethod get_out_data_from_opts(name, sources=(), out_type=None, n_out=None, **kwargs)[source]
layer_class = 'copy'[source]
class TFNetworkLayer.CrossEntropyLoss(focal_loss_factor=0.0, **kwargs)[source]

Cross-Entropy loss. Basically -sum(target * log(output)).

Parameters:focal_loss_factor (float) – see https://arxiv.org/abs/1708.02002. 0 means disabled
class_name = 'ce'[source]
get_output_target_scores()[source]
Returns:shape (time_flat,), type float32
Return type:tf.Tensor
get_value()[source]
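The focal-loss scaling can be sketched per frame in plain Python (illustrative only; the layer applies this with TF ops, and the scaling follows https://arxiv.org/abs/1708.02002):

```python
import math

def framewise_ce(probs, target_idx, focal_loss_factor=0.0):
    """Cross entropy -log p(target), optionally scaled by (1 - p)^gamma."""
    p = probs[target_idx]
    ce = -math.log(p)
    if focal_loss_factor:
        # down-weight frames where the target is already well predicted
        ce *= (1.0 - p) ** focal_loss_factor
    return ce
```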
class TFNetworkLayer.CtcLoss(target_collapse_repeated=False, auto_clip_target_len=False, output_in_log_space=False, focal_loss_factor=0.0, **kwargs)[source]

Connectionist Temporal Classification (CTC) loss. Basically a wrapper around tf.nn.ctc_loss.

Parameters:
  • target_collapse_repeated (bool) – like preprocess_collapse_repeated option for CTC. used for sparse_labels().
  • auto_clip_target_len (bool) – see self._get_target_sparse_labels().
  • output_in_log_space (bool) – False -> output expected in prob space. see self.get_output_logits
  • focal_loss_factor (float) – see https://arxiv.org/abs/1708.02002. 0 means disabled. generalized for CTC
class_name = 'ctc'[source]
classmethod get_auto_output_layer_dim(target_dim)[source]
get_error()[source]
get_focal_loss_factor()[source]
Returns:shape (time, batch, dim)
Return type:tf.Tensor
get_output_logits()[source]
Returns:outputs in log-space / logits
Return type:tf.Tensor
get_soft_alignment()[source]

Also called the Baum-Welch alignment. This is basically p_t(s|x_1^T,w_1^N), where s are the output labels (including blank) and w are the real target labels.

Returns:shape (time, batch, dim)
Return type:tf.Tensor

get_value()[source]
init(**kwargs)[source]
recurrent = True[source]
class TFNetworkLayer.DeepClusteringLoss(embedding_dimension, nr_of_sources, **kwargs)[source]

Cost function used for deep clustering as described in [Hershey & Chen+, 2016]: “Deep clustering: discriminative embeddings for segmentation and separation”

Parameters:
  • embedding_dimension (int) –
  • nr_of_sources (int) –
class_name = 'deep_clustering'[source]
get_error()[source]
Returns:frame error rate as a scalar value
Return type:tf.Tensor | None
get_value()[source]
class TFNetworkLayer.EditDistanceLoss(debug_print=False, label_map=None, ctc_decode=False, output_in_log_space=False, **kwargs)[source]

Note that this loss is not differentiable, thus it’s only for keeping statistics.

Parameters:
  • debug_print (bool) – will tf.Print the sequence
  • label_map (dict[int,int]|None) – before calculating the edit-distance, will apply this map
  • ctc_decode (bool) – True -> expects dense output and does CTC decode, False -> expects sparse labels in output
  • output_in_log_space (bool) – False -> dense output expected in prob space. see self.get_output_logits
class_name = 'edit_distance'[source]
get_error()[source]
get_output_logits()[source]
Returns:outputs in log-space / logits
Return type:tf.Tensor
get_value()[source]
init(output, output_with_activation=None, target=None)[source]
Parameters:
  • output (Data) – generated output
  • output_with_activation (OutputWithActivation|None) –
  • target (Data) – reference target from dataset
recurrent = True[source]
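The statistic this loss reports is the Levenshtein distance; a plain-Python reference implementation (for illustration; RETURNN computes this with TF ops on sparse labels):

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]
```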
class TFNetworkLayer.ElemwiseProdLayer(axes, size=None, **kwargs)[source]

Element-wise product in some axes. Microsoft calls this “static attention”; see Deep Convolutional Neural Networks with Layer-wise Context Expansion and Attention (LACE).

Parameters:
  • axes (str|list[str]) – e.g. “spatial”, but all those axes must be of fixed dimension
  • size (tuple[int]) – for double-checking, you can explicitly provide the size
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
layer_class = 'elemwise_prod'[source]
class TFNetworkLayer.EvalLayer(eval, **kwargs)[source]

Evaluates some string. The CombineLayer provides this functionality, thus this is just a special case of it. Also see ActivationLayer.

Parameters:eval (str) – will eval this string. see _op_kind_eval()
layer_class = 'eval'[source]
class TFNetworkLayer.ExpandDimsLayer(axis, dim=1, **kwargs)[source]

Adds some axis.

Parameters:
  • axis (str|int) – axis to add, e.g. “F”|”feature” or “spatial”. if this is an integer, the input data is first converted into batch-major mode, and then this is counted with batch-dim.
  • dim (int) – dimension of new axis (1 by default)
classmethod get_out_data_from_opts(name, axis, dim=1, sources=(), **kwargs)[source]
layer_class = 'expand_dims'[source]
class TFNetworkLayer.ExternSprintLoss(sprint_opts, **kwargs)[source]

The loss is calculated by an extern Sprint instance.

Parameters:sprint_opts (dict[str]) –
class_name = 'sprint'[source]
get_error()[source]
get_value()[source]
recurrent = True[source]
class TFNetworkLayer.FastBaumWelchLayer(align_target, sprint_opts=None, **kwargs)[source]

Calls fast_baum_welch() or fast_baum_welch_by_sprint_automata(). We expect our input to be +log scores.

Parameters:
  • align_target (str) – e.g. “sprint”
  • sprint_opts (dict[str]) –
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
layer_class = 'fast_bw'[source]
recurrent = True[source]
class TFNetworkLayer.FastBaumWelchLoss(sprint_opts, **kwargs)[source]

The loss is calculated via fast_baum_welch(). The automata are created by an extern Sprint instance.

Parameters:sprint_opts (dict[str]) –
class_name = 'fast_bw'[source]
get_error()[source]
get_value()[source]
recurrent = True[source]
class TFNetworkLayer.FillUnusedMemoryLayer(fill_value=0.0, **kwargs)[source]

Fills all unused entries in the time/batch/feature tensor with a constant.

layer_class = 'fill_unused'[source]
class TFNetworkLayer.FramewiseStatisticsLayer(sil_label_idx, histogram_num_bins=20, **kwargs)[source]

Collects various statistics (such as the frame error rate) on the sources. The tensors will get stored in self.stats, which will be collected by TFEngine.

classmethod get_out_data_from_opts(**kwargs)[source]
layer_class = 'framewise_statistics'[source]
class TFNetworkLayer.FsaLayer(**kwargs)[source]
layer_class = 'fsa'[source]
class TFNetworkLayer.GatingLayer(activation, gate_activation='sigmoid', **kwargs)[source]

Splits the output into two equal parts, applies the gate_activation (sigmoid by default) on the one part, some other activation (e.g. tanh) on the other part and then element-wise multiplies them. Thus, the output dimension is input-dimension / 2.

classmethod get_out_data_from_opts(name, sources, n_out=None, **kwargs)[source]
layer_class = 'gating'[source]
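The split-gate-multiply scheme can be sketched for a single feature vector (an illustrative pure-Python sketch; the layer does this on tensors, and the halving convention shown here is an assumption):

```python
import math

def gating(x, activation=math.tanh):
    """Split the feature vector in half, apply `activation` to one half,
    sigmoid-gate it with the other half; output dim = input dim / 2."""
    assert len(x) % 2 == 0
    n = len(x) // 2
    a, b = x[:n], x[n:]
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    return [activation(ai) * sigmoid(bi) for ai, bi in zip(a, b)]
```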
class TFNetworkLayer.GenericCELoss(**kwargs)[source]
class_name = 'generic_ce'[source]
get_value()[source]
class TFNetworkLayer.InternalLayer(name, network, output=None, n_out=None, out_type=None, sources=(), target=None, loss=None, loss_scale=1.0, size_target=None, reuse_params=None, L2=None, darc1=None, is_output_layer=None, only_on_eval=False, copy_output_loss_from_source_idx=None, batch_norm=False, spatial_smoothing=0.0, initial_output=None, rec_previous_layer=None, trainable=True)[source]

This is not supposed to be used by the user. It is used by some code to construct a wrapper layer or so.

Parameters:
  • name (str) –
  • network (TFNetwork.TFNetwork) –
  • output (Data) –
  • n_out (None|int) – output dim
  • out_type (dict[str]) – kwargs for Data class. more explicit than n_out.
  • sources (list[LayerBase]) – via self.transform_config_dict()
  • target (str|None) – if some loss is set, this is the target data-key, i.e. network.extern_data.get_data(target) alternatively, this also can be a layer name.
  • size_target (str|None) – like target but this is only used to set our output size in case of training
  • loss (Loss|None) – via self.transform_config_dict()
  • loss_scale (float) – scale factor for loss (1.0 by default)
  • reuse_params (LayerBase|None) – if given, will reuse the params from this layer. see self.var_creation_scope()
  • L2 (float|None) – for constraints
  • darc1 (float|None) – for constraints. see Generalization in Deep Learning, https://arxiv.org/abs/1710.05468
  • is_output_layer (bool|None) –
  • only_on_eval (bool) – if True, this layer will only be calculated in eval
  • copy_output_loss_from_source_idx (int|None) – if set, will copy output_loss from this source
  • batch_norm (bool|dict) – see self.batch_norm()
  • initial_output (str|float) – used for recurrent layer, see self.get_rec_initial_output()
  • rec_previous_layer (LayerBase|None) – via the recurrent layer, layer (template) which represents the past of us
  • trainable (bool) – whether the parameters of this layer will be trained
class TFNetworkLayer.L1Loss(base_network)[source]

L1-distance loss. sum(abs(target - output)).

Parameters:base_network (TFNetwork.TFNetwork) –
class_name = 'l1'[source]
get_value()[source]
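For illustration, the value this loss reduces to per batch of frames (a sketch in plain Python, not the TF implementation):

```python
def l1_loss(targets, outputs):
    """Sum of absolute differences between target and output values."""
    return sum(abs(t - o) for t, o in zip(targets, outputs))
```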
class TFNetworkLayer.LayerBase(name, network, output=None, n_out=None, out_type=None, sources=(), target=None, loss=None, loss_scale=1.0, size_target=None, reuse_params=None, L2=None, darc1=None, is_output_layer=None, only_on_eval=False, copy_output_loss_from_source_idx=None, batch_norm=False, spatial_smoothing=0.0, initial_output=None, rec_previous_layer=None, trainable=True)[source]

This is the base class for all layers. Every layer by default has a list of source layers sources and defines self.output which is of type Data. It shares some common functionality across all layers, such as explicitly defining the output format, some parameter regularization, and more.

If you want to implement your own layer:

class YourOwnLayer(_ConcatInputLayer):  # e.g. either _ConcatInputLayer or LayerBase as a base
    " some docstring "
    layer_class = "your_layer_name"

    def __init__(self, your_kwarg1, your_opt_kwarg2=None, **kwargs):
        " docstring, document the args! "
        super(YourOwnLayer, self).__init__(**kwargs)
        # Now we need to set self.output, which must be of type :class:`Data`.
        # It is set at this point to whatever we got from `self.get_out_data_from_opts()`,
        # so it is enough if we set self.output.placeholder and self.output.size_placeholder,
        # but we could also reset self.output.
        self.output.placeholder = self.input_data.placeholder + 42  # whatever you want to do
        # If you don't modify the sizes (e.g. sequence-length), just copy the input sizes.
        self.output.size_placeholder = self.input_data.size_placeholder.copy()

    @classmethod
    def get_out_data_from_opts(cls, **kwargs):
        " This is supposed to return a :class:`Data` instance as a template, given the arguments. "
        # example, just the same as the input:
        return get_concat_sources_data_template(kwargs["sources"], name="%s_output" % kwargs["name"])
Parameters:
  • name (str) –
  • network (TFNetwork.TFNetwork) –
  • output (Data) –
  • n_out (None|int) – output dim
  • out_type (dict[str]) – kwargs for Data class. more explicit than n_out.
  • sources (list[LayerBase]) – via self.transform_config_dict()
  • target (str|None) – if some loss is set, this is the target data-key, i.e. network.extern_data.get_data(target) alternatively, this also can be a layer name.
  • size_target (str|None) – like target but this is only used to set our output size in case of training
  • loss (Loss|None) – via self.transform_config_dict()
  • loss_scale (float) – scale factor for loss (1.0 by default)
  • reuse_params (LayerBase|None) – if given, will reuse the params from this layer. see self.var_creation_scope()
  • L2 (float|None) – for constraints
  • darc1 (float|None) – for constraints. see Generalization in Deep Learning, https://arxiv.org/abs/1710.05468
  • is_output_layer (bool|None) –
  • only_on_eval (bool) – if True, this layer will only be calculated in eval
  • copy_output_loss_from_source_idx (int|None) – if set, will copy output_loss from this source
  • batch_norm (bool|dict) – see self.batch_norm()
  • initial_output (str|float) – used for recurrent layer, see self.get_rec_initial_output()
  • rec_previous_layer (LayerBase|None) – via the recurrent layer, layer (template) which represents the past of us
  • trainable (bool) – whether the parameters of this layer will be trained
add_param(param, custom_update=None)[source]
Parameters:
  • param (tf.Variable) –
  • custom_update (None|CustomUpdate) – will be applied in training, instead of taking the gradient
Returns:

param

Return type:

tf.Variable

batch_norm(data, use_shift=True, use_std=True, use_sample=0.0, force_sample=False, momentum=0.99, epsilon=0.001, sample_mean=None, sample_variance=None, gamma=None, beta=None)[source]
Parameters:
  • data (Data) –
  • use_shift (bool) –
  • use_std (bool) –
  • use_sample (float) – defaults to 0.0 which is used in training
  • force_sample (bool) – even in eval, use the use_sample factor
  • momentum (float) – for the running average of sample_mean and sample_std
  • epsilon (float) –
  • sample_mean (tf.Tensor) –
  • sample_variance (tf.Tensor) –
  • gamma (tf.Tensor) –
  • beta (tf.Tensor) –
Return type:

tf.Tensor

http://arxiv.org/abs/1502.03167

Also see:
tf.nn.batch_normalization() https://github.com/deepmind/sonnet/blob/master/sonnet/python/modules/batch_norm.py
classmethod cls_get_tf_scope_name(name)[source]
Parameters:name (str) – layer name
Returns:scope name, might be just name
classmethod cls_layer_scope(name)[source]

Setup scope for layer. This can also be used when the layer does not yet exists. This is supposed to cover variable creations as well. Currently vars might be created when used within the rec-layer, but they are caught in a more generic way there, so we have not implemented yet any special logic here.

Parameters:name (str) – layer name
Returns:context manager object
get_absolute_name_scope_prefix()[source]
Returns:e.g. “output/”, always with “/” at end
Return type:str
get_base_absolute_name_scope_prefix()[source]
Returns:e.g. “output/”, always with “/” at end
Return type:str
get_batch_dim()[source]

The batch dim of this layer, not taken from our output but calculated. Normally it is self.network.get_batch_dim(), but if we do search and there was a choice layer, it is multiplied by the beam size.

Returns:batch dim * beam size
Return type:tf.Tensor

get_constraints_value()[source]
Returns:None or scalar
Return type:tf.Tensor|None
get_darc1()[source]

DARC1, simplified Directly Approximately Regularizing Complexity (DARC), via Generalization in Deep Learning, https://arxiv.org/abs/1710.05468

Returns:scalar
Return type:tf.Tensor
get_dep_layers()[source]
Returns:list of layers this layer depends on. normally this is just self.sources but e.g. the attention layer in addition has a base, etc.
Return type:list[LayerBase]
get_error_value()[source]
Returns:usually the frame error rate, or None if not defined
Return type:tf.Tensor | None
get_hidden_state()[source]

If this is a recurrent layer, this would return the hidden state. This is used e.g. for the RnnCellLayer class.

Returns:optional tensor(s) with shape (time, batch, dim)
Return type:tf.Tensor | list[tf.Tensor] | None

get_last_hidden_state()[source]

If this is a recurrent layer, this would return the last hidden state. Otherwise, we return None.

Returns:optional tensor with shape (batch, dim)
Return type:tf.Tensor | None

get_loss_normalization_factor()[source]
get_loss_value()[source]
Returns:the loss, a scalar value, or None if not set. not multiplied by loss_scale
Return type:tf.Tensor | None
classmethod get_out_data_from_opts(**kwargs)[source]

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters:kwargs – all the same kwargs as for self.__init__()
Returns:Data template (placeholder not set)
Return type:Data
get_output_spatial_smoothing_energy()[source]
Returns:scalar. see TFUtil.spatial_smoothing_energy()
Return type:tf.Tensor
get_param_values_dict(session)[source]
Parameters:session (tf.Session) –
Returns:dict name -> values
Return type:dict[str,numpy.ndarray]
get_params_l2_norm()[source]
Returns:scalar
Return type:tf.Tensor
classmethod get_rec_initial_extra_outputs(batch_dim, **kwargs)[source]
Parameters:batch_dim (tf.Tensor) – for this layer, might be with beam
Return type:dict[str,tf.Tensor]
classmethod get_rec_initial_extra_outputs_shape_invariants(**kwargs)[source]
Returns:optional shapes for the tensors by get_rec_initial_extra_outputs
Return type:dict[str,tf.TensorShape]
classmethod get_rec_initial_output(batch_dim, name, output, initial_output=None, **kwargs)[source]

If this layer is used inside a recurrent layer, this function specifies the output of frame t=-1, if it is needed. As arguments, we get the usual layer arguments. batch_dim is added because it might be special because of beam search.

Note: This could maybe share code with RnnCellLayer._get_rec_initial_state(). We could also add support to make the initial output be the output of another layer.

Parameters:
  • batch_dim (tf.Tensor) – including beam size in beam search
  • name (str) –
  • output (Data) – template
  • initial_output (str|float|int|tf.Tensor|None) –
Return type:

tf.Tensor

get_saveable_params_dict()[source]
Returns:params and saveable_param_replace resolved
Return type:dict[str,tf.Variable|tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject]
get_search_beam_size()[source]
Returns:beam size if there was a choice layer and we do search
Return type:int|None
get_search_choices()[source]
Return type:SearchChoices|None
is_output_layer()[source]

Some code differs between an output layer and other layers. It is a bit arbitrary what we define as an output layer.

Return type:bool

layer_class = None[source]
post_init()[source]

This gets called right after self.__init__().

recurrent = False[source]
set_param_values_by_dict(values_dict, session)[source]
Parameters:
  • values_dict (dict[str,numpy.ndarray]) –
  • session (tf.Session) –
tf_scope_name[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (TFNetwork.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer

Will modify d such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by TFNetwork.construct_from_dict().

var_creation_scope(*args, **kwds)[source]

This takes care of setting up a scope where variables can be created.

Returns:yields the variable_scope
class TFNetworkLayer.LinearLayer(activation, with_bias=True, grad_filter=None, forward_weights_init='glorot_uniform', bias_init=0.0, **kwargs)[source]

Linear/forward/fully-connected/1x1-conv layer. Does a linear transformation on the feature-dimension of the input with an optional bias term and an optional activation function.

Parameters:
layer_class = 'linear'[source]
class TFNetworkLayer.Loss(base_network)[source]

Base class for all losses.

Parameters:base_network (TFNetwork.TFNetwork) –
class_name = None[source]
classmethod get_auto_output_layer_dim(target_dim)[source]
Parameters:target_dim (int) –
Returns:normally just the same as target_dim. e.g. for CTC, we would add 1 for the blank label
Return type:int
get_error()[source]
Returns:frame error rate as a scalar value
Return type:tf.Tensor
get_normalization_factor()[source]
Returns:factor as a float scalar, usually 1.0 / num_frames. see self.reduce_func.
Return type:tf.Tensor
get_value()[source]
Returns:loss as a scalar float32 value. it should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type:tf.Tensor|None
init(output, output_with_activation=None, target=None)[source]
Parameters:
  • output (Data) – generated output
  • output_with_activation (OutputWithActivation|None) –
  • target (Data) – reference target from dataset
recurrent = False[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace, the loss_opts
  • network (TFNetwork.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer

Will modify d such that it becomes the kwargs for self.__init__(). Mostly leaves d as-is. This is used by LayerBase.transform_config_dict.

class TFNetworkLayer.MeanSquaredError(base_network)[source]

The generic mean squared error (MSE) loss function.

Parameters:base_network (TFNetwork.TFNetwork) –
class_name = 'mse'[source]
get_value()[source]
class TFNetworkLayer.MergeDimsLayer(axes, n_out=None, **kwargs)[source]

Merges a list of axes into a single one. E.g. input is (batch, width, height, dim) and axes=(1,2), then we get (batch, width*height, dim). Or input is (batch, time, height, dim) and axes=”except_time”, then we get (batch, time, height*dim). See also CombineDimsLayer.

Parameters:
  • axes (str|list[str]|list[int]) – see Data.get_axes_from_description(), e.g. “except_time”
  • n_out (int|None) –
classmethod get_out_data_from_opts(name, axes, sources=(), n_out=None, out_type=None, **kwargs)[source]
layer_class = 'merge_dims'[source]
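The resulting shape can be sketched in plain Python (illustrative only; axis placement of the merged dim is an assumption based on the example above):

```python
def merged_shape(shape, axes):
    """Shape after merging `axes` (counted with batch dim) into one axis,
    placed at the position of the first merged axis."""
    axes = sorted(axes)
    merged = 1
    for a in axes:
        merged *= shape[a]
    out = [d for i, d in enumerate(shape) if i not in axes]
    out.insert(axes[0], merged)
    return tuple(out)
```

E.g. for input (batch, width, height, dim) with axes=(1, 2), this gives (batch, width*height, dim).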
class TFNetworkLayer.PadLayer(axes, padding, value=None, mode='constant', **kwargs)[source]

Adds (e.g. zero) padding in some axis or axes.

Parameters:
  • axes (str|list[str]) – e.g. “F” etc. see Dataset.get_axes_from_description().
  • padding (list[(int,int)]|(int,int)|int) – how much to pad left/right in each axis
  • value (int|float) – what constant value to pad, with mode==”constant”
  • mode (str) – “constant”, “reflect” or “symmetric”
classmethod get_out_data_from_opts(name, axes, padding, sources=(), **kwargs)[source]
layer_class = 'pad'[source]
class TFNetworkLayer.PoolLayer(mode, pool_size, padding='VALID', dilation_rate=1, strides=None, **kwargs)[source]

A generic N-D pooling layer. This would usually be done after a convolution for down-sampling.

Parameters:
  • mode (str) – “max” or “avg”
  • pool_size (tuple[int]) – shape of the window of each reduce
  • padding (str) – “valid” or “same”
  • dilation_rate (tuple[int]|int) –
  • strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size
classmethod get_out_data_from_opts(name, pool_size, strides=None, dilation_rate=1, sources=(), padding='VALID', **kwargs)[source]
layer_class = 'pool'[source]
recurrent = True[source]
class TFNetworkLayer.PrefixInTimeLayer(prefix=0.0, repeat=1, **kwargs)[source]
Parameters:
  • prefix (float|str) – either some constant or another layer
  • repeat (int) – how often to repeat the prefix
layer_class = 'prefix_in_time'[source]
class TFNetworkLayer.ReduceLayer(mode, axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, **kwargs)[source]

This reduces some axis by using “sum” or “max”. It’s basically a wrapper around tf.reduce_sum or tf.reduce_max.

Parameters:
  • mode (str) – “sum” or “max” or “mean”
  • axes (int|list[int]|str) – one axis or multiple axis to reduce. this is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). it also accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”
  • axis (int|list[int]|str) – for compatibility, can be used instead of axes
  • keep_dims (bool) – if dimensions should be kept (will be 1)
  • enforce_batch_dim_axis (int) – will swap the batch-dim-axis of the input with the given axis. e.g. 0: will convert the input into batch-major format if not already like that.
classmethod get_axes(axis, input_data)[source]
Parameters:
  • axis – see self.__init__()
  • input_data (Data) –
Returns:

list of axes

Return type:

list[int]

classmethod get_out_data_from_opts(name, sources, axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, **kwargs)[source]
layer_class = 'reduce'[source]
classmethod need_enforce_batch_dim_axis(axes)[source]
Parameters:axes (int|list[int]|str) –
Returns:whether any integer is in axes, in which case we must enforce a fixed dimension layout
Return type:bool
class TFNetworkLayer.ReduceOutLayer(mode, num_pieces, **kwargs)[source]

Combination of SplitDimsLayer applied to the feature dim and ReduceLayer applied to the resulting feature dim. This can e.g. be used to do maxout.

Parameters:
  • mode (str) – “sum” or “max” or “mean”
  • num_pieces (int) – how many elements to reduce. The output dimension will be input.dim // num_pieces.
classmethod get_out_data_from_opts(num_pieces, sources, name, **kwargs)[source]
layer_class = 'reduce_out'[source]
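The maxout use case mentioned above can be sketched in pure Python; this illustrates the computation that "reduce_out" with mode="max" performs on the feature dimension (illustration only, not the actual implementation):

```python
# Sketch of what "reduce_out" with mode="max" computes on the feature dim.
def reduce_out_max(features, num_pieces):
    """Split the feature dim into consecutive groups of num_pieces and take the
    max of each group. The output dimension is len(features) // num_pieces."""
    assert len(features) % num_pieces == 0
    return [max(features[i:i + num_pieces])
            for i in range(0, len(features), num_pieces)]

print(reduce_out_max([1, 5, 2, 4, 9, 3], num_pieces=2))  # [5, 4, 9]
```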
class TFNetworkLayer.ResizeLayer(factor, axis, kind='nn', **kwargs)[source]

Resizes the input, i.e. upsampling or downsampling. Supports different kinds, such as linear interpolation or nearest-neighbor.

Parameters:
  • factor (int) –
  • axis (str|int) – the axis to resize, counted with batch-dim. can also be “T” for time
  • kind (str) – “linear”, “nn”/”nearest_neighbor”, “cubic”
classmethod get_out_data_from_opts(factor, axis, sources, **kwargs)[source]
layer_class = 'resize'[source]
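For intuition, nearest-neighbor upsampling (kind="nn") with an integer factor along one axis can be sketched in pure Python (illustration only):

```python
# Pure-Python sketch of nearest-neighbor upsampling along one axis:
# each element is repeated `factor` times.
def resize_nn(seq, factor):
    """Nearest-neighbor upsampling: repeat each element `factor` times."""
    return [x for x in seq for _ in range(factor)]

print(resize_nn([1, 2, 3], factor=2))  # [1, 1, 2, 2, 3, 3]
```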
class TFNetworkLayer.SearchChoices(owner, src_beams=None, beam_size=None, is_decided=False)[source]
Parameters:
  • owner (LayerBase) –
  • src_beams (tf.Tensor|None) – (batch, beam) -> src beam index
  • beam_size (int|None) –
  • is_decided (bool) – by decide layer
set_beam_scores(scores)[source]
Parameters:scores (tf.Tensor) – (batch, beam) -> log score
set_beam_scores_from_own_rec()[source]
set_beam_scores_from_rec(rev_vars_outputs)[source]
Parameters:rev_vars_outputs (dict[str,tf.Tensor]) –
src_layer[source]
Returns:The layer where we had the last search choices.
Return type:LayerBase
class TFNetworkLayer.SegmentInputLayer(window=15, **kwargs)[source]

This layer takes the input data, applies a window, and outputs each window as a new batch entry. This is more efficient than adding a window as a new dimension if sequences have varying lengths.

classmethod get_out_data_from_opts(name, sources, window, **kwargs)[source]
layer_class = 'segment_input'[source]
class TFNetworkLayer.SliceLayer(axis, slice_start=None, slice_end=None, slice_step=None, **kwargs)[source]

Slicing on the input, i.e. x[start:end:step] in some axis.

Parameters:
  • axis (int|str) –
  • axis_kind (str|None) – “T” for time, “B” for batch, “F” for feature
  • slice_start (int|None) –
  • slice_end (int|None) –
  • slice_step (int|None) –
classmethod get_out_data_from_opts(name, axis, sources=(), slice_start=None, slice_end=None, slice_step=None, **kwargs)[source]
layer_class = 'slice'[source]
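A hypothetical config fragment, assuming the time axis can be given as axis="T" (the layer name "subsampled" is made up):

```python
# Hypothetical config fragment: keep every second frame along the time axis,
# i.e. x[::2] on the time axis.
network = {
    "subsampled": {"class": "slice", "axis": "T", "slice_step": 2, "from": ["data"]},
}
```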
class TFNetworkLayer.SoftmaxLayer(activation='softmax', **kwargs)[source]

Just a LinearLayer with activation=”softmax” by default.

layer_class = 'softmax'[source]
class TFNetworkLayer.SoftmaxOverSpatialLayer(energy_factor=None, **kwargs)[source]

This applies a softmax over spatial axis/axes (currently only time axis supported). E.g. when the input is of shape (B,T,dim), the output will be (B,T,dim). It automatically masks the frames outside the seq defined by the seq-len. In contrast to SoftmaxLayer, this will not do a linear transformation.

Parameters:energy_factor (float|None) – the energy will be scaled by this factor. This is like a temperature for the softmax. In Attention-is-all-you-need, this is set to 1/sqrt(base_ctx.dim).
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
layer_class = 'softmax_over_spatial'[source]
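The masked softmax over time that this layer performs can be sketched in pure Python for a single sequence of energies (illustration only, assuming the sequence length is known):

```python
import math

# Sketch of a softmax over the time axis that masks frames beyond seq_len.
def softmax_over_spatial(energies, seq_len, energy_factor=None):
    """Softmax over time; frames at or beyond seq_len get weight 0."""
    if energy_factor is not None:
        energies = [e * energy_factor for e in energies]
    # Mask padded frames with -inf so they contribute zero weight.
    masked = [e if t < seq_len else float("-inf") for t, e in enumerate(energies)]
    m = max(masked[:seq_len])  # subtract the max for numerical stability
    exps = [math.exp(e - m) if e != float("-inf") else 0.0 for e in masked]
    z = sum(exps)
    return [e / z for e in exps]

weights = softmax_over_spatial([1.0, 1.0, 1.0, 5.0], seq_len=3)
# The padded frame (t=3) gets zero weight; the first three frames are uniform.
```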
class TFNetworkLayer.SourceLayer(network, data_key=None, sources=(), **kwargs)[source]
Parameters:
  • network (TFNetwork.TFNetwork) –
  • data_key (str|None) –
classmethod get_out_data_from_opts(network, data_key=None, **kwargs)[source]
Parameters:
  • network (TFNetwork.TFNetwork) –
  • data_key (str|None) –
Return type:

Data

layer_class = 'source'[source]
class TFNetworkLayer.SplitBatchTimeLayer(base, **kwargs)[source]

A very specific layer which expects to get input of shape (batch * time, ...) and converts it into (batch, time, ...), where it recovers the seq-lens from some other layer.

Parameters:base (LayerBase) –
classmethod get_out_data_from_opts(name, base, sources=(), **kwargs)[source]
layer_class = 'split_batch_time'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
class TFNetworkLayer.SplitDimsLayer(axis, dims, **kwargs)[source]

Splits one axis into multiple axes. E.g. if you know that your feature-dim is composed by a window, i.e. the input is (batch, time, window * feature), you can set axis=”F”, dims=(window, -1), and you will get the output (batch, time, window, feature).

Parameters:
  • axis (str) – e.g. “F”
  • dims (tuple[int]) – what the axis should be split into. e.g. (window, -1)
classmethod get_out_data_from_opts(name, axis, dims, sources=(), **kwargs)[source]
layer_class = 'split_dims'[source]
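A config sketch following the docstring's own example, where the feature dim holds window * feature values:

```python
# Config sketch: split the feature dim (window * feature) into (window, feature).
window = 5
network = {
    "split": {"class": "split_dims", "axis": "F", "dims": (window, -1), "from": ["data"]},
}
# Input (batch, time, window * feature) -> output (batch, time, window, feature);
# -1 means "infer this dimension from the remaining size".
```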
class TFNetworkLayer.SqueezeLayer(axis, enforce_batch_dim_axis=0, **kwargs)[source]

Removes an axis with dimension 1. This is basically a wrapper around tf.squeeze.

Parameters:axis (int|list[int]|str) – one axis or multiple axes to squeeze. this is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). it also accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”
classmethod get_out_data_from_opts(**kwargs)[source]
layer_class = 'squeeze'[source]
class TFNetworkLayer.SubnetworkLayer(subnetwork, concat_sources=True, load_on_init=None, **kwargs)[source]

You can define a whole subnetwork as a single layer by this class.

The subnetwork will be specified by a dict[str,dict[str]], just like a normal network is specified in the config.

The "output" layer of the subnetwork will be the output of this subnetwork-layer.

With concat_sources=True (default),
the input to this layer will be represented as the "data:data" or simply "data" in the subnetwork,
otherwise with concat_sources=False,
the input to this layer will be represented as "data:input_layer_name" for each input, in the subnetwork.
Parameters:
  • subnetwork (dict[str,dict]) – subnetwork as dict (JSON content). must have an “output” layer.
  • concat_sources (bool) – if we concatenate all sources into one, like it is standard for most other layers
  • load_on_init (str|None) – if provided, for parameter initialization, we will load the given model file.
get_constraints_value()[source]
get_error_value()[source]
get_last_hidden_state()[source]
get_loss_value()[source]
classmethod get_out_data_from_opts(subnetwork, n_out=None, out_type=None, **kwargs)[source]
Parameters:
  • subnetwork (dict[str,dict[str]]) –
  • n_out (int|None) –
  • out_type (dict[str]|None) –
Return type:

Data

layer_class = 'subnetwork'[source]
recurrent = True[source]
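A hypothetical example wrapping a two-layer feed-forward block as a single layer (the layer names are made up; n_out values are arbitrary):

```python
# Hypothetical subnetwork layer: a feed-forward block wrapped as one layer.
# The subnetwork's "output" layer is the output of the whole "ff_block" layer;
# with concat_sources=True (the default) the input is available as "data" inside.
network = {
    "ff_block": {
        "class": "subnetwork",
        "from": ["data"],
        "subnetwork": {
            "hidden": {"class": "linear", "activation": "relu", "n_out": 256,
                       "from": ["data"]},
            "output": {"class": "linear", "activation": None, "n_out": 128,
                       "from": ["hidden"]},
        },
    },
}
```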
class TFNetworkLayer.SwapTimeFeatureLayer(**kwargs)[source]
classmethod get_out_data_from_opts(**kwargs)[source]
layer_class = 'swap_time_feature'[source]
class TFNetworkLayer.SyntheticGradientLayer(gradient, **kwargs)[source]

This is a generalized way to be able to replace the true gradient with any kind of predicted gradient. This enables implementing the idea from:

Decoupled Neural Interfaces using Synthetic Gradients, https://arxiv.org/abs/1608.05343
Parameters:gradient (LayerBase) –
classmethod get_out_data_from_opts(sources, name, **kwargs)[source]
layer_class = 'synthetic_gradient'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
class TFNetworkLayer.UnsegmentInput(**kwargs)[source]

Takes the output of SegmentInput (sequences windowed over time and folded into batch-dim) and restores the original batch dimension. The feature dimension contains window * original_features entries. The entries at time t all correspond to windows ending at time t. The window that started in the same frame comes first, then the window that started in the frame before, and so on. This is also the format used for the segmental decoder in RASR.

classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
layer_class = 'unsegment_input'[source]
class TFNetworkLayer.ViaLayerLoss(error_signal_layer=None, align_layer=None, loss_wrt_to_act_in=False, **kwargs)[source]

The loss error signal and loss value is defined as the output of another layer. That way, you can define any custom loss. This could e.g. be used together with the fast_bw layer.

Parameters:
  • error_signal_layer (LayerBase) –
  • align_layer (LayerBase) –
  • loss_wrt_to_act_in (bool|str) – if True, we expect that the given output_with_activation is set, and the given error signal is w.r.t. the input of the specific activation function. A common example is the input to the softmax function, where the gradient is much more stable to define, e.g. y - z instead of y/z for cross entropy. If you specify a str, e.g. “softmax” or “log_softmax”, there is an additional check that the used activation function is really that one.
class_name = 'via_layer'[source]
get_error()[source]
get_value()[source]
recurrent = True[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace, the loss_opts
  • network (TFNetwork.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer
class TFNetworkLayer.WeightedSumLayer(axes, padding=None, size=None, keep_dims=None, **kwargs)[source]

Calculates a weighted sum, either over a complete axis of fixed dimension, or over some window. Can also do that for multiple axes.

Parameters:
  • axes (str|list[str]) – the axes to do the weighted-sum over
  • padding (str) – “valid” or “same”, in case of keep_dims=True
  • size (None|tuple[int]) – the kernel size. if omitted, the axes must be of fixed dimension, and we default to keep_dims=False and padding=”valid”. otherwise, if given, you must also provide padding, and keep_dims defaults to True.
  • keep_dims (bool) – if False, the axes will be squeezed away. see also size.
classmethod get_out_data_from_opts(name, sources, axes, padding=None, size=None, keep_dims=None, **kwargs)[source]
layer_class = 'weighted_sum'[source]
class TFNetworkLayer.WindowLayer(window_size, axis='T', padding='same', **kwargs)[source]

Adds a window dimension. By default, uses the time axis and goes over it with a sliding window. The new axis for the window is created right after the time axis. Will always return as batch major mode. E.g. if the input is (batch, time, dim), the output is (batch, time, window_size, dim). If you want to merge the (window_size, dim) together to (window_size * dim,), you can use the MergeDimsLayer, e.g. {“class”: “merge_dims”, “axes”: “except_time”}.

Parameters:
  • window_size (int) –
  • axis (str|int) – see Data.get_axis_from_description()
  • padding (str) – “same” or “valid”
  • kwargs
classmethod get_out_data_from_opts(window_size, axis='T', sources=(), **kwargs)[source]
layer_class = 'window'[source]
recurrent = True[source]
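A config sketch combining "window" with "merge_dims" as the docstring suggests (the layer names "win" and "win_flat" are made up):

```python
# Sketch: (batch, time, dim) -> window -> (batch, time, window_size, dim),
# then merge the (window_size, dim) axes -> (batch, time, window_size * dim).
network = {
    "win": {"class": "window", "window_size": 3, "from": ["data"]},
    "win_flat": {"class": "merge_dims", "axes": "except_time", "from": ["win"]},
}
```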
TFNetworkLayer.concat_sources(src_layers)[source]
Parameters:src_layers (list[LayerBase]) –
Returns:data with placeholders set
Return type:Data
TFNetworkLayer.concat_sources_with_opt_dropout(src_layers, dropout=0)[source]
Parameters:
  • src_layers (list[LayerBase]) –
  • dropout (float) – will be applied if train_flag is set
Returns:

data with placeholders set

Return type:

Data

TFNetworkLayer.get_concat_sources_data_template(src_layers, name=None)[source]
Parameters:
  • src_layers (list[LayerBase]) –
  • name (str|None) – name of the Data
Returns:

data with no placeholders set

Return type:

Data

TFNetworkLayer.get_layer_class(name)[source]
Parameters:name (str) – matches layer_class
Return type:(() -> LayerBase) | type[LayerBase] | LayerBase
TFNetworkLayer.get_layer_class_name_list()[source]
TFNetworkLayer.get_loss_class(loss)[source]
Parameters:loss (str) – loss type such as “ce”
Return type:(() -> Loss) | type[Loss] | Loss