Loss Functions

As-Is Loss

class returnn.tf.layers.basic.AsIsLoss(**kwargs)[source]

Use the output as-is as the loss.

class_name = 'as_is'[source]
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:None
classmethod get_default_target(extern_data)[source]
Parameters:extern_data (TFNetwork.ExternData) –
Return type:None
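
A minimal config sketch (the layer names and the eval expression are made-up assumptions, not from the RETURNN docs): a layer whose output already is a per-frame loss value can be registered as a loss directly via loss 'as_is'.

    # hypothetical layer "score" computes some per-frame penalty; its output is used as the loss as-is
    network = {
        "score": {"class": "eval", "from": ["encoder"], "eval": "source(0) ** 2"},
        "score_loss": {"class": "copy", "from": ["score"], "loss": "as_is"},
    }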

Binary Cross-Entropy Loss

class returnn.tf.layers.basic.BinaryCrossEntropyLoss(pos_weight=None, **kwargs)[source]

Binary cross entropy. We expect the output as logits, not in probability space! Per frame: -mean(target * log(sigmoid(output)) + (1 - target) * log(1 - sigmoid(output)))

Parameters:pos_weight (float|None) – weight of positive labels, see tf.nn.weighted_cross_entropy_with_logits.
class_name = 'bin_ce'[source]
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Returns:frame error rate as a scalar value with the default self.reduce_func (see also self.get_value)
Return type:tf.Tensor
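
A hedged config sketch (layer/target names, dimensions and the pos_weight value are assumptions): a linear layer producing logits (no sigmoid applied) trained against a binary target.

    network = {
        "tag_logits": {
            "class": "linear", "activation": None, "n_out": 1, "from": ["encoder"],
            "loss": "bin_ce", "target": "tags",
            "loss_opts": {"pos_weight": 3.0},  # optional up-weighting of positive labels
        },
    }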

Bleu Loss

class returnn.tf.layers.basic.BleuLoss(**kwargs)[source]

Note that this loss is not differentiable, thus it’s only for keeping statistics. Also, BLEU is a score, i.e. the higher, the better. Thus, to interpret it as a loss or error, we take the negative value.

class_name = 'bleu'[source]
recurrent = True[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters:
  • output (Data) – generated output
  • output_with_activation (OutputWithActivation|None) –
  • target (Data) – reference target from dataset
get_error()[source]
Return type:tf.Tensor
get_value()[source]
Return type:None
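
A hedged sketch (layer and target names are assumptions): since the loss is not differentiable, it is typically attached to a search decision layer purely to report the (negated) BLEU score.

    network = {
        "decision": {
            "class": "decide", "from": ["output"],
            "loss": "bleu", "target": "classes",
        },
    }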

Cross-Entropy Loss

class returnn.tf.layers.basic.CrossEntropyLoss(focal_loss_factor=0.0, label_smoothing=0.0, label_smoothing_gaussian=False, debug_dump=False, safe_log_opts=None, use_fused=True, fake_upper_bound=None, **kwargs)[source]

Cross-Entropy loss. Basically -sum(target * log(output)).

Parameters:
  • focal_loss_factor (float) – see https://arxiv.org/abs/1708.02002. 0 means disabled
  • label_smoothing (float) – 0.1 is a common default. see TFUtil.smoothing_cross_entropy()
  • label_smoothing_gaussian (bool) – see TFUtil.smoothing_cross_entropy()
  • debug_dump (bool) –
  • safe_log_opts (dict[str]) – passed to safe_log()
  • use_fused (bool) – if possible, use fused opts
  • fake_upper_bound (float|None) – uses TFUtil.minimum_with_identity_grad(). I.e. you will see a finite loss, but we use the original gradient (which should be safe).
class_name = 'ce'[source]
get_output_target_scores()[source]
Returns:shape (time_flat,), type float32
Return type:tf.Tensor
get_value()[source]
Return type:tf.Tensor
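
A hedged config sketch (layer name, n_out and target name are assumptions): a softmax output layer trained with cross entropy and label smoothing.

    network = {
        "output": {
            "class": "softmax", "from": ["encoder"], "n_out": 10025,
            "loss": "ce", "target": "classes",
            "loss_opts": {"label_smoothing": 0.1},
        },
    }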

CTC Loss

class returnn.tf.layers.basic.CtcLoss(target_collapse_repeated=False, auto_clip_target_len=False, output_in_log_space=False, beam_width=100, ctc_opts=None, focal_loss_factor=0.0, use_native=False, use_viterbi=False, **kwargs)[source]

Connectionist Temporal Classification (CTC) loss. Basically a wrapper around tf.nn.ctc_loss.

Parameters:
  • target_collapse_repeated (bool) – like preprocess_collapse_repeated option for CTC. used for sparse_labels().
  • auto_clip_target_len (bool) – see self._get_target_sparse_labels().
  • output_in_log_space (bool) – False -> output expected in prob space. see self.get_output_logits
  • beam_width (int) – used in eval
  • ctc_opts (dict[str]|None) – other kwargs used for tf.nn.ctc_loss
  • focal_loss_factor (float) – see https://arxiv.org/abs/1708.02002. 0 means disabled. generalized for CTC
  • use_native (bool) – use our native implementation (TFNativeOp.ctc_loss())
  • use_viterbi (bool) – instead of full-sum, use only best path (via ctc_loss_viterbi())
class_name = 'ctc'[source]
recurrent = True[source]
init(**kwargs)[source]

See super.

get_output_logits()[source]
Returns:outputs in log-space / logits
Return type:tf.Tensor
get_soft_alignment()[source]

Also called the Baum-Welch-alignment. This is basically p_t(s|x_1^T,w_1^N), where s are the output labels (including blank), and w are the real target labels.

Returns:shape (time, batch, dim)
Return type:tf.Tensor
get_focal_loss_factor()[source]
Returns:shape (time, batch, dim)
Return type:tf.Tensor
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor
classmethod get_auto_output_layer_dim(target_dim)[source]
Return type:int
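
A hedged config sketch (names and dimensions are assumptions): a softmax over the target labels plus blank, trained with CTC. As get_auto_output_layer_dim() indicates, one extra class is needed for the blank label.

    network = {
        "ctc_out": {
            "class": "softmax", "from": ["encoder"], "n_out": 10026,  # assumed: 10025 target labels + blank
            "loss": "ctc", "target": "classes",
            "loss_opts": {"beam_width": 100, "use_native": True},
        },
    }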

Deep Clustering Loss

class returnn.tf.layers.basic.DeepClusteringLoss(embedding_dimension, nr_of_sources, **kwargs)[source]

Cost function used for deep clustering as described in [Hershey & Chen+, 2016]: “Deep clustering: Discriminative embeddings for segmentation and separation”

Parameters:
  • embedding_dimension (int) –
  • nr_of_sources (int) –
class_name = 'deep_clustering'[source]
get_error()[source]
Returns:frame error rate as a scalar value
Return type:tf.Tensor | None
get_value()[source]
Return type:tf.Tensor
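
A hedged config sketch (all names and dimensions are assumptions, e.g. 129 frequency bins with a 20-dimensional embedding each): the loss options carry the required embedding_dimension and nr_of_sources.

    network = {
        "embeddings": {
            "class": "linear", "activation": "tanh", "from": ["blstm"], "n_out": 129 * 20,
            "loss": "deep_clustering", "target": "ideal_masks",
            "loss_opts": {"embedding_dimension": 20, "nr_of_sources": 2},
        },
    }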

Edit Distance Loss

class returnn.tf.layers.basic.EditDistanceLoss(debug_print=False, label_map=None, ctc_decode=False, output_in_log_space=False, **kwargs)[source]

Note that this loss is not differentiable, thus it’s only for keeping statistics.

Parameters:
  • debug_print (bool) – will tf.Print the sequence
  • label_map (dict[int,int]|None) – before calculating the edit-distance, will apply this map
  • ctc_decode (bool) – True -> expects dense output and does CTC decode, False -> expects sparse labels in output
  • output_in_log_space (bool) – False -> dense output expected in prob space. see self.get_output_logits
class_name = 'edit_distance'[source]
recurrent = True[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters:
  • output (Data) – generated output
  • output_with_activation (OutputWithActivation|None) –
  • target (Data) – reference target from dataset
get_output_logits()[source]
Returns:outputs in log-space / logits
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor
get_value()[source]
Return type:None
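
A hedged sketch (layer and target names are assumptions): as with the BLEU loss, the edit distance is usually attached to a search decision layer to report label error rates during training/evaluation.

    network = {
        "decision": {
            "class": "decide", "from": ["output"],
            "loss": "edit_distance", "target": "classes",
        },
    }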

Expected Loss

class returnn.tf.layers.basic.ExpectedLoss(loss, loss_kind, norm_scores=True, norm_scores_stop_gradient=True, divide_beam_size=True, subtract_average_loss=True, loss_correction_grad_only=False, **kwargs)[source]

This loss takes the error or value of another loss and, given the search beam scores, calculates the expected loss. This is sometimes also called the minimum Bayes risk loss.

Parameters:
  • loss (Loss) –
  • loss_kind (str) – “error” or “value”. whether to use loss.get_error() or loss.get_value()
  • norm_scores (bool) –
  • norm_scores_stop_gradient (bool) –
  • divide_beam_size (bool) –
  • subtract_average_loss (bool) –
  • loss_correction_grad_only (bool) –
class_name = 'expected_loss'[source]
recurrent = True[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace, the loss_opts
  • network (returnn.tf.network.TFNetwork) –
  • get_layer (((str) -> LayerBase)) – function to get or construct another layer
init(**kwargs)[source]

Overwrites super. Get search choices.

get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:None

Extern Sprint Loss

class returnn.tf.layers.basic.ExternSprintLoss(sprint_opts, **kwargs)[source]

The loss is calculated by an extern Sprint instance.

Parameters:sprint_opts (dict[str]) –
class_name = 'sprint'[source]
recurrent = True[source]
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor|None

Fast Baum Welch Loss

class returnn.tf.layers.basic.FastBaumWelchLoss(sprint_opts, **kwargs)[source]

The loss is calculated via fast_baum_welch(). The automata are created by an extern Sprint instance.

Parameters:sprint_opts (dict[str]) –
class_name = 'fast_bw'[source]
recurrent = True[source]
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor|None

Generic Cross-Entropy Loss

class returnn.tf.layers.basic.GenericCELoss(**kwargs)[source]

Some generalization of cross entropy.

class_name = 'generic_ce'[source]
get_value()[source]
Return type:tf.Tensor

Mean-L1 Loss

class returnn.tf.layers.basic.MeanL1Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, scale=1.0)[source]

Like the mean squared error loss, but uses the mean absolute difference: mean(abs(target - output)).

Parameters:
  • base_network (returnn.tf.network.TFNetwork) –
  • use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask()
  • use_normalized_loss (bool) – the loss used in optimization will be normalized
  • custom_norm_factor (float|function|None) –
  • scale (float) – additional scale factor for the loss
class_name = 'mean_l1'[source]
get_value()[source]
Return type:tf.Tensor

Mean-Squared-Error Loss

class returnn.tf.layers.basic.MeanSquaredError(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, scale=1.0)[source]

The generic mean squared error loss function

Parameters:
  • base_network (returnn.tf.network.TFNetwork) –
  • use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask()
  • use_normalized_loss (bool) – the loss used in optimization will be normalized
  • custom_norm_factor (float|function|None) –
  • scale (float) – additional scale factor for the loss
class_name = 'mse'[source]
get_value()[source]
Return type:tf.Tensor
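
A hedged config sketch (layer name, feature dimension and target name are assumptions): a linear regression output trained with MSE.

    network = {
        "output": {
            "class": "linear", "activation": None, "n_out": 80, "from": ["decoder"],
            "loss": "mse", "target": "features",
        },
    }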

L1 Loss

class returnn.tf.layers.basic.L1Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, scale=1.0)[source]

L1-distance loss. Basically sum(abs(target - output)).

Parameters:
  • base_network (returnn.tf.network.TFNetwork) –
  • use_flatten_frames (bool) – will use TFUtil.flatten_with_seq_len_mask()
  • use_normalized_loss (bool) – the loss used in optimization will be normalized
  • custom_norm_factor (float|function|None) –
  • scale (float) – additional scale factor for the loss
class_name = 'l1'[source]
get_value()[source]
Return type:tf.Tensor

Sampling-Based Loss

class returnn.tf.layers.basic.SamplingBasedLoss(num_sampled=128, num_splits=1, sampler='log_uniform', nce_loss=False, use_full_softmax=False, remove_accidental_hits=None, sampler_args=None, nce_log_norm_term=0.0, **kwargs)[source]

Implements two sampling-based losses: sampled softmax (default) and noise contrastive estimation (NCE). See https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss and https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss.

Must be used in an output linear layer with a weight matrix of shape (num_classes, dim). When using ‘log_uniform’ sampler (default), optimal performance is typically achieved with the vocabulary list sorted in decreasing order of frequency (https://www.tensorflow.org/api_docs/python/tf/random/log_uniform_candidate_sampler).

Parameters:
  • num_sampled (int) – Number of classes to be sampled. For sampled softmax, this is the number of classes to be used to estimate the sampled softmax. For noise contrastive estimation, this is the number of noise samples.
  • num_splits (int) – Number of different samples (each with ‘num_sampled’ classes) to be used per batch.
  • sampler (str) – Specify sampling distribution (“uniform”, “log_uniform”, “learned_unigram” or “fixed_unigram”).
  • nce_loss (bool) – If True, use noise contrastive estimation loss. Else (default), use the sampled softmax.
  • use_full_softmax (bool) – If True, compute the full softmax instead of sampling (can be used for evaluation).
  • remove_accidental_hits (bool|None) – If True, remove sampled classes that equal one of the target classes. If not specified (None), the value is chosen based on the objective: for sampled softmax it defaults to True, for NCE to False. Set it to True for NCE training when the objective is the sampled logistic loss.
  • sampler_args (dict[str]) – additional arguments for the candidate sampler. This is most relevant to the fixed_unigram sampler. See https://www.tensorflow.org/api_docs/python/tf/random/fixed_unigram_candidate_sampler for details.
  • nce_log_norm_term (float) – The logarithm of the constant normalization term for NCE.
class_name = 'sampling_loss'[source]
get_value()[source]
Return type:tf.Tensor
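
A hedged config sketch (vocabulary size, num_sampled and all names are assumptions): a large-vocabulary linear output layer trained with sampled softmax; the weight-matrix shape requirement described above still applies.

    network = {
        "output": {
            "class": "linear", "activation": None, "from": ["lstm2"], "n_out": 250000,
            "loss": "sampling_loss", "target": "classes",
            "loss_opts": {"num_sampled": 8192, "sampler": "log_uniform", "nce_loss": False},
        },
    }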

Triplet Loss

class returnn.tf.layers.basic.TripletLoss(margin, multi_view_training=False, **kwargs)[source]

Triplet loss: loss = max(margin + d(x_a, x_s) - d(x_a, x_d), 0.0). Triplet loss is used for metric learning in a siamese/triplet network. It should be used as part of a CopyLayer with 3 inputs corresponding to x_a, x_s and x_d in the loss. Here we assume that x_a are anchor samples and x_s are samples such that, at each position i in a minibatch, x_a_i and x_s_i belong to the same class, while the pairs x_a_i and x_d_i belong to different classes.

In this implementation the number of training examples is increased by extracting all possible same/different pairs within a minibatch.

class_name = 'triplet_loss'[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters:
  • output (Data) – generated output
  • output_with_activation (OutputWithActivation|None) –
  • target (Data) – reference target from dataset
get_value()[source]
Return type:tf.Tensor
get_error()[source]

Error is not defined for triplet_loss.
Returns:None
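
A hedged config sketch (layer names, margin and target are assumptions): a copy layer concatenating the anchor, same-class and different-class views, as described above.

    network = {
        "triplet": {
            "class": "copy", "from": ["emb_anchor", "emb_same", "emb_diff"],
            "loss": "triplet_loss", "target": "classes",
            "loss_opts": {"margin": 0.2},
        },
    }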

Via Layer Loss

class returnn.tf.layers.basic.ViaLayerLoss(error_signal_layer=None, align_layer=None, loss_wrt_to_act_in=False, **kwargs)[source]

The loss error signal and loss value are defined as the output of another layer. That way, you can define any custom loss. This could e.g. be used together with the fast_bw layer.

Parameters:
  • error_signal_layer (LayerBase) –
  • align_layer (LayerBase) –
  • loss_wrt_to_act_in (bool|str) – if True, we expect that the given output_with_activation is set, and the given error signal is w.r.t. the input of the specific activation function. A common example is the input to the softmax function, where the gradient is much more stable to define, e.g. y - z instead of y/z for cross entropy. If you specify a str, e.g. “softmax” or “log_softmax”, there is an additional check that the used activation function is really that one.
class_name = 'via_layer'[source]
recurrent = True[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace, the loss_opts
  • network (returnn.tf.network.TFNetwork) –
  • get_layer (((str) -> LayerBase)) – function to get or construct another layer
get_value()[source]
Return type:tf.Tensor
get_error()[source]
Return type:tf.Tensor|None
classmethod get_default_target(extern_data)[source]
Parameters:extern_data (TFNetwork.ExternData) –
Return type:None
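
A hedged config sketch (layer names and options are assumptions; the Sprint options are left out): the soft alignment computed by a fast_bw layer serves as the error signal for the softmax output, with the gradient defined w.r.t. the softmax input.

    network = {
        "fbw": {
            "class": "fast_bw", "from": ["output"], "align_target": "sprint",
            "sprint_opts": {},  # placeholder, real Sprint options omitted
        },
        "output": {
            "class": "softmax", "from": ["encoder"], "n_out": 9001,
            "loss": "via_layer",
            "loss_opts": {"align_layer": "fbw", "loss_wrt_to_act_in": "softmax"},
        },
    }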