Loss Functions

This is a list of all loss functions that can be used by adding "loss": "<class_name_of_loss>" to a layer. Additional input parameters for the respective loss class can be given via loss_opts. A scale for a loss can be set via loss_scale (also see Defining Layers).
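
As a minimal sketch, the three options sit in a layer definition as shown below. The layer name "output", the layer class "softmax" and the concrete values are illustrative assumptions; only the keys loss, loss_opts and loss_scale come from the description above.

```python
# Hypothetical network definition; the layer name and all values are illustrative only.
network = {
    "output": {
        "class": "softmax",                     # assumed output layer class
        "from": "data",                         # assumed input
        "loss": "ce",                           # class_name of CrossEntropyLoss
        "loss_opts": {"label_smoothing": 0.1},  # extra arguments passed to the loss class
        "loss_scale": 0.5,                      # scale applied to this loss
    },
}
```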

If the output of a loss function is needed as a part of the network, the LossLayer can be used in combination with one of the losses.

LossLayer

class returnn.tf.layers.basic.LossLayer(loss_, target_=None, use_error=False, **kwargs)[source]

This layer wraps a loss calculation as a layer, i.e. the loss is calculated and returned by the layer. However, this loss is not used as a loss by the updater. If you want to use it as a loss, you can use the AsIsLoss, i.e. write "loss": "as_is".

Note that the loss options for the wrapped loss need to be provided via loss_opts_, and that the layer does not apply any reduce function.

Note

The LossLayer might be deprecated in the future in favor of implementing the losses as actual layers.

If you want to define a loss inside the network, it is recommended to define it explicitly. An example could be:

"se_loss": {"class": "eval", "eval": "(source(0) - source(1)) ** 2", "from": ["output", "data:classes"]}

Followed, if needed, by a reduction such as the mean:

"mse_loss": {"class": "reduce", "mode": "mean", "axis": "F", "from": "se_loss"}

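To make the arithmetic of these two layers concrete, here is a plain-Python sketch with no RETURNN dependency. The helper names and the toy values are hypothetical; the nested lists stand in for a (frame, feature) tensor.

```python
def squared_error(output, target):
    """Element-wise (output - target) ** 2, mirroring the "eval" layer."""
    return [[(o - t) ** 2 for o, t in zip(o_row, t_row)]
            for o_row, t_row in zip(output, target)]


def mean_over_features(values):
    """Mean over the last (feature) axis, mirroring "reduce" with mode "mean", axis "F"."""
    return [sum(row) / len(row) for row in values]


output = [[1.0, 2.0], [0.0, 4.0]]  # two frames, two features each
target = [[1.0, 0.0], [2.0, 4.0]]

se_loss = squared_error(output, target)   # [[0.0, 4.0], [4.0, 0.0]]
mse_loss = mean_over_features(se_loss)    # [2.0, 2.0], one value per frame
```
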
loss_ and related params have the postfix _ to distinguish them from the loss options, which are used by the network and updater for training. Some of these (e.g. loss_opts_) are handled in transform_config_dict().

Parameters:
  • loss (Loss)

  • target (LayerBase|None)

  • use_error (bool) – whether to output the loss error instead of the loss value
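
A possible config fragment for this layer is sketched below. The keys loss_, loss_opts_ and use_error and the layer class "loss" are taken from this section; the layer names, the choice of "ce" and the option values are assumptions for illustration, and the exact value forms accepted by transform_config_dict() are not confirmed here.

```python
# Hypothetical network fragment; layer names and option values are illustrative only.
network = {
    "ce_value": {
        "class": "loss",                         # layer_class of LossLayer
        "from": "output",                        # assumed name of the layer to score
        "loss_": "ce",                           # wrapped loss, given by its class_name (assumed form)
        "loss_opts_": {"label_smoothing": 0.1},  # options for the wrapped loss
        "use_error": False,                      # output the loss value, not the error
    },
}
```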

layer_class: Optional[str] = 'loss'[source]
recurrent = True[source]
get_sub_layer(layer_name)[source]
Parameters:

layer_name (str) – sub layer name

Return type:

LayerBase|None

classmethod get_available_sub_layer_names(parent_layer_kwargs)[source]
Parameters:

parent_layer_kwargs (dict[str])

Return type:

list[str]

classmethod get_sub_layer_out_data_from_opts(layer_name, parent_layer_kwargs)[source]
Parameters:
  • layer_name (str) – sub layer name

  • parent_layer_kwargs (dict[str])

Returns:

Data template, class type of sub-layer, layer opts (transformed)

Return type:

(Data, type, dict[str])|None

get_dep_layers()[source]
Return type:

list[LayerBase]

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
classmethod get_out_data_from_opts(name, sources, target_=None, **kwargs)[source]
Parameters:
Return type:

Data

kwargs: Optional[Dict[str]][source]
output_before_activation: Optional[OutputWithActivation][source]
output_loss: Optional[tf.Tensor][source]
rec_vars_outputs: Dict[str, tf.Tensor][source]
search_choices: Optional[SearchChoices][source]
params: Dict[str, tf.Variable][source]
saveable_param_replace: Dict[tf.Variable, Union['tensorflow.python.training.saver.BaseSaverBuilder.SaveableObject', None]][source]
stats: Dict[str, tf.Tensor][source]

As-Is Loss

class returnn.tf.layers.basic.AsIsLoss(as_error=False, **kwargs)[source]

Use the output as-is as the loss.

Also see ViaLayerLoss, which additionally allows defining a custom error signal (gradient).

Parameters:

as_error (bool) – if True, use the output as error, otherwise (default) use the output as loss value. Error is purely for reporting, loss value is used for the optimizer as well (when scale != 0).
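
As a sketch, the explicit squared-error layers from the LossLayer section can be turned into a training loss by attaching "as_is" to the reduced layer. The layer names follow that earlier example; passing as_error through loss_opts is shown with its default value and is otherwise an assumption about typical usage.

```python
# Hypothetical network fragment built from the explicit loss layers shown earlier.
network = {
    "se_loss": {
        "class": "eval",
        "eval": "(source(0) - source(1)) ** 2",
        "from": ["output", "data:classes"],
    },
    "mse_loss": {
        "class": "reduce",
        "mode": "mean",
        "axis": "F",
        "from": "se_loss",
        "loss": "as_is",                     # use this layer's output directly as the loss
        "loss_opts": {"as_error": False},    # default: treat the output as loss value
    },
}
```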

class_name: str = 'as_is'[source]
need_target = False[source]
get_value()[source]
Return type:

tf.Tensor|None

get_error()[source]
Return type:

tf.Tensor|None

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Binary Cross-Entropy Loss

class returnn.tf.layers.basic.BinaryCrossEntropyLoss(pos_weight=None, **kwargs)[source]

Binary cross entropy. We expect the output as logits, not in probability space! Per frame: -mean(target * log(sigmoid(output)) + (1 - target) * log(1 - sigmoid(output)))

Parameters:

pos_weight (float|None) – weight of positive labels, see tf.nn.weighted_cross_entropy_with_logits.
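
The per-frame computation can be sketched in plain Python as follows. This is not the RETURNN implementation: the function names are hypothetical, the cross entropy is written with its usual negative sign, and pos_weight is applied to the positive-label term as described for tf.nn.weighted_cross_entropy_with_logits.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binary_cross_entropy_with_logits(logits, targets, pos_weight=None):
    """Mean binary cross entropy over one frame; `logits` are pre-sigmoid values."""
    weight = 1.0 if pos_weight is None else pos_weight
    total = 0.0
    for x, z in zip(logits, targets):
        # -log(sigmoid(x)) == softplus(-x) and -log(1 - sigmoid(x)) == softplus(x)
        total += weight * z * softplus(-x) + (1.0 - z) * softplus(x)
    return total / len(logits)
```

Working on logits avoids taking the log of a sigmoid that has already saturated to 0 or 1, which is why the loss expects logits rather than probabilities.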

class_name: str = 'bin_ce'[source]
get_value()[source]
Return type:

tf.Tensor

get_error()[source]
Returns:

frame error rate as a scalar value with the default self.reduce_func (see also self.get_value)

Return type:

tf.Tensor

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Bleu Loss

class returnn.tf.layers.basic.BleuLoss(**kwargs)[source]

Note that this loss is not differentiable, thus it’s only for keeping statistics. Also, BLEU is a score, i.e. the higher, the better. Thus, to interpret it as a loss or error, we take the negative value.

Parameters:
  • base_network (returnn.tf.network.TFNetwork)

  • use_flatten_frames (bool) – will use returnn.tf.util.basic.flatten_with_seq_len_mask()

  • use_normalized_loss (bool) – the loss used in optimization will be normalized

  • custom_norm_factor (float|function|None) – The standard norm factor is 1/sum(target_seq_len) if the target has a time-axis, or 1/sum(output_seq_len) if there is no target and the output has a time-axis, or 1 otherwise. (See Loss.init() for details.) This is used for proper normalization of accumulated loss/error per epoch and also proper normalization per batch for reporting, no matter if use_normalized_loss is True or False. If you want to change this norm factor, you can set this. As a function, it takes (self=self, output=output, layer=layer) and returns a float scalar.

  • custom_inv_norm_factor (LayerBase|None) – inverse of custom_norm_factor. A layer can be passed here, with any shape; it is automatically reduced via sum, so you can simply pass target_seq_len directly. Basically, for all reporting, it uses sum(loss) * sum(custom_inv_norm_factor).

  • scale (float) – additional scale factor for the loss

  • _check_output_before_softmax (bool|None)

class_name: str = 'bleu'[source]
recurrent = True[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters:
  • output (Data) – generated output

  • output_with_activation (OutputWithActivation|None)

  • target (Data) – reference target from dataset

get_error()[source]
Return type:

tf.Tensor

get_value()[source]
Return type:

None

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Cross-Entropy Loss

class returnn.tf.layers.basic.CrossEntropyLoss(input_type='prob', focal_loss_factor=0.0, label_smoothing=0.0, label_smoothing_gaussian=False, debug_dump=False, safe_log_opts=None, use_fused=True, fake_upper_bound=None, **kwargs)[source]

Cross-Entropy loss. Basically -sum(target * log(output)).
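
For a single frame, the formula can be sketched in plain Python as below, with the conventional negative sign. The helper names are hypothetical, and the output is taken in probability space, matching the default input_type='prob' in the signature above.

```python
import math


def cross_entropy(target_probs, output_probs):
    """-sum(target * log(output)) for one frame; both arguments are probability vectors."""
    return -sum(t * math.log(o) for t, o in zip(target_probs, output_probs) if t > 0.0)


def sparse_cross_entropy(target_index, output_probs):
    """Same value when the target is one-hot and given as a class index."""
    return -math.log(output_probs[target_index])


output_probs = [0.25, 0.5, 0.25]
dense = cross_entropy([0.0, 1.0, 0.0], output_probs)
sparse = sparse_cross_entropy(1, output_probs)
```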

Parameters:
class_name: str = 'ce'[source]
need_target = True[source]
get_output_target_scores()[source]
Returns:

shape (time_flat,), type float32, std-prob space

Return type:

tf.Tensor

get_value()[source]
Return type:

tf.Tensor

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

CTC Loss

class returnn.tf.layers.basic.CtcLoss(target_collapse_repeated=False, auto_clip_target_len=False, output_in_log_space=False, beam_width=100, ctc_opts=None, use_native=False, use_viterbi=False, **kwargs)[source]

Connectionist Temporal Classification (CTC) loss. Basically a wrapper around tf.nn.ctc_loss.

Parameters:
  • target_collapse_repeated (bool) – like preprocess_collapse_repeated option for CTC. used for sparse_labels().

  • auto_clip_target_len (bool) – see self._get_target_sparse_labels().

  • output_in_log_space (bool) – False -> output expected in prob space. see self.get_output_logits

  • beam_width (int) – used in eval

  • ctc_opts (dict[str]|None) – other kwargs used for tf.nn.ctc_loss

  • use_native (bool) – use our native implementation (TFNativeOp.ctc_loss())

  • use_viterbi (bool) – instead of full-sum, use only best path (via ctc_loss_viterbi())
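
To illustrate what the full-sum CTC loss computes, here is a small plain-Python sketch of the forward algorithm for one sequence, together with a brute-force check. It is not the TensorFlow or native implementation; the function names and the explicit blank argument are assumptions of this sketch, and it works in probability space without the log-space tricks a real implementation needs.

```python
import math
from itertools import product


def collapse(path, blank):
    """CTC mapping: merge repeated labels, then drop blanks."""
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def ctc_neg_log_likelihood(probs, target, blank):
    """-log p(target | probs), summed over all alignments (forward algorithm).

    probs: list over time of per-label probabilities; target: labels without blanks.
    """
    ext = [blank]
    for label in target:
        ext += [label, blank]  # blank, l1, blank, l2, ..., blank
    alpha = [0.0] * len(ext)
    alpha[0] = probs[0][ext[0]]
    if len(ext) > 1:
        alpha[1] = probs[0][ext[1]]
    for frame in probs[1:]:
        new = [0.0] * len(ext)
        for s, label in enumerate(ext):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping the blank in between is allowed only for different labels.
            if s >= 2 and label != blank and label != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame[label]
        alpha = new
    likelihood = alpha[-1] + (alpha[-2] if len(ext) > 1 else 0.0)
    return math.inf if likelihood == 0.0 else -math.log(likelihood)


def brute_force_likelihood(probs, target, blank):
    """Explicit sum over all paths; only feasible for tiny inputs."""
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(target):
            p = 1.0
            for frame, label in zip(probs, path):
                p *= frame[label]
            total += p
    return total
```

With use_viterbi, the sum over alignments would be replaced by the single best path; that variant is not shown here.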

class_name: str = 'ctc'[source]
recurrent = True[source]
init(**kwargs)[source]

See super.

get_output_logits()[source]
Returns:

outputs in log-space / logits

Return type:

tf.Tensor

get_soft_alignment()[source]

Also called the Baum-Welch-alignment. This is basically p_t(s|x_1^T,w_1^N), where s are the output labels (including blank), and w are the real target labels.

Returns:

shape (time, batch, dim)

Return type:

tf.Tensor

get_value()[source]
Return type:

tf.Tensor

get_error()[source]
Return type:

tf.Tensor

classmethod get_auto_output_layer_dim(target_dim)[source]
Parameters:

target_dim (returnn.tensor.Dim)

Return type:

returnn.tensor.Dim

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Deep Clustering Loss

class returnn.tf.layers.basic.DeepClusteringLoss(embedding_dimension, nr_of_sources, **kwargs)[source]

Cost function used for deep clustering as described in [Hershey & Chen+, 2016]: “Deep clustering discriminative embeddings for segmentation and separation”

Parameters:
  • embedding_dimension (int)

  • nr_of_sources (int)

class_name: str = 'deep_clustering'[source]
get_error()[source]
Returns:

frame error rate as a scalar value

Return type:

tf.Tensor | None

get_value()[source]
Return type:

tf.Tensor

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Edit Distance Loss

class returnn.tf.layers.basic.EditDistanceLoss(debug_print=False, label_map=None, ctc_decode=False, output_in_log_space=False, **kwargs)[source]

Note that this loss is not differentiable, thus it’s only for keeping statistics.

Parameters:
  • debug_print (bool) – will tf.Print the sequence

  • label_map (dict[int,int]|None) – before calculating the edit-distance, will apply this map

  • ctc_decode (bool) – True -> expects dense output and does CTC decode, False -> expects sparse labels in output

  • output_in_log_space (bool) – False -> dense output expected in prob space. see self.get_output_logits
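
The quantity being reported can be sketched in plain Python as a Levenshtein distance over label sequences. The function name is hypothetical, and applying label_map to both sequences (leaving unmapped labels unchanged) is an assumption of this sketch rather than a statement about the RETURNN implementation.

```python
def edit_distance(hyp, ref, label_map=None):
    """Levenshtein distance between two label sequences."""
    if label_map is not None:
        # Assumption: labels missing from the map are kept unchanged.
        hyp = [label_map.get(x, x) for x in hyp]
        ref = [label_map.get(x, x) for x in ref]
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis prefix
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(
                prev[j] + 1,             # drop a hypothesis label
                cur[j - 1] + 1,          # insert a reference label
                prev[j - 1] + (h != r),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]
```

Because the minimum over edit operations has no useful gradient, a value like this can only be tracked as a statistic, as noted above.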

class_name: str = 'edit_distance'[source]
recurrent = True[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters:
  • output (Data) – generated output

  • output_with_activation (OutputWithActivation|None)

  • target (Data) – reference target from dataset

get_output_logits()[source]
Returns:

outputs in log-space / logits

Return type:

tf.Tensor

get_error()[source]
Return type:

tf.Tensor

get_value()[source]
Return type:

None

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Expected Loss

class returnn.tf.layers.basic.ExpectedLoss(loss, loss_kind, norm_scores=True, norm_scores_stop_gradient=True, divide_beam_size=True, subtract_average_loss=True, loss_correction_grad_only=False, **kwargs)[source]

This loss takes the error or value of another loss and, given the search beam scores, calculates the expected loss. This is sometimes also called minimum Bayes risk.

Parameters:
  • loss (Loss)

  • loss_kind (str) – “error” or “value”. whether to use loss.get_error() or loss.get_value()

  • norm_scores (bool)

  • norm_scores_stop_gradient (bool)

  • divide_beam_size (bool)

  • subtract_average_loss (bool)

  • loss_correction_grad_only (bool)
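
The following plain-Python sketch shows one plausible reading of these flags for a single beam. It is an assumption-laden illustration, not the RETURNN code: the function name is hypothetical, beam scores are taken to be log scores, and the gradient-related flags (norm_scores_stop_gradient, loss_correction_grad_only) have no counterpart here because nothing is differentiated.

```python
import math


def expected_loss(beam_scores, losses, norm_scores=True,
                  subtract_average_loss=True, divide_beam_size=True):
    """Expected loss over one beam; `beam_scores` are log scores of the hypotheses."""
    if norm_scores:
        # Renormalize over the beam so that the weights sum to 1.
        m = max(beam_scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in beam_scores))
        beam_scores = [s - log_z for s in beam_scores]
    weights = [math.exp(s) for s in beam_scores]
    if subtract_average_loss:
        # Use the beam average as a baseline (assumed to be for variance reduction).
        avg = sum(losses) / len(losses)
        losses = [value - avg for value in losses]
    total = sum(w * value for w, value in zip(weights, losses))
    if divide_beam_size:
        total /= len(losses)
    return total
```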

class_name: str = 'expected_loss'[source]
recurrent = True[source]
search_choices: SearchChoices | None[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
init(**kwargs)[source]

Overwrites super. Get search choices.

get_value()[source]
Return type:

tf.Tensor

get_error()[source]
Return type:

None

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Extern Sprint Loss

class returnn.tf.layers.basic.ExternSprintLoss(sprint_opts, **kwargs)[source]

The loss is calculated by an extern Sprint instance.

Parameters:

sprint_opts (dict[str])

class_name: str = 'sprint'[source]
recurrent = True[source]
need_target = False[source]
get_value()[source]
Return type:

tf.Tensor

get_error()[source]
Return type:

tf.Tensor|None

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Fast Baum Welch Loss

class returnn.tf.layers.basic.FastBaumWelchLoss(sprint_opts, tdp_scale=1.0, **kwargs)[source]

The loss is calculated via fast_baum_welch(). The automata are created by an extern Sprint instance.

Parameters:

sprint_opts (dict[str])

class_name: str = 'fast_bw'[source]
recurrent = True[source]
need_target = False[source]
get_value()[source]
Return type:

tf.Tensor

get_error()[source]
Return type:

tf.Tensor|None

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Generic Cross-Entropy Loss

class returnn.tf.layers.basic.GenericCELoss(**kwargs)[source]

Some generalization of cross entropy.

Parameters:
  • base_network (returnn.tf.network.TFNetwork)

  • use_flatten_frames (bool) – will use returnn.tf.util.basic.flatten_with_seq_len_mask()

  • use_normalized_loss (bool) – the loss used in optimization will be normalized

  • custom_norm_factor (float|function|None) – The standard norm factor is 1/sum(target_seq_len) if the target has a time-axis, or 1/sum(output_seq_len) if there is no target and the output has a time-axis, or 1 otherwise. (See Loss.init() for details.) This is used for proper normalization of accumulated loss/error per epoch and also proper normalization per batch for reporting, no matter if use_normalized_loss is True or False. If you want to change this norm factor, you can set this. As a function, it takes (self=self, output=output, layer=layer) and returns a float scalar.

  • custom_inv_norm_factor (LayerBase|None) – inverse of custom_norm_factor. A layer can be passed here, with any shape; it is automatically reduced via sum, so you can simply pass target_seq_len directly. Basically, for all reporting, it uses sum(loss) * sum(custom_inv_norm_factor).

  • scale (float) – additional scale factor for the loss

  • _check_output_before_softmax (bool|None)

class_name: str = 'generic_ce'[source]
get_value()[source]
Return type:

tf.Tensor

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Mean-L1 Loss

class returnn.tf.layers.basic.MeanL1Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, custom_inv_norm_factor=None, scale=1.0, _check_output_before_softmax=None)[source]

Like MSE loss, but with absolute difference

Parameters:
  • base_network (returnn.tf.network.TFNetwork)

  • use_flatten_frames (bool) – will use returnn.tf.util.basic.flatten_with_seq_len_mask()

  • use_normalized_loss (bool) – the loss used in optimization will be normalized

  • custom_norm_factor (float|function|None) – The standard norm factor is 1/sum(target_seq_len) if the target has a time-axis, or 1/sum(output_seq_len) if there is no target and the output has a time-axis, or 1 otherwise. (See Loss.init() for details.) This is used for proper normalization of accumulated loss/error per epoch and also proper normalization per batch for reporting, no matter if use_normalized_loss is True or False. If you want to change this norm factor, you can set this. As a function, it takes (self=self, output=output, layer=layer) and returns a float scalar.

  • custom_inv_norm_factor (LayerBase|None) – inverse of custom_norm_factor. A layer can be passed here, with any shape; it is automatically reduced via sum, so you can simply pass target_seq_len directly. Basically, for all reporting, it uses sum(loss) * sum(custom_inv_norm_factor).

  • scale (float) – additional scale factor for the loss

  • _check_output_before_softmax (bool|None)

class_name: str = 'mean_l1'[source]
get_value()[source]
Return type:

tf.Tensor

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Mean-Squared-Error Loss

class returnn.tf.layers.basic.MeanSquaredError(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, custom_inv_norm_factor=None, scale=1.0, _check_output_before_softmax=None)[source]

The generic mean squared error loss function

Parameters:
  • base_network (returnn.tf.network.TFNetwork)

  • use_flatten_frames (bool) – will use returnn.tf.util.basic.flatten_with_seq_len_mask()

  • use_normalized_loss (bool) – the loss used in optimization will be normalized

  • custom_norm_factor (float|function|None) – The standard norm factor is 1/sum(target_seq_len) if the target has a time-axis, or 1/sum(output_seq_len) if there is no target and the output has a time-axis, or 1 otherwise. (See Loss.init() for details.) This is used for proper normalization of accumulated loss/error per epoch and also proper normalization per batch for reporting, no matter if use_normalized_loss is True or False. If you want to change this norm factor, you can set this. As a function, it takes (self=self, output=output, layer=layer) and returns a float scalar.

  • custom_inv_norm_factor (LayerBase|None) – inverse of custom_norm_factor. A layer can be passed here, with any shape; it is automatically reduced via sum, so you can simply pass target_seq_len directly. Basically, for all reporting, it uses sum(loss) * sum(custom_inv_norm_factor).

  • scale (float) – additional scale factor for the loss

  • _check_output_before_softmax (bool|None)

class_name: str = 'mse'[source]
get_value()[source]
Return type:

tf.Tensor

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

L1 Loss

class returnn.tf.layers.basic.L1Loss(base_network, use_flatten_frames=True, use_normalized_loss=False, custom_norm_factor=None, custom_inv_norm_factor=None, scale=1.0, _check_output_before_softmax=None)[source]

L1-distance loss. sum(|target - output|).
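
For one frame, the L1 distance and its two relatives from the sections above (mean L1 and mean squared error) can be compared with a short plain-Python sketch. The function names are hypothetical, the absolute value in the L1 distance follows the usual definition, and the way RETURNN reduces these values over frames and batches is not modeled here.

```python
def l1_distance(target, output):
    """Sum of absolute differences over the features of one frame."""
    return sum(abs(t - o) for t, o in zip(target, output))


def mean_l1(target, output):
    """Mean of absolute differences over the features of one frame."""
    return l1_distance(target, output) / len(target)


def mean_squared_error(target, output):
    """Mean of squared differences over the features of one frame."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


target = [1.0, 2.0, 3.0]
output = [1.5, 2.0, 1.0]
l1_value = l1_distance(target, output)  # 0.5 + 0.0 + 2.0
```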

Parameters:
  • base_network (returnn.tf.network.TFNetwork)

  • use_flatten_frames (bool) – will use returnn.tf.util.basic.flatten_with_seq_len_mask()

  • use_normalized_loss (bool) – the loss used in optimization will be normalized

  • custom_norm_factor (float|function|None) – The standard norm factor is 1/sum(target_seq_len) if the target has a time-axis, or 1/sum(output_seq_len) if there is no target and the output has a time-axis, or 1 otherwise. (See Loss.init() for details.) This is used for proper normalization of accumulated loss/error per epoch and also proper normalization per batch for reporting, no matter if use_normalized_loss is True or False. If you want to change this norm factor, you can set this. As a function, it takes (self=self, output=output, layer=layer) and returns a float scalar.

  • custom_inv_norm_factor (LayerBase|None) – inverse of custom_norm_factor. A layer can be passed here, with any shape; it is automatically reduced via sum, so you can simply pass target_seq_len directly. Basically, for all reporting, it uses sum(loss) * sum(custom_inv_norm_factor).

  • scale (float) – additional scale factor for the loss

  • _check_output_before_softmax (bool|None)

class_name: str = 'l1'[source]
get_value()[source]
Return type:

tf.Tensor

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]

Sampling-Based Loss

class returnn.tf.layers.basic.SamplingBasedLoss(num_sampled=128, num_splits=1, sampler='log_uniform', nce_loss=False, use_full_softmax=False, remove_accidental_hits=None, sampler_args=None, nce_log_norm_term=0.0, **kwargs)[source]

Implements two sampling-based losses: sampled softmax (default) and noise contrastive estimation (NCE). See https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss and https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss.

Must be used in an output linear layer with a weight matrix of shape (num_classes, dim). When using ‘log_uniform’ sampler (default), optimal performance is typically achieved with the vocabulary list sorted in decreasing order of frequency (https://www.tensorflow.org/api_docs/python/tf/random/log_uniform_candidate_sampler).

Parameters:
  • num_sampled (int) – Number of classes to be sampled. For sampled softmax, this is the number of classes to be used to estimate the sampled softmax. For noise contrastive estimation, this is the number of noise samples.

  • num_splits (int) – Number of different samples (each with ‘num_sampled’ classes) to be used per batch.

  • sampler (str) – Specify sampling distribution (“uniform”, “log_uniform”, “learned_unigram” or “fixed_unigram”).

  • nce_loss (bool) – If True, use noise contrastive estimation loss. Else (default), use the sampled softmax.

  • use_full_softmax (bool) – If True, compute the full softmax instead of sampling (can be used for evaluation).

  • remove_accidental_hits (bool|None) – If True, remove sampled classes that equal one of the target classes. If not specified (None), the value is determined based on the chosen objective. For sampled softmax this should be set to True; for NCE the default is False. Set this to True in case of NCE training and the objective is equal to sampled logistic loss.

  • sampler_args (dict[str]) – additional arguments for the candidate sampler. This is most relevant to the fixed_unigram sampler. See https://www.tensorflow.org/api_docs/python/tf/random/fixed_unigram_candidate_sampler for details.

  • nce_log_norm_term (float) – The logarithm of the constant normalization term for NCE.
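
The recommendation to sort the vocabulary by decreasing frequency follows from the shape of the log-uniform distribution, which the TensorFlow documentation gives as P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1). The sketch below evaluates that formula in plain Python; the function name and the vocabulary size are illustrative assumptions.

```python
import math


def log_uniform_prob(class_id, range_max):
    """Sampling probability of `class_id` under the log-uniform (Zipfian) sampler."""
    return (math.log(class_id + 2) - math.log(class_id + 1)) / math.log(range_max + 1)


vocab_size = 10000  # hypothetical number of classes
probs = [log_uniform_prob(c, vocab_size) for c in range(vocab_size)]

# Low class indices are sampled far more often than high ones, so the sampler
# matches the data best when index 0 is the most frequent word, and so on.
ratio_first_to_last = probs[0] / probs[-1]
```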

class_name: str = 'sampling_loss'[source]
layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]
get_value()[source]
Return type:

tf.Tensor

Triplet Loss

class returnn.tf.layers.basic.TripletLoss(margin, multi_view_training=False, **kwargs)[source]

Triplet loss: loss = max(margin + d(x_a, x_s) - d(x_a, x_d), 0.0)

Triplet loss is used for metric learning in a siamese/triplet network. It should be used as a part of CopyLayer with 3 inputs corresponding to x_a, x_s and x_d in a loss.

Here we assume that x_a are anchor samples, and x_s are samples where at each position i in a minibatch x_ai and x_si belong to the same class, while pairs x_ai and x_di belong to different classes.

In this implementation the number of training examples is increased by extracting all possible same/different pairs within a minibatch.
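
For a single triplet, the formula can be sketched in plain Python as below. The distance d is not specified in this section, so squared Euclidean distance is used purely as an assumption; the function names are hypothetical, and the extraction of all same/different pairs within a minibatch is not shown.

```python
def squared_euclidean(a, b):
    """Assumed distance d; the actual distance used by the loss may differ."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(x_a, x_s, x_d, margin, distance=squared_euclidean):
    """max(margin + d(x_a, x_s) - d(x_a, x_d), 0.0) for one triplet."""
    return max(margin + distance(x_a, x_s) - distance(x_a, x_d), 0.0)


anchor = [0.0, 0.0]
same = [1.0, 0.0]
far_different = [3.0, 0.0]    # already far enough away: no loss
near_different = [0.5, 0.0]   # closer than the same-class sample: positive loss

easy = triplet_loss(anchor, same, far_different, margin=1.0)
hard = triplet_loss(anchor, same, near_different, margin=1.0)
```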

Parameters:
  • base_network (returnn.tf.network.TFNetwork)

  • use_flatten_frames (bool) – will use returnn.tf.util.basic.flatten_with_seq_len_mask()

  • use_normalized_loss (bool) – the loss used in optimization will be normalized

  • custom_norm_factor (float|function|None) – The standard norm factor is 1/sum(target_seq_len) if the target has a time-axis, or 1/sum(output_seq_len) if there is no target and the output has a time-axis, or 1 otherwise. (See Loss.init() for details.) This is used for proper normalization of accumulated loss/error per epoch and also proper normalization per batch for reporting, no matter if use_normalized_loss is True or False. If you want to change this norm factor, you can set this. As a function, it takes (self=self, output=output, layer=layer) and returns a float scalar.

  • custom_inv_norm_factor (LayerBase|None) – inverse of custom_norm_factor. A layer can be passed here, with any shape; it is automatically reduced via sum, so you can simply pass target_seq_len directly. Basically, for all reporting, it uses sum(loss) * sum(custom_inv_norm_factor).

  • scale (float) – additional scale factor for the loss

  • _check_output_before_softmax (bool|None)

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]
class_name: str = 'triplet_loss'[source]
init(output, output_with_activation=None, target=None, **kwargs)[source]
Parameters:
  • output (Data) – generated output

  • output_with_activation (OutputWithActivation|None)

  • target (Data) – reference target from dataset

get_value()[source]
Return type:

tf.Tensor

get_error()[source]

Error is not defined for triplet_loss.

Returns:

None

Via Layer Loss

class returnn.tf.layers.basic.ViaLayerLoss(error_signal_layer=None, align_layer=None, loss_wrt_to_act_in=False, **kwargs)[source]

The loss error signal and loss value are defined as the output of another layer. That way, you can define any custom loss. This could e.g. be used together with the fast_bw layer.

This is a more custom variant of AsIsLoss, which simply takes the output of a layer as loss without redefining the error signal (gradient).

Parameters:
  • error_signal_layer (LayerBase)

  • align_layer (LayerBase)

  • loss_wrt_to_act_in (bool|str) – if True, we expect that the given output_with_activation is set, and the given error signal is w.r.t. the input of the specific activation function. A common example is the input to the softmax function, where the gradient is much more stable to define, e.g. y - z instead of y/z for cross entropy. If you specify a str, e.g. “softmax” or “log_softmax”, there is an additional check that the used activation function is really that one.
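
The remark about the softmax input can be checked numerically. The plain-Python sketch below assumes that y denotes the softmax output and z the target distribution, and verifies that the gradient of the cross entropy with respect to the softmax input equals y - z. All names are hypothetical and nothing here uses RETURNN.

```python
import math


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(x, target):
    """-sum(target * log(softmax(x)))."""
    return -sum(t * math.log(p) for t, p in zip(target, softmax(x)) if t > 0.0)


def numeric_grad(x, target, eps=1e-6):
    """Central finite differences of the loss w.r.t. the softmax input."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        diff = cross_entropy_from_logits(hi, target) - cross_entropy_from_logits(lo, target)
        grad.append(diff / (2 * eps))
    return grad


x = [2.0, -1.0, 0.5]   # softmax input (logits)
z = [0.0, 1.0, 0.0]    # target distribution
y = softmax(x)
analytic = [yi - zi for yi, zi in zip(y, z)]  # bounded, no division by small y
numeric = numeric_grad(x, z)
```

The gradient with respect to the softmax output instead contains a division by y, which becomes unstable when y is close to zero; this is the motivation for defining the error signal on the activation input.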

class_name: str = 'via_layer'[source]
recurrent = True[source]
need_target = False[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace, the loss_opts

  • network (returnn.tf.network.TFNetwork)

  • get_layer (((str) -> LayerBase)) – function to get or construct another layer

get_value()[source]
Return type:

tf.Tensor

get_error()[source]
Return type:

tf.Tensor|None

layer: LayerBase | None[source]
output: Tensor | None[source]
output_with_activation: OutputWithActivation | None[source]
output_seq_lens: Tensor | None[source]
target: Tensor | None[source]
target_seq_lens: Tensor | None[source]
output_flat: Tensor | None[source]
output_before_softmax_flat: Tensor | None[source]
target_flat: Tensor | None[source]
loss_norm_factor: Tensor | None[source]