returnn.frontend.loss
Loss functions
- returnn.frontend.loss.cross_entropy(*, estimated: Tensor, target: Tensor, axis: Dim, estimated_type: str) -> Tensor
target is supposed to be in probability space (normalized). It can also be sparse, i.e. contain class indices. estimated can be probs, log-probs or logits, specified via estimated_type.
Assuming both are in probability space, the cross entropy is:
    H(target, estimated) = -reduce_sum(target * log(estimated), axis=axis)
                         = -matmul(target, log(estimated), reduce=axis)
In case you want label smoothing, you can use e.g.:
    ce = nn.cross_entropy(target=nn.label_smoothing(target, 0.1), estimated=estimated)
- Parameters:
estimated – probs, log-probs or logits, specified via estimated_type
target – probs, normalized, can also be sparse
axis – class labels dim over which softmax is computed
estimated_type – “probs”, “log-probs” or “logits”
- Returns:
cross entropy (same Dims as ‘estimated’ but without ‘axis’)
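To make the formula above concrete, here is a minimal pure-Python sketch for a single frame. It does not use RETURNN tensors or dims; the helper name cross_entropy_single is hypothetical and only illustrates how estimated_type and a sparse target change the computation.

```python
import math


def cross_entropy_single(target, estimated, estimated_type):
    """Cross entropy over the class axis for one frame.

    Hypothetical helper for illustration, not part of RETURNN.
    target: list of probabilities (dense) or an int class index (sparse).
    estimated: list of probs, log-probs or logits, as named by estimated_type.
    """
    if estimated_type == "probs":
        log_probs = [math.log(p) for p in estimated]
    elif estimated_type == "log-probs":
        log_probs = list(estimated)
    elif estimated_type == "logits":
        # log-softmax, shifted by the max for numerical stability
        m = max(estimated)
        log_z = m + math.log(sum(math.exp(x - m) for x in estimated))
        log_probs = [x - log_z for x in estimated]
    else:
        raise ValueError(f"unknown estimated_type {estimated_type!r}")
    if isinstance(target, int):
        # sparse target: equivalent to a one-hot distribution
        return -log_probs[target]
    # dense target: -sum(target * log(estimated)) over the class axis
    return -sum(t * lp for t, lp in zip(target, log_probs))


probs = [0.2, 0.5, 0.3]
ce_dense = cross_entropy_single([0.0, 1.0, 0.0], probs, "probs")
ce_sparse = cross_entropy_single(1, probs, "probs")
ce_logits = cross_entropy_single(1, [math.log(p) for p in probs], "logits")
```

All three calls describe the same target and the same distribution, so they agree up to floating-point error (each is -log(0.5)).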
- returnn.frontend.loss.ctc_loss(*, logits: Tensor, logits_normalized: bool = False, targets: Tensor, input_spatial_dim: Dim, targets_spatial_dim: Dim, blank_index: int, max_approx: bool = False, use_native_op: bool | None = None, label_loop: bool = True) -> Tensor
Calculates the CTC loss.
Internally, this uses returnn.tf.native_op.ctc_loss(), which is equivalent to tf.nn.ctc_loss but more efficient.
Output is of shape [B].
- Parameters:
logits – (before softmax). shape [B…,input_spatial,C]
logits_normalized – whether the logits are already normalized (e.g. via log-softmax)
targets – sparse. shape [B…,targets_spatial] -> C
input_spatial_dim – spatial dim of input logits
targets_spatial_dim – spatial dim of targets
blank_index – vocab index of the blank symbol
max_approx – if True, use max instead of sum over alignments (max approx, Viterbi)
use_native_op – whether to use our native op
label_loop – whether label loops are allowed (standard for CTC). False is like RNA topology.
- Returns:
loss shape [B…]
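As an illustration of what the loss computes, the following is a minimal pure-Python sketch of the standard CTC forward recursion for one sequence, with label loops allowed. It is not RETURNN's native op and ignores batching and dims; ctc_loss_single is a hypothetical name, and the input is assumed to be already normalized log-probs (the logits_normalized=True case). Setting max_approx=True replaces the sum over alignments by a max.

```python
import math

NEG_INF = float("-inf")


def _log_add(a, b):
    """log(exp(a) + exp(b)), safe for -inf inputs."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_loss_single(log_probs, targets, blank_index, max_approx=False):
    """Negative log-likelihood of targets under CTC (hypothetical helper).

    log_probs: list over frames of per-class log-probs, shape [T][C].
    targets: list of label indices (without blanks).
    """
    combine = max if max_approx else _log_add
    # extended state sequence: blank, l1, blank, l2, ..., lN, blank
    ext = [blank_index]
    for label in targets:
        ext += [label, blank_index]
    num_states = len(ext)
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            score = alpha[s]  # stay in the same state
            if s >= 1:
                score = combine(score, alpha[s - 1])  # advance by one state
            if s >= 2 and ext[s] != blank_index and ext[s] != ext[s - 2]:
                score = combine(score, alpha[s - 2])  # skip the blank in between
            if score != NEG_INF:
                new_alpha[s] = score + frame[ext[s]]
        alpha = new_alpha
    # a valid alignment ends in the last blank or the last label
    total = alpha[-1]
    if num_states > 1:
        total = combine(total, alpha[-2])
    return -total


# Two frames, two classes (blank=0, label=1), uniform probabilities.
uniform = [[math.log(0.5)] * 2 for _ in range(2)]
loss_sum = ctc_loss_single(uniform, [1], blank_index=0)
loss_max = ctc_loss_single(uniform, [1], blank_index=0, max_approx=True)
```

In this toy case three alignments (1 1, blank 1, 1 blank) each have probability 0.25, so the full sum gives -log(0.75) while the max approximation gives -log(0.25).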
- returnn.frontend.loss.ctc_best_path(*, logits: Tensor, logits_normalized: bool = False, targets: Tensor, input_spatial_dim: Dim, targets_spatial_dim: Dim, blank_index: int, label_loop: bool = True) -> Tensor
Calculates the CTC best path.
- Parameters:
logits – (before softmax). shape [B…,input_spatial,C]
logits_normalized – whether the logits are already normalized (e.g. via log-softmax)
targets – sparse. shape [B…,targets_spatial] -> C
input_spatial_dim – spatial dim of input logits
targets_spatial_dim – spatial dim of targets
blank_index – vocab index of the blank symbol
label_loop – whether label loops are allowed (standard for CTC). False is like RNA topology.
- Returns:
best path, shape [B…,targets_spatial] -> C
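The idea of a best path can be sketched as a Viterbi search over the same state topology as the loss. The pure-Python sketch below handles a single sequence and returns a frame-level alignment (one label or blank per input frame); that output convention, the helper name ctc_best_path_single, and the way label_loop=False is modeled (no self-loop on label states, labels may follow each other directly) are assumptions of this sketch, not a description of the RETURNN implementation.

```python
import math

NEG_INF = float("-inf")


def ctc_best_path_single(log_probs, targets, blank_index, label_loop=True):
    """Viterbi alignment of targets to frames (hypothetical helper).

    log_probs: list over frames of per-class log-probs, shape [T][C].
    Returns a list of length T with label indices, including blanks.
    """
    ext = [blank_index]
    for label in targets:
        ext += [label, blank_index]
    num_states = len(ext)
    score = [NEG_INF] * num_states
    score[0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[1] = log_probs[0][ext[1]]
    backpointers = []
    for frame in log_probs[1:]:
        new_score = [NEG_INF] * num_states
        prev_state = [0] * num_states
        for s in range(num_states):
            is_blank = s % 2 == 0
            candidates = []
            if is_blank or label_loop:
                candidates.append(s)  # stay (labels only if loops are allowed)
            if s >= 1:
                candidates.append(s - 1)
            if s >= 2 and not is_blank and (not label_loop or ext[s] != ext[s - 2]):
                candidates.append(s - 2)  # label directly follows previous label
            best = max(candidates, key=lambda p: score[p])
            prev_state[s] = best
            if score[best] != NEG_INF:
                new_score[s] = score[best] + frame[ext[s]]
        backpointers.append(prev_state)
        score = new_score
    # end in the last blank or the last label, whichever scores higher
    state = num_states - 1
    if num_states > 1 and score[-2] > score[-1]:
        state = num_states - 2
    if score[state] == NEG_INF:
        raise ValueError("no valid alignment for this input length")
    path = [ext[state]]
    for prev_state in reversed(backpointers):
        state = prev_state[state]
        path.append(ext[state])
    path.reverse()
    return path


probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
log_probs = [[math.log(p) for p in frame] for frame in probs]
path_loop = ctc_best_path_single(log_probs, [1, 2], blank_index=0)  # [0, 1, 1, 2]
path_no_loop = ctc_best_path_single(log_probs, [1, 2], blank_index=0, label_loop=False)  # [0, 1, 0, 2]
```

With label loops, label 1 can cover both middle frames. Without them, each label occupies exactly one frame, so the third frame has to be a blank.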
- returnn.frontend.loss.ctc_greedy_decode(logits: Tensor, *, in_spatial_dim: Dim, blank_index: int, out_spatial_dim: Dim | None = None, target_dim: Dim | None = None, wb_target_dim: Dim | None = None) -> Tuple[Tensor, Dim]
Greedy CTC decode.
- Returns:
(labels, out_spatial_dim)
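Greedy CTC decoding is commonly understood as: take the argmax class per frame, collapse consecutive repeats, then drop blanks. The pure-Python sketch below shows that procedure for one sequence; ctc_greedy_decode_single is a hypothetical name, and the length of the returned list plays the role of out_spatial_dim.

```python
def ctc_greedy_decode_single(logits, blank_index):
    """Greedy CTC decode for one sequence (hypothetical helper).

    logits: list over frames of per-class scores, shape [T][C].
    Returns the label sequence without blanks and without collapsed repeats.
    """
    labels = []
    prev = None
    for frame in logits:
        # argmax over the class axis
        best = max(range(len(frame)), key=frame.__getitem__)
        # emit only on a change of class, and never emit the blank
        if best != blank_index and best != prev:
            labels.append(best)
        prev = best
    return labels


# Per-frame argmax is 0 1 1 0 1 2 2 (blank=0).
frame_argmax = [0, 1, 1, 0, 1, 2, 2]
logits = [[5.0 if c == k else 0.0 for c in range(3)] for k in frame_argmax]
decoded = ctc_greedy_decode_single(logits, blank_index=0)  # [1, 1, 2]
```

The two occurrences of label 1 are kept as separate labels because a blank frame separates them.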
- returnn.frontend.loss.ctc_durations_from_path(*, path: Tensor, path_spatial_dim: Dim, blank_index: int, targets_spatial_dim: Dim | None = None, out_spatial_dim: Dim | None = None, check_dims: bool = True, stop_on_failed_check: bool = True) -> Tuple[Tensor, Dim]
Given a CTC path (alignment), compute the durations of each label and each blank. Specifically, assuming that we have N labels in the target sequence, there are N label durations and N+1 blank durations (one before the first label, one after the last label, and one between each pair of adjacent labels), for a total of 2N+1 durations. The returned durations tensor has shape [B,…,T’] where T’ = 2 * N + 1, corresponding to durations for the state sequence [blank_0, label_1, blank_1, label_2, …, label_N, blank_N].
- Parameters:
path – CTC path (alignment), shape [B…,path_spatial_dim] -> label indices (including blanks)
path_spatial_dim – spatial dim of path
blank_index – index of the blank label
targets_spatial_dim – if given, asserts that the computed number of labels matches this size
out_spatial_dim – if given, asserts that the output spatial dim size matches 2 * targets_spatial_dim + 1
check_dims – whether to check the dimension sizes
stop_on_failed_check – whether to raise an error on failed check
- Returns:
(durations, out_spatial_dim). durations shape [B…,out_spatial_dim] where out_spatial_dim = 2 * N + 1, where N is the number of labels in the target sequence.
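The 2N+1 layout described above can be sketched in pure Python for a single path. This is only an illustration with a hypothetical helper name (ctc_durations_single); it assumes a standard CTC path where repeated labels without a blank in between belong to the same label, and it performs none of the dim checks.

```python
def ctc_durations_single(path, blank_index):
    """Durations for [blank_0, label_1, blank_1, ..., label_N, blank_N].

    Hypothetical helper for illustration, not part of RETURNN.
    path: list of label indices per frame, including blanks.
    """
    durations = [0]  # starts in blank_0; even index = blank, odd index = label
    prev = None
    for idx in path:
        in_label = len(durations) % 2 == 0
        if idx == blank_index:
            if in_label:
                durations.append(0)  # open the blank state after the label
            durations[-1] += 1
        elif in_label and idx == prev:
            durations[-1] += 1  # label loop: same label continues
        else:
            if in_label:
                durations.append(0)  # zero-length blank between adjacent labels
            durations.append(1)  # first frame of a new label
        prev = idx
    if len(durations) % 2 == 0:
        durations.append(0)  # path ended in a label: blank_N has duration 0
    return durations


path = [0, 1, 1, 0, 0, 2, 3, 3]  # blank=0, labels 1, 2, 3
durations = ctc_durations_single(path, blank_index=0)  # [1, 2, 2, 1, 0, 2, 0]
```

The durations always sum to the path length, and zero entries mark blanks that were skipped in the alignment.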
- returnn.frontend.loss.ctc_no_label_loop_blank_durations_from_path(*, path: Tensor, path_spatial_dim: Dim, blank_index: int, targets_spatial_dim: Dim | None = None, out_spatial_dim: Dim | None = None, check_dims: bool = True, stop_on_failed_check: bool = True) -> Tuple[Tensor, Dim]
Given a path (alignment) from CTC without label loop (label_loop=False in ctc_best_path(), i.e. RNA topology), compute the durations of all the blanks. Specifically, assuming that we have N labels in the target sequence, there are N+1 blank durations (one before the first label, one after the last label, and one between each pair of adjacent labels).
- Parameters:
path – CTC path (alignment), shape [B…,path_spatial_dim] -> label indices (including blanks)
path_spatial_dim – spatial dim of path
blank_index – index of the blank label
targets_spatial_dim – if given, asserts that the computed number of labels matches this size
out_spatial_dim – if given, asserts that the output spatial dim size matches targets_spatial_dim + 1
check_dims – whether to check the dimension sizes
stop_on_failed_check – whether to raise an error on failed check
- Returns:
(durations, out_spatial_dim). durations covers the blanks only and has shape [B…,out_spatial_dim], where out_spatial_dim = N + 1 and N is the number of labels in the target sequence.
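Without label loops, every non-blank frame is exactly one label, so the blank durations reduce to counting blanks between labels. A minimal pure-Python sketch for one path follows; rna_blank_durations_single is a hypothetical name and no dim checks are performed.

```python
def rna_blank_durations_single(path, blank_index):
    """Blank durations [blank_0, ..., blank_N] for a path without label loops.

    Hypothetical helper for illustration, not part of RETURNN.
    path: list of label indices per frame, including blanks.
    """
    durations = [0]  # blank_0, before the first label
    for idx in path:
        if idx == blank_index:
            durations[-1] += 1
        else:
            # each non-blank frame is one label and opens the next blank slot
            durations.append(0)
    return durations


path = [0, 1, 0, 0, 2, 3, 0]  # blank=0, labels 1, 2, 3
blank_durations = rna_blank_durations_single(path, blank_index=0)  # [1, 2, 0, 1]
```

Here the number of labels N equals the number of non-blank frames, so the blank durations plus N add up to the path length.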