Softmax Layers#
Batched Softmax Layer#
- class returnn.tf.layers.basic.BatchSoftmaxLayer(**kwargs)[source]#
Softmax over spatial and feature axes
- Parameters:
in_dim (Dim|None) –
out_shape (set[Dim|returnn.tf.util.data._MarkedDim]|tuple|list|None) –
dropout (float) – 0.0 means no dropout. Dropout is only applied during training.
dropout_noise_shape (dict[Dim|str|list[Dim|str]|tuple[Dim|str],int|str|None]|None) – see Data.get_bc_shape()
dropout_on_forward (bool) – apply dropout during inference
mask (str|None) – “dropout” or “unity” or None. This is obsolete and only here for historical reasons.
- classmethod get_out_data_from_opts(name, sources, **kwargs)[source]#
- Parameters:
name (str) –
sources (list[LayerBase]) –
- Return type:
Data
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
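As a rough illustration, a network config entry using this layer might look as follows. This is a minimal sketch, assuming the layer is registered under the class name "batch_softmax"; the "encoder" source layer is hypothetical.

```python
# Minimal RETURNN network config sketch (assumptions: layer class name
# "batch_softmax", hypothetical "encoder" source layer).
network = {
    "encoder": {"class": "linear", "activation": "tanh", "n_out": 512, "from": "data"},
    # Softmax normalized jointly over the spatial (time) and feature axes:
    "att_weights": {"class": "batch_softmax", "from": "encoder"},
}
```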
Softmax Layer#
- class returnn.tf.layers.basic.SoftmaxLayer(**kwargs)[source]#
Just a LinearLayer with activation=”softmax” by default.
- Parameters:
activation (str|None) – e.g. “relu”, or None
with_bias (bool) –
grad_filter (float|None) – if grad norm is higher than this threshold (before activation), the grad is removed
forward_weights_init (str) – see returnn.tf.util.basic.get_initializer()
recurrent_weights_init (str) – see returnn.tf.util.basic.get_initializer()
bias_init (str|float) – see returnn.tf.util.basic.get_initializer()
use_transposed_weights (bool) – If True, define the weight matrix with transposed dimensions (n_out, n_in).
- output_before_activation: Optional[OutputWithActivation][source]#
- search_choices: Optional[SearchChoices][source]#
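As a rough illustration, a typical output layer using this class might be configured as below. This is a minimal sketch, assuming the layer class name "softmax"; the "encoder" source layer and num_classes value are hypothetical.

```python
# Minimal RETURNN network config sketch (assumptions: layer class name
# "softmax", hypothetical "encoder" source layer and num_classes).
num_classes = 10  # hypothetical number of output classes
network = {
    "encoder": {"class": "linear", "activation": "relu", "n_out": 512, "from": "data"},
    # Linear transformation followed by the default softmax activation:
    "output": {"class": "softmax", "from": "encoder", "n_out": num_classes,
               "loss": "ce", "target": "classes"},
}
```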
Softmax-Over-Spatial Layer#
- class returnn.tf.layers.basic.SoftmaxOverSpatialLayer(axis=None, energy_factor=None, start=None, window_start=None, window_size=None, use_time_mask=None, log_space=False, **kwargs)[source]#
This applies a softmax over a spatial axis (currently only the time axis is supported). E.g. when the input is of shape (B,T,dim), the output will also be (B,T,dim). It automatically masks the frames outside the sequence defined by the seq-len. In contrast to SoftmaxLayer, this will not do a linear transformation. See SeqLenMaskLayer if you just want to apply a masking.
- Parameters:
axis (Dim|str|None) – which axis to do the softmax over. “T” by default
energy_factor (float|None) – the energy will be scaled by this factor. This is like a temperature for the softmax. In Attention-is-all-you-need, this is set to 1/sqrt(base_ctx.dim).
start (LayerBase|None) – Tensor of shape (B,) indicating the start frame
window_start (LayerBase|int|None) – Layer with output of shape (B,) or (constant) int value indicating the window start.
window_size (LayerBase|int|None) – Layer with output of shape (B,) or (constant) int value indicating the window size.
use_time_mask (bool) – if True, assumes dynamic seq lengths and uses them for masking. By default, if a dynamic seq length exists, it is used.
log_space (bool) – if True, returns in log space (i.e. uses log_softmax)
- output_before_activation: Optional[OutputWithActivation][source]#
- classmethod get_out_data_from_opts(name, sources, axis=None, start=None, window_start=None, window_size=None, **kwargs)[source]#
- classmethod transform_config_dict(d, network, get_layer)[source]#
- Parameters:
d (dict[str]) –
network (returnn.tf.network.TFNetwork) –
get_layer –
- search_choices: Optional[SearchChoices][source]#
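As a rough illustration of the typical attention use case, the fragment below normalizes precomputed attention energies over the time axis. This is a minimal sketch, assuming the layer class name "softmax_over_spatial"; the "energy" source layer and key_dim value are hypothetical and would be defined elsewhere (e.g. inside a recurrent unit).

```python
# Minimal RETURNN attention-style config sketch (assumptions: layer class
# name "softmax_over_spatial", hypothetical "energy" layer and key_dim).
key_dim = 64  # hypothetical attention key dimension
network_fragment = {
    # "energy" is assumed to have shape (B, T, 1), e.g. additive attention energies.
    "att_weights": {
        "class": "softmax_over_spatial",
        "from": "energy",
        # Temperature scaling as in Attention-is-all-you-need: 1/sqrt(key_dim).
        "energy_factor": key_dim ** -0.5,
    },
}
```

The energy_factor here plays the role of the 1/sqrt(d_k) scaling from scaled dot-product attention; the masking of frames beyond each sequence length is handled automatically by the layer.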