Softmax Layers

Softmax Layer

class TFNetworkLayer.SoftmaxLayer(activation='softmax', **kwargs)[source]

Just a LinearLayer with activation="softmax" by default.

layer_class = 'softmax'[source]
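
For illustration, a minimal sketch of how this layer might appear in a network-definition dict; the layer name "output", the source "encoder", and n_out=10 are assumptions chosen for the example, not part of this API:

    # Hypothetical network snippet: a linear transformation followed by a
    # softmax activation, reading from a layer called "encoder" (name assumed).
    network = {
        "output": {"class": "softmax", "from": ["encoder"], "n_out": 10},
    }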

Batched Softmax Layer

class TFNetworkLayer.BatchSoftmaxLayer(**kwargs)[source]

Softmax over the spatial and feature axes

layer_class = 'batch_softmax'[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type: Data
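
Since this layer takes no extra options, a hypothetical config entry could look as follows; the layer name "att_weights" and the source name "energy" are assumptions for illustration:

    # Hypothetical config entry: normalize jointly over the spatial and
    # feature axes of the "energy" layer (source name assumed).
    "att_weights": {"class": "batch_softmax", "from": ["energy"]},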

Softmax-Over-Spatial Layer

class TFNetworkLayer.SoftmaxOverSpatialLayer(axis=None, energy_factor=None, start=None, window_start=None, window_size=None, use_time_mask=None, **kwargs)[source]

This applies a softmax over a spatial axis (currently only the time axis is supported). E.g. when the input is of shape (B,T,dim), the output will also be of shape (B,T,dim). Frames outside the sequence, as defined by the sequence lengths, are automatically masked out. In contrast to SoftmaxLayer, this does not apply a linear transformation. See SeqLenMaskLayer if you only want to apply the masking.

Parameters:
  • axis (str|None) – which axis to do the softmax over
  • energy_factor (float|None) – the energy will be scaled by this factor. This is like a temperature for the softmax. In Attention-is-all-you-need, this is set to 1/sqrt(base_ctx.dim).
  • start (LayerBase|None) – Tensor of shape (B,) indicating the start frame
  • window_start (LayerBase|None) – Tensor of shape (B,) indicating the window start
  • window_size (LayerBase|int|None) –
  • use_time_mask (bool) – if True, assumes a dynamic sequence length and uses it for masking. By default, if a dynamic sequence length exists, it is used.
layer_class = 'softmax_over_spatial'[source]
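
As a sketch of typical usage (e.g. scaled dot-product attention), assuming an "energy" source layer with a time axis and a key dimension key_dim; the layer names and key_dim value are assumptions for illustration:

    # Hypothetical attention sketch: softmax over the time axis of the
    # energies, scaled by 1/sqrt(key_dim) as in Transformer-style attention.
    key_dim = 64  # assumed key dimension
    network_part = {
        "att_weights": {
            "class": "softmax_over_spatial", "from": ["energy"],
            "energy_factor": key_dim ** -0.5,
        },
    }
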
get_dep_layers(self)[source]
Return type: list[LayerBase]
classmethod get_out_data_from_opts(name, sources, axis=None, start=None, window_start=None, window_size=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axis (str|None) –
  • start (LayerBase|None) –
  • window_start (LayerBase|None) –
  • window_size (LayerBase|int|None) –
Return type: Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters: