returnn.frontend.decoder.transformer

(Label-sync) Transformer decoder, including cross-attention to the encoder

References:

  Vaswani et al., Attention Is All You Need, 2017, https://arxiv.org/abs/1706.03762

class returnn.frontend.decoder.transformer.TransformerDecoder(encoder_dim: Dim | None, vocab_dim: Dim, model_dim: Dim = Dim{'transformer-dec-default-model-dim'(512)}, *, num_layers: int, ff_dim: Dim = NotSpecified, ff_activation: Callable[[Tensor], Tensor] = relu, dropout: float = 0.1, num_heads: int = 8, att_dropout: float = 0.1, decoder_layer: TransformerDecoderLayer | Module | type | Any | None = None, decoder_layer_opts: Dict[str, Any] | None = None, embed_dim: Dim | None = None, share_embedding: bool | None = None, input_embedding_scale: float | None = None, input_dropout: float | None = None, logits_with_bias: bool = False, sequential=Sequential)

Represents the Transformer decoder architecture.

Parameters:
  • encoder_dim – for cross-attention. None if no cross-attention.

  • vocab_dim – vocabulary dim, used for the input embedding and the output logits

  • model_dim – the output feature dimension

  • num_layers – the number of decoder layers

  • ff_dim – the dimension of the feed-forward layers. 2048 in the original Transformer, or 4 times model_dim

  • ff_activation – activation function for feed-forward network

  • dropout – the dropout value for the FF block

  • num_heads – the number of attention heads

  • att_dropout – attention dropout value

  • decoder_layer – an instance of TransformerDecoderLayer or similar

  • decoder_layer_opts – options for the decoder layer, passed to TransformerDecoderLayer

  • embed_dim – if given, the input first goes through an embedding [vocab,embed] and then a linear [embed,model].

  • share_embedding – if enabled, the input embedding matrix is also used (tied weights) for the output logits projection

  • input_embedding_scale – scale factor applied to the input embedding (sqrt(model_dim) in the original Transformer)

  • input_dropout – dropout applied to the (scaled) input embedding

  • logits_with_bias – whether the final logits linear projection uses a bias (default False)

  • sequential – the module container used to stack the decoder layers (default Sequential)
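
A minimal construction sketch (a hedged example, not from the RETURNN docs; the Dim names and sizes are hypothetical, the arguments follow the signature above):

    from returnn.tensor import Dim
    from returnn.frontend.decoder.transformer import TransformerDecoder

    encoder_dim = Dim(512, name="enc")     # feature dim of the encoder output (for cross-attention)
    vocab_dim = Dim(10_000, name="vocab")  # target vocabulary
    model_dim = Dim(512, name="model")     # decoder model/output feature dim

    decoder = TransformerDecoder(
        encoder_dim=encoder_dim,  # pass encoder_dim=None for a decoder-only model without cross-attention
        vocab_dim=vocab_dim,
        model_dim=model_dim,
        num_layers=6,
    )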

default_initial_state(*, batch_dims: Sequence[Dim]) → State

Default initial state for auto-regressive (step-wise) decoding.
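
The state is created once per batch before decoding starts; a minimal sketch, with a hypothetical batch_dim:

    # One recurrent state per sequence batch; passed to and returned by each decode step.
    state = decoder.default_initial_state(batch_dims=[batch_dim])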

transform_encoder(encoder: Tensor, *, axis: Dim) → State

Transform encoder output. Note that the Transformer decoder usually expects that layer-norm was applied already on the encoder output.
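
A hedged sketch of one auto-regressive decode step, assuming the module is called with the previous labels plus spatial_dim/state/encoder keyword arguments (as in the RETURNN attention-based encoder-decoder setups); enc_out, enc_spatial_dim, batch_dim and prev_labels are hypothetical:

    from returnn.tensor import single_step_dim

    # enc_out: Tensor [batch, enc_spatial, encoder_dim], layer-norm already applied (see note above)
    enc = decoder.transform_encoder(enc_out, axis=enc_spatial_dim)
    state = decoder.default_initial_state(batch_dims=[batch_dim])

    # prev_labels: sparse Tensor [batch] with sparse_dim == vocab_dim
    logits, state = decoder(
        prev_labels,
        spatial_dim=single_step_dim,  # step-wise decoding; use the real spatial dim for whole sequences
        encoder=enc,
        state=state,
    )
    # logits: Tensor [batch, vocab_dim]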

class returnn.frontend.decoder.transformer.TransformerDecoderLayer(encoder_dim: Dim | None, out_dim: Dim = Dim{'transformer-dec-default-out-dim'(512)}, *, ff_dim: Dim = NotSpecified, ff_activation: Callable[[Tensor], Tensor] = relu, dropout: float = 0.1, num_heads: int = 8, self_att: CausalSelfAttention | RelPosCausalSelfAttention | Module | type | Any | None = None, self_att_opts: Dict[str, Any] | None = None, att_dropout: float = 0.1)

Represents a Transformer decoder layer (causal self-attention, optional cross-attention, feed-forward).

Parameters:
  • encoder_dim – for cross-attention. None if no cross-attention.

  • out_dim – the output feature dimension

  • ff_dim – the dimension of the feed-forward layers. 2048 in the original Transformer, or 4 times out_dim

  • ff_activation – activation function for feed-forward network

  • dropout – the dropout value for the FF block

  • num_heads – the number of attention heads

  • self_att – the self-attention layer. CausalSelfAttention by default

  • self_att_opts – options passed to the self-attention layer

  • att_dropout – attention dropout value
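
In practice the layers are usually created indirectly via TransformerDecoder, which presumably forwards decoder_layer_opts to each layer, so the per-layer arguments above can be overridden from there; a sketch (values arbitrary, encoder_dim/vocab_dim as before):

    decoder = TransformerDecoder(
        encoder_dim=encoder_dim,
        vocab_dim=vocab_dim,
        num_layers=6,
        # kwargs of TransformerDecoderLayer, overriding the per-layer defaults above
        decoder_layer_opts=dict(num_heads=16, att_dropout=0.2),
    )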

default_initial_state(*, batch_dims: Sequence[Dim]) → State

Default initial state for auto-regressive (step-wise) decoding.

transform_encoder(encoder: Tensor, *, axis: Dim) → State

Transform the encoder output.

class returnn.frontend.decoder.transformer.FeedForward(out_dim: Dim, *, ff_dim: Dim | None = NotSpecified, dropout: float, activation: Callable[[Tensor], Tensor])

Transformer position-wise feedforward neural network layer

FF -> Activation -> Dropout -> FF

Parameters:
  • out_dim – output feature dimension

  • ff_dim – dimension of the feed-forward layers

  • dropout – dropout value

  • activation – activation function
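
A standalone usage sketch, assuming the module is called directly on a Tensor whose feature dim is out_dim (x is hypothetical):

    import returnn.frontend as rf
    from returnn.tensor import Dim
    from returnn.frontend.decoder.transformer import FeedForward

    model_dim = Dim(512, name="model")
    ff = FeedForward(out_dim=model_dim, ff_dim=Dim(2048, name="ff"), dropout=0.1, activation=rf.relu)
    y = ff(x)  # x: [..., model_dim] -> y: [..., model_dim], via FF -> activation -> dropout -> FF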