`returnn.frontend.encoder.transformer`¶

Transformer encoder

Also see returnn.frontend.decoder.transformer.

class returnn.frontend.encoder.transformer.TransformerEncoder(vocab_dim: ~returnn.tensor.dim.Dim, model_dim: ~returnn.tensor.dim.Dim | int = Dim{'transformer-enc-default-model-dim'(512)}, *, num_layers: int, ff: type | ~typing.Dict[str, ~typing.Any] | ~returnn.frontend.module.Module = <class 'returnn.util.basic.NotSpecified'>, pos_enc: None | ~typing.Callable | ~typing.Dict[str, ~typing.Any] | ~returnn.frontend.module.Module = <function sinusoidal_positional_encoding>, dropout: float = 0.1, num_heads: int = 8, att_dropout: float = 0.1, norm: type | ~typing.Dict[str, ~typing.Any] | ~returnn.frontend.module.Module | ~typing.Callable = <class 'returnn.frontend.normalization.LayerNorm'>, layer: ~returnn.frontend.encoder.transformer.TransformerEncoderLayer | ~returnn.frontend.module.Module | type | ~typing.Dict[str, ~typing.Any] | ~typing.Any | None = None, layer_opts: ~typing.Dict[str, ~typing.Any] | None = None, embed_dim: ~returnn.tensor.dim.Dim | None = None, input_embedding: None | ~returnn.frontend.module.Module | type | ~typing.Dict[str, ~typing.Any] = <class 'returnn.frontend.linear.Embedding'>, input_embedding_scale: float | None = None, input_dropout: float | None = None, sequential=<class 'returnn.frontend.container.Sequential'>, **compat_kwargs)[source]¶

Represents the Transformer encoder architecture

Parameters:

vocab_dim
model_dim – the output feature dimension
num_layers – the number of encoder layers
ff – feed-forward / MLP block. Default is FeedForward
pos_enc – positional encoding. Default is sinusoidal positional encoding.
dropout – the dropout value for the FF block
num_heads – the number of attention heads
att_dropout – attention dropout value
norm – pre-normalization for FF and attention blocks
layer – an instance of TransformerEncoderLayer or similar
layer_opts – options for the encoder layer
embed_dim – if given, will first have an embedding [vocab,embed] and then a linear [embed,model].
input_embedding
input_embedding_scale
input_dropout
sequential

default_initial_state(*, batch_dims: Sequence[Dim]) → State[source]¶: default initial state

transform_encoder(encoder: Tensor, *, axis: Dim) → State[source]¶: Transform encoder output. Note that the Transformer decoder usually expects that layer-norm was applied already on the encoder output.

class returnn.frontend.encoder.transformer.TransformerEncoderLayer(out_dim: ~returnn.tensor.dim.Dim = Dim{'transformer-enc-default-out-dim'(512)}, *, ff: type | ~typing.Dict[str, ~typing.Any] | ~returnn.frontend.module.Module = <class 'returnn.util.basic.NotSpecified'>, dropout: float = 0.1, num_heads: int = 8, self_att: ~returnn.frontend.attention.CausalSelfAttention | ~returnn.frontend.attention.RelPosCausalSelfAttention | ~returnn.frontend.module.Module | type | ~typing.Dict[str, ~typing.Any] | None = None, att_dropout: float = 0.1, norm: type | ~typing.Dict[str, ~typing.Any] | ~returnn.frontend.module.Module | ~typing.Callable = <class 'returnn.frontend.normalization.LayerNorm'>)[source]¶

Represents a Transformer encoder block

Parameters:

out_dim – the output feature dimension
ff – feed-forward / MLP block. Default is FeedForward
dropout – the dropout value for the FF block
num_heads – the number of attention heads
self_att – the self-attention layer. SelfAttention originally and default
att_dropout – attention dropout value
norm – pre-normalization for FF and attention blocks

default_initial_state(*, batch_dims: Sequence[Dim]) → State[source]¶: default initial state

returnn.frontend.encoder.transformer¶

`returnn.frontend.encoder.transformer`¶