returnn.frontend.encoder.base

Base interface for any kind of encoder.

This is basically any generic function x -> y.

Note that in practice, when designing a model, this interface is not even needed: you only care about the final encoded vectors, not how they were obtained. Automatic differentiation trains the encoder together with the rest of the model. So for most purposes, e.g. for a decoder (see decoder.base), you only need the encoded vectors as a Tensor.

class returnn.frontend.encoder.base.IEncoder

Generic encoder interface

The encoder is a function x -> y. The input can potentially be sparse or dense. The output is dense with feature dim out_dim.

By convention, any options to the module are passed to __init__, and potentially changing inputs (other tensors) are passed to __call__().

out_dim: Dim
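The following is a minimal sketch of the IEncoder convention that does not depend on RETURNN. It assumes plain Python lists in place of Tensor and a hypothetical `Dim` stand-in (a small dataclass with a name and a size); `EmbeddingEncoder` and `LinearEncoder` are illustrative names, not part of returnn.frontend. Options (dimensions, weights) are fixed in `__init__`, the changing input goes to `__call__`, and both a sparse input (a class index looked up in a table) and a dense input (a feature vector multiplied by a weight matrix) yield a dense output of size `out_dim`.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Dim:
    """Hypothetical stand-in for a RETURNN Dim: a named dimension with a static size."""
    name: str
    size: int


class EmbeddingEncoder:
    """Hypothetical IEncoder-like module for sparse input: class index -> dense vector."""

    def __init__(self, vocab_dim: Dim, out_dim: Dim):
        # Options are fixed at construction time.
        self.out_dim = out_dim
        # Deterministic toy weights so the example is reproducible: row i is [i, i+1, ...].
        self.table = [[float(i + j) for j in range(out_dim.size)] for i in range(vocab_dim.size)]

    def __call__(self, x: int) -> List[float]:
        # The changing input (a sparse class index) is passed per call.
        return list(self.table[x])


class LinearEncoder:
    """Hypothetical IEncoder-like module for dense input: feature vector -> dense vector."""

    def __init__(self, in_dim: Dim, out_dim: Dim):
        self.in_dim = in_dim
        self.out_dim = out_dim
        # weight[i][j] maps input feature i to output feature j (a truncated identity here).
        self.weight = [[1.0 if i == j else 0.0 for j in range(out_dim.size)] for i in range(in_dim.size)]

    def __call__(self, x: Sequence[float]) -> List[float]:
        assert len(x) == self.in_dim.size
        return [sum(x[i] * self.weight[i][j] for i in range(self.in_dim.size)) for j in range(self.out_dim.size)]


out_dim = Dim("feat", 3)
emb = EmbeddingEncoder(Dim("vocab", 5), out_dim)
lin = LinearEncoder(Dim("in", 4), out_dim)
print(emb(2))  # [2.0, 3.0, 4.0]
print(lin([1.0, 2.0, 3.0, 4.0]))  # [1.0, 2.0, 3.0]
```

Both encoders expose the same `out_dim`, so a consumer of their outputs does not need to know whether the input was sparse or dense.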
class returnn.frontend.encoder.base.ISeqFramewiseEncoder

This specializes IEncoder in that it operates on a sequence. The output sequence has the same length as the input sequence.

By convention, any options to the module are passed to __init__, and potentially changing inputs (other tensors) are passed to __call__().

out_dim: Dim
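As an illustration of "same length as the input", here is a sketch under the same stand-in assumptions: a sequence is a list of frames (lists of floats) rather than a Tensor, and `ContextWindowEncoder` is a hypothetical name, not a RETURNN class; the real call signature may differ. Each output frame is the mean of the input frames within +/- `context` frames, with zero padding at the edges, so an output frame may depend on neighboring frames while exactly one output frame is still produced per input frame.

```python
from typing import List, Sequence

Frame = List[float]


class ContextWindowEncoder:
    """Hypothetical ISeqFramewiseEncoder-like module.

    Each output frame is the element-wise mean over a window of 2 * context + 1
    input frames centered on it (frames outside the sequence count as zeros).
    One output frame is produced per input frame, so the length is unchanged.
    """

    def __init__(self, feat_size: int, context: int = 1):
        # Options: the feature size (output size equals input size here) and the window radius.
        self.out_size = feat_size
        self.context = context

    def __call__(self, frames: Sequence[Sequence[float]]) -> List[Frame]:
        num_frames = len(frames)
        width = 2 * self.context + 1
        out: List[Frame] = []
        for t in range(num_frames):
            acc = [0.0] * self.out_size
            for offset in range(-self.context, self.context + 1):
                s = t + offset
                if 0 <= s < num_frames:  # skip positions outside the sequence (zero padding)
                    for f in range(self.out_size):
                        acc[f] += frames[s][f]
            out.append([v / width for v in acc])
        return out


enc = ContextWindowEncoder(feat_size=2, context=1)
x = [[3.0, 0.0], [0.0, 3.0], [3.0, 3.0], [6.0, 0.0]]
y = enc(x)
print(len(x), len(y))  # 4 4
print(y[1])  # [2.0, 2.0]
```

A convolution with "same" padding or a frame-wise feed-forward layer would satisfy the same length-preserving contract.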
class returnn.frontend.encoder.base.ISeqDownsamplingEncoder

This is more specific than IEncoder in that it operates on a sequence. The output sequence is shorter than the input sequence.

This is a common scenario in speech recognition, where the input might be at 10 ms per frame and the output at 30 ms or 60 ms per frame, i.e. a downsampling factor of 3 or 6.

By convention, any options to the module are passed to __init__, and potentially changing inputs (other tensors) are passed to __call__().

out_dim: Dim
downsample_factor: int | float
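A sketch of a downsampling encoder, again with lists in place of tensors and a hypothetical class name (`AvgPoolDownsampler` is not a RETURNN class): non-overlapping average pooling over time with window and stride equal to `downsample_factor`. With 10 ms input frames and a factor of 3, each output frame covers 30 ms; a partial window at the end still yields one output frame, so T input frames become ceil(T / 3) output frames. Returning the new length alongside the output loosely mirrors the fact that a downsampling encoder changes the sequence length; the actual RETURNN call signature may differ. Integer pooling is only one option: the attribute is typed int | float, so the factor is not restricted to integers.

```python
import math
from typing import List, Sequence, Tuple


class AvgPoolDownsampler:
    """Hypothetical ISeqDownsamplingEncoder-like module: average pooling over time."""

    def __init__(self, feat_size: int, downsample_factor: int):
        # Options: feature size (unchanged by pooling) and the time downsampling factor.
        self.out_size = feat_size
        self.downsample_factor = downsample_factor

    def __call__(self, frames: Sequence[Sequence[float]]) -> Tuple[List[List[float]], int]:
        """Returns the pooled frames and the new (shorter) sequence length."""
        k = self.downsample_factor
        out: List[List[float]] = []
        for start in range(0, len(frames), k):
            window = frames[start:start + k]  # the last window may be shorter than k
            out.append([sum(fr[f] for fr in window) / len(window) for f in range(self.out_size)])
        return out, len(out)


# 100 input frames at 10 ms/frame = 1 s of audio, one feature per frame.
x = [[float(t)] for t in range(100)]
pool = AvgPoolDownsampler(feat_size=1, downsample_factor=3)  # 10 ms -> 30 ms per frame
y, out_len = pool(x)
print(out_len, math.ceil(len(x) / pool.downsample_factor))  # 34 34
print(y[0], y[-1])  # [1.0] [99.0]
```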