Technological Overview

RETURNN is a machine learning toolkit that can be used as a standalone application or as a Python framework for training and running sequential neural network architectures.

For an overview of the core concepts behind RETURNN, see the slides of our Interspeech 2020 tutorial about machine learning frameworks including RETURNN.

The main tasks of RETURNN are:

  • Network construction, i.e. definition of the computation graph

  • Dataset loading with predefined and extendable returnn.datasets.Dataset objects

  • Automatic management of layer outputs (such as tensor axes and time dimensions) with a Data object

  • Support of dynamic training schemes that allow for network structure and parameter changes during training

  • Managing the losses and optimizer functions

  • Learning rate scheduling based on training scores

RETURNN supports two calculation backends: TensorFlow and Theano. It is recommended to stick to the TensorFlow backend, as Theano is deprecated.

RETURNN is mostly used as a tool with a single main entry point (see Basic Usage), but you can also use it as a framework / Python module in your own Python code (see RETURNN as Framework). To get an idea of how it works, it helps to roughly follow the execution path starting in returnn.__main__, especially in returnn.__main__.main(). In all cases, check the code itself for details and comments.

For recent development on RETURNN, see Recent development of RETURNN. If you want to work on the RETURNN code, e.g. for an extension, please see Extending RETURNN.


Many components are implemented separately for both Theano and TensorFlow:

  • The engine for high-level logic, although a bit is shared: returnn.theano.engine for Theano, with a separate engine for TensorFlow. For TensorFlow, the engine contains the high-level methods for training, forward pass, and other executed tasks. It keeps track of the network, devices, models and the updater function, and is the main connection between all these components. The TensorFlow engine module also contains the component responsible for managing the TensorFlow session.

  • Network topology construction, which builds the computation graph for training or just forwarding.

  • Network model update code for training, i.e. SGD etc. returnn.theano.updater,

  • All the individual layer implementations: returnn.theano.layers for Theano, with separate implementations for TensorFlow. This also means that Theano and TensorFlow do not support the same layers, and even the parameters of a layer can differ.

  • Some utilities returnn.theano.util and, which contains the returnn.tensor.Tensor class.

  • Multi-GPU logic. returnn.theano.device, returnn.theano.engine_task for Theano,, for TensorFlow.

All the rest is shared for all backends, which mostly is:

Execution guide

Using RETURNN as a tool, execution consists of calling returnn/ path/to/my_file.config.

The program then follows this track:

Execution tasks

Every execution of RETURNN performs one of the following tasks:

  • train: Trains the network with the given dataset. It requires at least a valid train dataset. If eval, dev or eval_datasets are specified, they are evaluated at the end of each epoch. Further information can be found in

  • eval: Evaluates on eval, dev or eval_datasets if specified. It requires load_epoch or epoch for loading the weights of the network.

  • search: Performs beam search on the dataset specified by search_data. The network's weights are loaded according to load_epoch or epoch. The beam size can be specified with beam_size. For further information look in

  • nop: This task is used to check everything that is not related to the network and the dataset, so neither the datasets nor the network are initialized.

  • nop_init_net_train: Initializes the network and training dataset train but doesn’t start training.

  • initialize_model: Similar to nop_init_net_train, but it saves a checkpoint at the end.

  • cleanup_old_models: Cleans up models if some learning rate control has been done. Further options can be specified with cleanup_old_models.

  • compute_priors: Computes the priors of network outputs for the training dataset.

  • analyze: Analyzes the training dataset with the given network. Computes statistics such as loss, perplexity, cross-entropy (ce), frame error, sequence length, and probability histograms, both per batch and accumulated over one whole epoch.
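The requirements listed for some of these tasks can be summarized in a small check. The following is only a sketch: the helper check_task_config is hypothetical and not part of RETURNN, and it assumes that the task is selected through a task config option. The other option names (train, load_epoch, epoch, search_data) are taken from the list above.

```python
# Hypothetical sanity check for a config dict; this is NOT RETURNN code.
KNOWN_TASKS = {
    "train", "eval", "search", "nop", "nop_init_net_train",
    "initialize_model", "cleanup_old_models", "compute_priors", "analyze",
}


def check_task_config(config):
    """Return a list of problems found for the selected task (empty if none)."""
    task = config.get("task")  # assumed name of the option selecting the task
    if task not in KNOWN_TASKS:
        return ["unknown task: %r" % (task,)]
    problems = []
    if task == "train" and "train" not in config:
        problems.append("train requires a train dataset")
    has_epoch = "load_epoch" in config or "epoch" in config
    if task in ("eval", "search") and not has_epoch:
        problems.append("%s requires load_epoch or epoch" % task)
    if task == "search" and "search_data" not in config:
        problems.append("search requires search_data")
    return problems


# Usage: an eval config that does not say which epoch to load.
print(check_task_config({"task": "eval"}))
```

Such a check only covers what is stated in the task list; the real behavior is defined by the code reachable from returnn.__main__.main().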

Network Construction

The network structure, which defines the model topology, is given by the config option network. This is a dict where each entry is a layer specification, which is itself a dict containing the kwargs for the specific layer class. E.g.:

network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "data"},
    "fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "fw1"},
    "output": {"class": "softmax", "loss": "ce", "from": "fw2"},
}

The "class" key is extracted from the layer arguments and selects the specific layer class to use. For Theano, the base layer classes are returnn.theano.layers.base.Container and returnn.theano.layers.base.Layer; TensorFlow has its own base layer class. E.g. "class": "linear" would use the LinearLayer class, and LinearLayer.__init__ accepts arguments like activation. In the given example, all the remaining arguments are handled by the base layer.
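The dispatch on the "class" key can be illustrated with a minimal sketch. All names here (ToyLayerBase, ToyLinearLayer, ToySoftmaxLayer, LAYER_CLASSES, make_layer) are hypothetical stand-ins and not RETURNN's layer classes; the sketch only shows how a layer-specific argument such as activation is consumed by the layer class while the remaining kwargs reach the base class.

```python
# Toy stand-ins; these are NOT RETURNN's layer implementations.
class ToyLayerBase:
    def __init__(self, name, sources, n_out=None, dropout=0.0, loss=None):
        self.name = name
        self.sources = sources
        self.n_out = n_out
        self.dropout = dropout
        self.loss = loss


class ToyLinearLayer(ToyLayerBase):
    def __init__(self, activation=None, **kwargs):
        super().__init__(**kwargs)  # remaining arguments go to the base layer
        self.activation = activation


class ToySoftmaxLayer(ToyLayerBase):
    pass


# Hypothetical registry mapping the "class" value to a layer class.
LAYER_CLASSES = {"linear": ToyLinearLayer, "softmax": ToySoftmaxLayer}


def make_layer(name, layer_desc):
    kwargs = dict(layer_desc)  # copy, so the network dict is left unchanged
    layer_class = LAYER_CLASSES[kwargs.pop("class")]  # extract the "class" key
    sources = kwargs.pop("from", "data")  # assumed default: the dataset input
    return layer_class(name=name, sources=sources, **kwargs)


fw1 = make_layer(
    "fw1",
    {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "data"},
)
print(type(fw1).__name__, fw1.activation, fw1.n_out)
```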

For TensorFlow, the construction starts from the output layers and goes over the sources of each layer, which are defined by "from". If a layer does not define "from", it automatically gets its input from the dataset data.
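The traversal from the output layer over the "from" sources can be sketched in plain Python. The function construction_order is hypothetical and only mimics the order in which layers would need to be built; it is not the RETURNN implementation, and it assumes a single output layer named "output".

```python
def construction_order(network, outputs=("output",)):
    """Return layer names in the order they would be built (sources first)."""
    order = []
    visiting = set()

    def construct(name):
        if name == "data" or name in order:
            return  # dataset input, or layer already constructed
        if name in visiting:
            raise ValueError("cyclic dependency at layer %r" % name)
        visiting.add(name)
        sources = network[name].get("from", "data")  # default: dataset input
        if isinstance(sources, str):
            sources = [sources]
        for src in sources:
            construct(src)  # build all sources before the layer itself
        visiting.discard(name)
        order.append(name)

    for name in outputs:
        construct(name)
    return order


network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "data"},
    "fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "fw1"},
    "output": {"class": "softmax", "loss": "ce", "from": "fw2"},
}
print(construction_order(network))  # ['fw1', 'fw2', 'output']
```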

The network itself is stored in a

The network, layers, and the dataset make heavy use of returnn.tensor.Tensor, see Tensor and Dim.

Here is a 2 layer unidirectional LSTM network:

network = {
    "lstm1": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": "data"},
    "lstm2": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": "lstm1"},
    "output": {"class": "softmax", "loss": "ce", "from": "lstm2"},
}

In TensorFlow, that would use the layer class selected by "class": "rec", which handles the argument unit.

See Network Structure for more about the network construction and layer declarations.

See also the next section specifically about recurrency.


Recurrency := anything that is defined by step-by-step execution, where the current step depends on the previous step, such as an RNN, beam search, etc.

This is all covered by a generic wrapper around tf.while_loop. It covers:

  • Definition of stochastic variables (the output classes itself but also latent variables) for either beam search or training (e.g. using ground truth values)

  • Automatic optimizations

See Recurrency for more details on how this works.
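To make the notion of step-by-step execution concrete, here is a minimal sketch in plain Python. The function while_loop is a hypothetical stand-in that only mirrors the cond / body / loop_vars shape of tf.while_loop; it is neither TensorFlow nor RETURNN code. The toy recurrence h_t = decay * h_{t-1} + x_t is likewise chosen only for illustration.

```python
def while_loop(cond, body, loop_vars):
    """Pure-Python stand-in with the cond/body/loop_vars shape of tf.while_loop."""
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars


def run_recurrence(inputs, decay=0.5):
    """Toy recurrence h_t = decay * h_{t-1} + x_t; returns all h_t."""

    def cond(t, h, outputs):
        return t < len(inputs)

    def body(t, h, outputs):
        h = decay * h + inputs[t]  # current step depends on the previous state
        return t + 1, h, outputs + [h]

    _, _, outputs = while_loop(cond, body, (0, 0.0, []))
    return outputs


print(run_recurrence([1.0, 1.0, 1.0]))  # [1.0, 1.5, 1.75]
```

The loop state (t, h, outputs) plays the role of the loop variables that are carried from one step to the next.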


The engine loops over the epochs and the individual batches / steps, and loads and saves the model. The specific implementation is different in Theano and TensorFlow. See the code for more details, i.e. returnn.theano.engine and returnn.theano.engine_task for Theano, and the corresponding engine code for TensorFlow.

See Training for an overview of relevant training aspects.
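The loop structure described above can be sketched as follows. This is a toy illustration under stated assumptions: run_training, train_step and save_checkpoint are hypothetical names, not RETURNN's engine API, and start_epoch merely stands in for continuing after loading an existing model.

```python
def run_training(num_epochs, batches, train_step, save_checkpoint, start_epoch=1):
    """Toy engine loop: iterate over epochs and batches, save once per epoch."""
    scores = {}
    for epoch in range(start_epoch, num_epochs + 1):
        losses = [train_step(batch) for batch in batches]  # one step per batch
        scores[epoch] = sum(losses) / len(losses)  # mean training score of the epoch
        save_checkpoint(epoch)
    return scores


# Usage with dummy callbacks: the "loss" of a batch is just its value.
saved_epochs = []
scores = run_training(
    num_epochs=2,
    batches=[1.0, 3.0],
    train_step=lambda batch: batch,
    save_checkpoint=saved_epochs.append,
)
print(scores, saved_epochs)
```

The per-epoch scores collected this way are the kind of information that learning rate scheduling based on training scores would use.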