Technological Overview

RETURNN is a machine learning toolkit that can be used as a standalone application or as a Python framework for training and running sequential neural network architectures.

For an overview of the core concepts behind RETURNN, see the slides of our Interspeech 2020 tutorial about machine learning frameworks including RETURNN.

The main tasks of RETURNN are:

Network construction, i.e. definition of the computation graph
Dataset loading with predefined and extendable returnn.datasets.Dataset objects
Automatic management of layer outputs (such as tensor axes and time dimensions) with a Data object
Support of dynamic training schemes that allow for network structure and parameter changes during training
Managing the losses and optimizer functions
Learning rate scheduling based on training scores

RETURNN supports two calculation backends: TensorFlow and Theano. It is recommended to stick to the TensorFlow backend, as Theano is deprecated.

RETURNN is mostly used as a tool, with rnn.py as the main entry point (see Basic Usage), but you can also use it as a framework / Python module in your own Python code (see RETURNN as Framework). To get an idea of how it works, it helps to roughly follow the execution path starting in returnn.__main__, especially returnn.__main__.main(). In all cases, check the code itself for details and comments.

For recent development on RETURNN, see Recent development of RETURNN. If you want to work on the RETURNN code, e.g. for an extension, please see Extending RETURNN.

Structure

Many components are implemented separately for both Theano and TensorFlow:

The engine for high-level logic (although a bit is shared): returnn.theano.engine for Theano and returnn.tf.engine for TensorFlow. For TensorFlow, the engine contains the high-level methods for training, forward pass, and other executed tasks. It keeps track of the network, devices, models and the updater function, and is the main connection between all these components. returnn.tf.engine also contains the returnn.tf.engine.Runner, which is responsible for managing the TensorFlow session.
Network topology construction, which builds the computation graph for training or just forwarding: returnn.theano.network, returnn.tf.network.
Network model update code for training, i.e. SGD etc.: returnn.theano.updater, returnn.tf.updater.
All the individual layer implementations: returnn.theano.layers for Theano and returnn.tf.layers for TensorFlow. This also means that Theano and TensorFlow don’t support the same layers, and even the parameters can differ.
Some utilities: returnn.theano.util and returnn.tf.util (which contains the returnn.tensor.Tensor class).
Multi-GPU logic: returnn.theano.device and returnn.theano.engine_task for Theano; returnn.tf.distributed and returnn.tf.horovod for TensorFlow.

Everything else is shared across all backends, which is mainly:

The main entry point returnn.__main__.
Config handling returnn.config.
Logging returnn.log.
Utilities returnn.util.
Dataset reading returnn.datasets including all the different dataset implementations HDFDataset, SprintDataset, LmDataset, GeneratingDataset, MetaDataset, etc.
Learning rate scheduling logic such as Newbob returnn.learning_rate_control.
Pretrain network structure construction returnn.pretrain.
The native op code, which generates code for ops for both CUDA and CPU from a common base: returnn.native_op, with the TensorFlow-specific code in returnn.tf.native_op.
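To illustrate the idea behind score-based learning rate scheduling such as Newbob, here is a minimal sketch in plain Python. It is not the code from returnn.learning_rate_control: the function names, the threshold and the decay factor are assumptions chosen for illustration. The rule shown is the basic one: if the relative improvement of the score between two epochs falls below a threshold, the learning rate is reduced.

```python
def newbob_next_lr(lr, prev_score, cur_score, rel_threshold=0.01, decay=0.5):
    """Return the learning rate for the next epoch (hypothetical helper).

    Scores are losses, i.e. lower is better. The default values of
    rel_threshold and decay are illustrative assumptions.
    """
    if prev_score is None or prev_score == 0:
        return lr  # nothing to compare against yet
    rel_improvement = (prev_score - cur_score) / abs(prev_score)
    if rel_improvement < rel_threshold:
        return lr * decay  # not enough progress: reduce the learning rate
    return lr


def schedule(initial_lr, scores):
    """Return the learning rate used in each epoch, given per-epoch scores."""
    lr, prev, lrs = initial_lr, None, []
    for score in scores:
        lrs.append(lr)  # learning rate used for this epoch
        lr = newbob_next_lr(lr, prev, score)  # decide for the next epoch
        prev = score
    return lrs


# The third epoch improves by only about 0.3%, so the fourth epoch
# runs with half the learning rate.
lrs = schedule(1.0, [2.0, 1.5, 1.495, 1.2])
```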

Execution guide

Using RETURNN as a tool, execution consists of calling returnn/rnn.py path/to/my_file.config.
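For orientation, a config file is itself Python code that sets options. The following is a minimal sketch and not a complete working setup. It uses only the task, train, dev and network options that are named elsewhere in this document. The dataset dict keys ("class", "files") and the file paths are assumptions for illustration, and further options such as input/output dimensions, batching and the optimizer are omitted.

```python
# Sketch of a config file such as path/to/my_file.config (incomplete on purpose).

task = "train"  # one of the execution tasks, e.g. "train", "eval", "search"

# Datasets. The dict layout and the paths are placeholders (assumptions).
train = {"class": "HDFDataset", "files": ["path/to/train.hdf"]}
dev = {"class": "HDFDataset", "files": ["path/to/dev.hdf"]}

# Network definition: one dict entry per layer.
network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "data"},
    "output": {"class": "softmax", "loss": "ce", "from": "fw1"},
}
```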

The program then follows this path:

returnn.__main__.main() will parse command line arguments and read in a config (returnn.config.Config).
Then logging (returnn.log, returnn.log.Log) is initialized, based on verbosity and other settings.
Then it initializes the datasets (train, dev, eval in config), i.e. returnn.datasets.Dataset instances. See Dataset Input/Output and Datasets.
Theano only: it initializes the returnn.theano.device.Device instances.
It creates the engine, i.e. a returnn.tf.engine.Engine instance.
Depending on the task option, the engine is initialized, which also constructs the network computation graph (see Network Construction).
Then, depending on the task option, it might start engine.train, engine.forward etc. (e.g. returnn.tf.engine.Engine.train(); see Training).

Execution tasks

Every execution of RETURNN performs one of the following tasks:

train: Trains the network with the given dataset. It requires at least a valid train dataset. If eval, dev or eval_datasets are specified, they are evaluated at the end of each epoch. Further information can be found in returnn.tf.engine.Engine.train().
eval: Evaluates on eval, dev or eval_datasets if specified. It requires load_epoch or epoch for loading the weights of the network.
search: Performs beam search on the dataset specified by search_data. The network's weights are loaded according to load_epoch or epoch. The beam size can be specified with beam_size. For further information, see returnn.tf.engine.Engine.search().
nop: This task is used to check everything that is not related to the network and the dataset. Thus, the datasets and the network are not initialized at all.
nop_init_net_train: Initializes the network and training dataset train but doesn’t start training.
initialize_model: Similar to nop_init_net_train, but it saves a checkpoint at the end.
cleanup_old_models: Cleans up models if some learning rate control has been done. More options can be specified with cleanup_old_models.
compute_priors: Computes the priors of network outputs for the training dataset.
analyze: Analyzes the training dataset for the given network. Calculates statistics such as loss, perplexity, ce, frame error, seq length, and prob histograms, both per batch and accumulated over one whole epoch.
…

Network Construction

The network structure, which defines the model topology, is specified by the config option network. It is a dict where each entry is a layer specification, which is itself a dict containing the kwargs for the specific layer class. E.g.:

network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "data"},
    "fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "fw1"},
    "output": {"class": "softmax", "loss": "ce", "from": "fw2"}
}

The "class" key will get extracted from the layer arguments and the specific layer class will be used. For Theano, the base layer classes are returnn.theano.layers.base.Container and returnn.theano.layers.base.Layer; for TensorFlow, it is returnn.tf.layers.base.LayerBase. In the example above, "linear" would use the returnn.tf.layers.basic.LinearLayer class, and LinearLayer.__init__ will accept arguments like activation. In the given example, all the remaining arguments will be handled by the base layer.

For TensorFlow, the construction itself can be found in returnn.tf.network.TFNetwork.construct_from_dict(). It starts from the output layers and goes over the sources of each layer, which are defined by "from". If a layer does not define "from", it will automatically get its input from the dataset data.
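The construction order described above can be sketched in plain Python. This is a simplified illustration and not the actual construct_from_dict() code: the Sketch* classes, the LAYER_CLASSES registry and the function name are hypothetical, the dataset input is represented by the string "data", and cyclic dependencies are not handled.

```python
class SketchLayer:
    """Hypothetical stand-in for a layer base class (not a RETURNN class)."""

    def __init__(self, name, sources, **kwargs):
        self.name = name
        self.sources = sources  # already constructed source layers
        self.kwargs = kwargs    # remaining layer arguments, e.g. n_out


class SketchLinear(SketchLayer):
    pass


class SketchSoftmax(SketchLayer):
    pass


# Hypothetical registry mapping the "class" value to a layer class.
LAYER_CLASSES = {"linear": SketchLinear, "softmax": SketchSoftmax}


def construct_from_dict_sketch(net_dict, output_names=("output",)):
    layers, order = {}, []

    def get_layer(name):
        if name == "data":
            return "data"  # placeholder for the dataset input
        if name in layers:
            return layers[name]  # construct every layer only once
        spec = dict(net_dict[name])  # copy, so net_dict stays untouched
        layer_class = LAYER_CLASSES[spec.pop("class")]
        src = spec.pop("from", "data")  # default input: the dataset data
        src_names = [src] if isinstance(src, str) else list(src)
        sources = [get_layer(s) for s in src_names]  # sources come first
        layers[name] = layer_class(name=name, sources=sources, **spec)
        order.append(name)
        return layers[name]

    for name in output_names:
        get_layer(name)
    return layers, order


network = {
    "fw1": {"class": "linear", "activation": "relu", "n_out": 500, "from": "data"},
    "fw2": {"class": "linear", "activation": "relu", "n_out": 500, "from": "fw1"},
    "output": {"class": "softmax", "loss": "ce", "from": "fw2"},
}
layers, order = construct_from_dict_sketch(network)
print(order)  # ['fw1', 'fw2', 'output']
```

Although the traversal starts at "output", the sources are resolved recursively first, so the layers end up being created in dependency order.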

The network itself is stored in a returnn.tf.network.TFNetwork.

The network, layers, and the dataset make heavy use of returnn.tensor.Tensor, see Tensor and Dim.

Here is a two-layer unidirectional LSTM network:

network = {
    "lstm1": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": "data"},
    "lstm2": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": "lstm1"},
    "output": {"class": "softmax", "loss": "ce", "from": "lstm2"}
}

In TensorFlow, that would use the layer class returnn.tf.layers.rec.RecLayer which will handle the argument unit.

See Network Structure for more about the network construction and layer declarations.

See also the next section specifically about recurrency.

Recurrency

Recurrency := Anything which is defined by step-by-step execution, where the current step depends on the previous step, such as RNNs, beam search, etc.

This is all covered by returnn.tf.layers.rec.RecLayer, which is a generic wrapper around tf.while_loop. It covers:

Definition of stochastic variables (the output classes themselves, but also latent variables) for either beam search or training (e.g. using ground truth values)
Automatic optimizations
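The notion of step-by-step execution, where each step depends on the previous one, can be illustrated with a small plain-Python loop in the style of tf.while_loop (a condition function, a body function, and loop state). This is only a sketch of the concept: while_loop_sketch and decayed_sum are hypothetical names, and nothing here reflects how RecLayer is implemented.

```python
def while_loop_sketch(cond, body, state):
    """Apply body to the loop state for as long as cond holds."""
    while cond(*state):
        state = body(*state)
    return state


def decayed_sum(inputs, decay=0.5):
    """Toy recurrence h[t] = decay * h[t-1] + x[t], starting from h = 0."""

    def cond(t, prev, outputs):
        return t < len(inputs)

    def body(t, prev, outputs):
        cur = decay * prev + inputs[t]  # current step depends on the previous step
        return t + 1, cur, outputs + [cur]

    _, _, outputs = while_loop_sketch(cond, body, (0, 0.0, []))
    return outputs


outputs = decayed_sum([1.0, 2.0, 4.0])  # [1.0, 2.5, 5.25]
```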

See Recurrency for more details on how this works.

Training

The engine loops over the epochs and the individual batches / steps, and loads and saves the model. The specific implementation differs between Theano and TensorFlow. See the code for more details, i.e. returnn.theano.engine and returnn.theano.engine_task for Theano, and returnn.tf.engine for TensorFlow.
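The overall shape of such a loop can be sketched as follows. This is a toy illustration under stated assumptions, not the engine code: train_sketch, toy_train_step and save_checkpoint are hypothetical names, the "model" is a single number, and model loading, evaluation and learning rate control are left out.

```python
def train_sketch(batches, num_epochs, train_step, save_checkpoint):
    """Loop over epochs and batches; save the model after each epoch."""
    epoch_scores = {}
    for epoch in range(1, num_epochs + 1):
        losses = [train_step(batch) for batch in batches]  # one step per batch
        epoch_scores[epoch] = sum(losses) / len(losses)    # mean loss of the epoch
        save_checkpoint(epoch)
    return epoch_scores


# Toy model: a single weight that is pulled towards the mean of each batch.
model = {"w": 0.0}
checkpoints = {}


def toy_train_step(batch, lr=0.25):
    target = sum(batch) / len(batch)
    loss = (model["w"] - target) ** 2
    model["w"] -= lr * 2 * (model["w"] - target)  # gradient step on the squared error
    return loss


def save_checkpoint(epoch):
    checkpoints[epoch] = dict(model)  # keep a copy of the current parameters


scores = train_sketch([[1.0, 3.0], [2.0, 2.0]], num_epochs=2,
                      train_step=toy_train_step, save_checkpoint=save_checkpoint)
```

The per-epoch scores collected here are the kind of value that score-based learning rate scheduling operates on.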

See Training for an overview of relevant training aspects.