Technological Overview

RETURNN is a machine learning toolkit that can be used as a standalone application or as a framework for training and running sequential neural network architectures. The main tasks of RETURNN are:

  • Network construction via nested dictionaries
  • Data loading with predefined and extendable dataset objects
  • Automatic management of layer outputs (such as tensor axes and time dimensions) with a Data object
  • Support of dynamic training schemes that allow for network structure and parameter changes during training
  • Managing the losses and optimizer functions
  • Learning rate scheduling based on training scores
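All of these tasks are driven by a config. As a sketch, a RETURNN config can be a plain Python file whose global variables are the options; the option names below (task, train, dev, batch_size, learning_rate, num_epochs) follow common RETURNN usage, but verify them against the documentation of your RETURNN version before relying on them:

```python
# Sketch of a RETURNN-style Python config file; the globals are the options.
# Option names here are illustrative of common RETURNN usage.
task = "train"                                            # what the engine should do
train = {"class": "HDFDataset", "files": ["train.hdf"]}   # training dataset
dev = {"class": "HDFDataset", "files": ["dev.hdf"]}       # cross-validation dataset
network = {                                               # nested dict: model topology
    "fw1": {"class": "linear", "activation": "relu", "n_out": 500},
    "output": {"class": "softmax", "loss": "ce", "from": ["fw1"]},
}
batch_size = 5000                                         # max frames per batch
learning_rate = 0.001
num_epochs = 100
```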

RETURNN supports two calculation backends: TensorFlow and Theano. It is recommended to stick to the TensorFlow backend, as Theano is no longer supported.
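The backend is also selected in the config; assuming the usual RETURNN option name for this:

```python
# Select the TensorFlow backend in the config (use_tensorflow is the
# RETURNN option commonly used for this; shown as a one-line illustration).
use_tensorflow = True
```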

RETURNN is mostly used as a standalone tool, with rnn as the main entry point, but you can also use it as a framework / Python module in your own Python code. To get an idea of how it works, it helps to roughly follow the execution path starting in rnn, esp. in rnn.main(). In all cases, the code itself should be checked for details and comments.


Many components are implemented separately for Theano and TensorFlow:

  • The engine for the high-level logic, although a bit is shared: Engine and EngineTask for Theano, TFEngine for TensorFlow. For TensorFlow, the engine contains the high-level methods for training, the forward pass, and other executed tasks. It keeps track of the network, the devices, the models, and the updater function, and is the main connection between all these components. TFEngine also contains TFEngine.Runner, which is responsible for managing the TensorFlow session.
  • Network topology construction which constructs the computation graph for training or just forwarding. Network, TFNetwork.
  • Network model update code for training, i.e. SGD etc. Updater, TFUpdater.
  • All the individual layer implementations: NetworkLayer, NetworkBaseLayer, NetworkHiddenLayer, NetworkRecurrentLayer etc. for Theano, and TFNetworkLayer, TFNetworkRecLayer for TensorFlow. This also means that Theano and TensorFlow do not support the same set of layers, and even the parameters of shared layers can differ.
  • Some utilities: TheanoUtil and TFUtil, where TFUtil contains the Data class.
  • Multi-GPU logic. Device, EngineTask for Theano and not yet implemented for TensorFlow.

All the rest is shared between the backends, which mostly is:

  • The main entry point rnn.
  • Config handling Config.
  • Logging Log.
  • Utilities Util.
  • Dataset reading Dataset including all the different dataset implementations HDFDataset, SprintDataset, LmDataset, GeneratingDataset, MetaDataset, etc.
  • Learning rate scheduling logic such as Newbob LearningRateControl.
  • Pretrain network structure construction Pretrain.
  • The native op code, which generates code for ops for both CUDA and CPU, shares a common base: NativeOp, where TensorFlow-specific code is in TFNativeOp.
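The learning rate scheduling mentioned above can be illustrated with a small sketch of the Newbob idea: decay the learning rate when the cross-validation score stops improving. This is an illustrative function only, not RETURNN's actual LearningRateControl code, and the threshold and decay values are made-up defaults:

```python
def newbob_update(learning_rate, prev_score, cur_score,
                  rel_threshold=0.01, decay=0.5):
    """Illustrative Newbob-style step (not RETURNN's LearningRateControl):
    halve the learning rate when the cross-validation score did not
    improve by at least a relative threshold."""
    if prev_score is not None and cur_score > prev_score * (1.0 - rel_threshold):
        return learning_rate * decay  # no sufficient improvement: decay
    return learning_rate              # score improved enough: keep the rate
```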

Execution guide

  • rnn.main() will parse command line arguments and read in a config.
  • Then logging Log is initialized, based on verbosity and other settings.
  • Then it initializes the datasets (train, dev, eval in config), i.e. Dataset instances.
  • Theano-only: Device instances.
  • The engine, i.e. a Engine or TFEngine instance.
  • Depending on the task option, some engine initialization follows, which also constructs the network computation graph (see Network Construction below).
  • Then, depending on the task option, it might start engine.train, engine.forward, etc. (Engine.Engine.train() or TFEngine.Engine.train()).
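The steps above can be sketched schematically. All names here (parse_config, DummyEngine, init_network) are placeholders for illustration, not RETURNN's real functions; see rnn.main() in the source for the actual code:

```python
# Schematic outline of the execution path; placeholder code only.

def parse_config(argv):
    # rnn.main() parses command line arguments and reads in a config file;
    # here we just treat the arguments as ready-made options.
    return dict(argv)

class DummyEngine:
    """Stands in for Engine (Theano) or TFEngine.Engine (TensorFlow)."""
    def init_network(self, config):
        self.network = config["network"]   # build the computation graph
    def train(self):
        return "trained %d layer(s)" % len(self.network)

def main(argv):
    config = parse_config(argv)
    # 1. logging would be initialized here, based on verbosity settings
    # 2. the datasets (train/dev/eval) would be created from the config
    engine = DummyEngine()                 # 3. create the engine
    engine.init_network(config)            # 4. build the network graph
    if config.get("task") == "train":      # 5. dispatch on the task option
        return engine.train()

result = main([("task", "train"),
               ("network", {"output": {"class": "softmax", "loss": "ce"}})])
```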

Network Construction

The network structure, which defines the model topology, is set by the network config option. This is a dict where each entry is a layer specification, which itself is a dict containing the kwargs for the specific layer class. E.g.:

network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500},
    "fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": ["fw1"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["fw2"]}
}

The "class" key is extracted from the layer arguments and selects the specific layer class to use. For Theano, the base layer classes are NetworkBaseLayer.Container and NetworkBaseLayer.Layer; for TensorFlow, the base layer class lives in TFNetworkLayer. In the example above, the "fw1" layer would use the TFNetworkLayer.LinearLayer class, and LinearLayer.__init__ accepts arguments like activation. All the remaining arguments are handled by the base layer.
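As a simplified sketch of this dispatch (illustrative only; RETURNN's actual layer registry lives in TFNetworkLayer, and the class below is a stub, not the real LinearLayer):

```python
# Stub layer class: "layer_class" is the name used in the config dict.
class LinearLayer:
    layer_class = "linear"
    def __init__(self, activation=None, n_out=None, **kwargs):
        self.activation = activation
        self.n_out = n_out
        self.kwargs = kwargs          # remaining args go to the base layer

layer_classes = {"linear": LinearLayer}     # name -> class mapping

def construct_layer(layer_desc):
    desc = dict(layer_desc)
    cls = layer_classes[desc.pop("class")]  # extract "class", look up the type
    return cls(**desc)                      # pass the rest as kwargs

layer = construct_layer({"class": "linear", "activation": "relu", "n_out": 500})
```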

The construction itself can be found for TensorFlow in TFNetwork. It starts from the output layer and goes over the sources of each layer, which are defined by "from". If a layer does not define "from", it automatically gets its input from the dataset data.
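This "from"-driven construction can be sketched as a recursion that builds a layer's sources before the layer itself (this mirrors the idea, not TFNetwork's actual code):

```python
# Illustrative sketch: construct layers starting from "output",
# recursing into the "from" sources first.
def construct(network, name, constructed, order):
    if name in constructed:
        return
    constructed.add(name)
    # "from" defaults to the dataset input "data" when not given
    for src in network[name].get("from", ["data"]):
        if src in network:            # "data" is not a layer in the dict
            construct(network, src, constructed, order)
    order.append(name)                # sources are ready, build this layer

network = {
    "fw1": {"class": "linear", "n_out": 500},
    "output": {"class": "softmax", "from": ["fw1"]},
}
order = []
construct(network, "output", set(), order)  # order == ["fw1", "output"]
```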

Here is a 2 layer unidirectional LSTM network:

network = {
    "lstm1": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500},
    "lstm2": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": ["lstm1"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["lstm2"]}
}

In TensorFlow, that would use the layer class TFNetworkRecLayer.RecLayer, which handles the unit argument.


Training

The engine loops over the epochs and the individual batches / steps, and loads and saves the model. The specific implementation differs between Theano and TensorFlow. See the code for details, i.e. Engine and EngineTask for Theano, TFEngine for TensorFlow.
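The overall shape of that loop can be sketched as follows (placeholder code, not RETURNN's implementation; the checkpoint naming is made up for illustration):

```python
# Schematic training loop: iterate over epochs and batches, run one
# update step per batch, and record a checkpoint name per epoch.
def train(num_epochs, batches_per_epoch, step_fn):
    saved_models = []
    for epoch in range(1, num_epochs + 1):
        for step in range(batches_per_epoch):
            step_fn(epoch, step)                   # one forward/backward update
        saved_models.append("model.%03d" % epoch)  # checkpoint after each epoch
    return saved_models

models = train(2, 3, lambda epoch, step: None)     # 2 epochs, 3 batches each
```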