Basic usage

Install RETURNN; see Installation for details.

rnn.py is the main entry point. Usage:

./rnn.py <config-file> [other-params]

where config-file is a config file for RETURNN. See here for an example, and many more examples in the demos. We support multiple config formats: simple line-based key-value pairs, JSON (detected by a “{” at the beginning of the file), or Python code (detected by a “#!” at the beginning of the file). There is a variety of config params. RETURNN will execute some task, such as train or forward. You must define the task, the train and dev datasets to use, the mini-batch construction variants, the neural network topology, and some training parameters.
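As a minimal sketch of the Python config format (all filenames and values below are illustrative placeholders, not recommendations), a config file could look like this; the leading “#!” line marks it as Python code:

```python
#!rnn.py
# Illustrative RETURNN config in Python syntax.
# All values below are placeholders.

task = "train"           # what RETURNN should do: train, forward, ...
device = "gpu"           # or "cpu", or "gpu0,gpu1" for multi-GPU
use_tensorflow = True    # use the TensorFlow backend

train = "train.hdf"      # training dataset (here: an HDF file)
dev = "dev.hdf"          # cross-validation dataset

num_epochs = 100
learning_rate = 0.01
```

Since the file is plain Python, you can compute values, use loops, or import helpers when constructing the config.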

Some common config parameters:

task
The task, such as train or forward.
device
E.g. gpu or cpu. Can also be gpu0,gpu1 for multi-GPU training.
use_tensorflow
If you set this to True, TensorFlow will be used.
train / dev
The datasets. This can be a filename of an HDF file, or a dict with an entry class where you can choose from a variety of other dataset implementations, including many synthetically generated datasets.
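For example (a sketch; the filenames are placeholders and the available keys depend on the dataset class chosen via class):

```python
# Sketch of the two dataset config forms; filenames are placeholders.

# Plain string form: path to an HDF file.
train = "data/train.hdf"

# Dict form: "class" selects the dataset implementation,
# the remaining keys are passed to that class.
dev = {"class": "HDFDataset", "files": ["data/dev.hdf"]}
```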
extern_data (formerly num_outputs)

Defines the source/target dimensions of the data. Both can be integers. extern_data can also be a dict if your dataset has other data streams. The standard source data is called “data” by default, and the standard target data is called “classes” by default. You can also specify whether your data is dense or sparse (i.e. just the indices), which is given by the number of dimensions: 2 (time-dim + feature-dim) or 1 (just time-dim). If no explicit definition is given, it is assumed that the data contains a time axis.

Example: extern_data = {"data": [100, 2], "classes": [5000, 1]}. This defines an input dimension of 100, and the input is dense (2), and an output dimension of 5000, and the output provided by the dataset is sparse (1).

For a more explicit definition of the shapes, you can provide a dict instead of a list or tuple. This dict may contain information to create “Data” objects. For extern_data, only dim and shape are required. Example: 'speaker_classes': {'dim': 1172, 'shape': (), 'sparse': True}. This defines a sparse input, e.g. for speaker classes, that does not have a time axis.

In general, all input parameters to TFUtil.Data can be provided.
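Putting the variants together, a combined extern_data entry might look like this (a sketch; all dimensions are illustrative):

```python
# Sketch: describing data streams in extern_data.
# All dimensions below are illustrative.
extern_data = {
    # list form [dim, ndim]: ndim 2 = dense (time-dim + feature-dim),
    # ndim 1 = sparse (time-dim only, values are class indices)
    "data": [100, 2],
    "classes": [5000, 1],
    # explicit dict form, here for sparse data without a time axis:
    "speaker_classes": {"dim": 1172, "shape": (), "sparse": True},
}
```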

batching
The sorting variant used when the mini-batches are created, e.g. random.
batch_size
The total number of frames. A mini-batch has at least a time-dimension and a batch-dimension (or sequence-dimension), and depending on dense or sparse, also a feature-dimension. batch_size is the upper limit for time * sequences during creation of the mini-batches.
max_seqs
The maximum number of sequences in one mini-batch.
chunking
You can chunk the sequences of your data into parts, which will greatly reduce the amount of needed zero-padding. This option is a string of two numbers separated by a colon, i.e. chunk_size:chunk_step, where chunk_size is the size of a chunk, and chunk_step is the step after which the next chunk is created. I.e. the chunks will overlap by chunk_size - chunk_step frames. Set this to 0 to disable chunking, or for example to 100:75 to enable it.
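To illustrate the chunking arithmetic (a plain-Python sketch, not RETURNN code): with chunking = "100:75", consecutive chunks start 75 frames apart and overlap by 100 - 75 = 25 frames:

```python
def chunk_starts(seq_len, chunk_size, chunk_step):
    """Sketch: start positions of chunks covering a sequence of seq_len frames."""
    starts = []
    pos = 0
    while pos < seq_len:
        starts.append(pos)
        if pos + chunk_size >= seq_len:
            break  # this chunk already reaches the end of the sequence
        pos += chunk_step
    return starts

# chunking = "100:75": chunks of 100 frames, step 75, overlap 25
print(chunk_starts(250, 100, 75))  # → [0, 75, 150]
```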

network
This is a dict which defines the network topology. It maps layer names (strings) to dicts which define the layers. A layer dict has string keys, and the value type depends on the key. It should contain the key class, which defines the class or type of the layer, such as hidden for a feed-forward layer, rec for a recurrent layer (including LSTM) or softmax for the output layer (which does not need to have the softmax activation). Usually it also contains the key n_out, which defines the feature dimension of the output of this layer, and the key from, which defines the inputs to this layer as a list of other layers. If you omit from, the layer will automatically get the input data from the dataset. All layer dict keys are passed to the layer class __init__, so you have to refer to the code for all details.

Example of a 3 layer bidirectional LSTM:

network = {
"lstm0_fw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": 1 },
"lstm0_bw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": -1 },

"lstm1_fw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from" : ["lstm0_fw", "lstm0_bw"] },
"lstm1_bw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from" : ["lstm0_fw", "lstm0_bw"] },

"lstm2_fw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from" : ["lstm1_fw", "lstm1_bw"] },
"lstm2_bw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from" : ["lstm1_fw", "lstm1_bw"] },

"output" :   { "class" : "softmax", "loss" : "ce", "from" : ["lstm2_fw", "lstm2_bw"] }

See API or the code itself for documentation of the arguments for each layer class type. The rec layer class in particular supports a wide range of arguments, and several units which can be used, e.g. you can choose between different LSTM implementations, or GRU, or standard RNN, etc. See TFNetworkRecLayer.RecLayer or NetworkRecurrentLayer.RecurrentUnitLayer. See also TensorFlow LSTM benchmark.
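As a quick sanity check on such a network dict (plain Python, not a RETURNN API), you can verify that every "from" entry refers to an existing layer, with a missing "from" meaning the dataset input:

```python
def check_network(network):
    """Sketch: return (layer, source) pairs where "from" names an unknown layer."""
    bad = []
    for name, layer in network.items():
        for src in layer.get("from", []):  # no "from" -> input from the dataset
            if src not in network:
                bad.append((name, src))
    return bad

# A minimal two-layer example in the same style as above:
network = {
    "lstm0_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "direction": 1},
    "output": {"class": "softmax", "loss": "ce", "from": ["lstm0_fw"]},
}
assert check_network(network) == []
```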

learning_rate
The learning rate during training, e.g. 0.01.
adam / nadam / …
E.g. set adam = True to enable the Adam optimizer during training. See the Updater code for many more optimizer options.
model
Defines the model filename where RETURNN will save all model params after an epoch of training. For each epoch, it will suffix the filename by the epoch number.
num_epochs
The number of epochs to train.
log_verbosity
An integer. Common values are 3 or 4. Starting with 5, you will get an output per mini-batch.

There are many more params, and more details to many of the listed ones; see the code for details. All config params can also be passed as command-line params. The generic form is ++param value.
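The ++param value form can be sketched as a simple key/value scan over the command line (illustrative plain Python, not RETURNN's actual parser; note the values arrive as strings):

```python
def parse_plusplus_args(argv):
    """Sketch: collect ++param value pairs from a command line."""
    params = {}
    it = iter(argv)
    for arg in it:
        if arg.startswith("++"):
            params[arg[2:]] = next(it)  # the following token is the value
    return params

args = ["config.py", "++learning_rate", "0.005", "++device", "gpu"]
print(parse_plusplus_args(args))  # {'learning_rate': '0.005', 'device': 'gpu'}
```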

See Technological overview for more details and an overview how it all works.

Usage as a framework

Install RETURNN via pip (PyPI entry). Then import returnn should work. See the demos for full examples.

Basically you can write very high level code like this:

from returnn.TFEngine import Engine
from returnn.Dataset import init_dataset
from returnn.Config import get_global_config

config = get_global_config(auto_create=True)
# ... set further config options here

engine = Engine(config)

train_data = init_dataset({"class": "Task12AXDataset", "num_seqs": 1000, "name": "train"})
dev_data = init_dataset({"class": "Task12AXDataset", "num_seqs": 100, "name": "dev", "fixed_random_seed": 1})

engine.init_train_from_config(train_data=train_data, dev_data=dev_data)
engine.train()

Or you go lower level and construct the computation graph yourself:

import tensorflow as tf

from returnn.TFNetwork import TFNetwork
from returnn.Config import get_global_config

config = get_global_config(auto_create=True)

net = TFNetwork(config=config, train_flag=True)
# ... e.g. construct the network here, via net.construct_from_dict(...)
fetches = net.get_fetches_dict()

with tf.Session() as session:
    results = session.run(fetches, feed_dict={
        # ...
        # you could use FeedDictDataProvider here
        })

Or even lower level and just use parts from TFUtil, TFNativeOp, etc.:

from returnn.TFNativeOp import ctc_loss
from returnn.TFNativeOp import edit_distance
from returnn.TFNativeOp import NativeLstm2

from returnn.TFUtil import ctc_greedy_decode
from returnn.TFUtil import get_available_gpu_min_compute_capability
from returnn.TFUtil import safe_log
from returnn.TFUtil import reuse_name_scope
from returnn.TFUtil import dimshuffle

# ...