Basic Usage

Install RETURNN; see Installation.

Now rnn.py is the main entry point. Usage:

./rnn.py <config-file> [other-params]

where config-file is a config file for RETURNN. See here for an example, and many more examples from the demos. The configuration syntax can be in three different forms:

executable Python code (determined by a “#!” at the beginning of the file)
a JSON file (determined by a “{” at the beginning of the file)
a simple line-based file with key value pairs
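The detection rule described above can be sketched in a few lines. This is only an illustration of the rule as stated here, not RETURNN's actual loader; the function name detect_config_format and the returned labels are hypothetical.

```python
def detect_config_format(text: str) -> str:
    """Guess the config syntax from the very first characters of the file content.

    Hypothetical helper that mirrors the rule in the text; it does not parse anything.
    """
    if text.startswith("#!"):
        return "python"  # executable Python code
    if text.startswith("{"):
        return "json"  # JSON file
    return "line-based"  # simple key/value lines


print(detect_config_format("#!rnn.py\ntask = 'train'\n"))
print(detect_config_format('{"task": "train"}'))
print(detect_config_format("task train\n"))
```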

Config files using the Python code syntax are the de-facto standard for all current examples and setups. The parameters can be set by defining global variables, but it is possible to use any form of Python code such as functions and classes to construct your network or fill in global variables based on more complex decisions. The Python syntax config files may also contain additional code such as layer or dataset definitions.
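As a rough illustration of a Python-syntax config, the sketch below sets a few of the parameters described further down as global variables and derives two of them from a simple decision. All values are placeholders, and the environment variable MY_RETURNN_DEBUG is a hypothetical name used only for this example.

```python
#!rnn.py
# Minimal sketch of a Python-syntax config; every value here is illustrative.
import os

# Hypothetical environment variable, used only to show a decision made in config code.
debug = os.environ.get("MY_RETURNN_DEBUG", "0") == "1"

task = "train"
num_epochs = 2 if debug else 100  # short run when debugging
learning_rate = 0.01
log_verbosity = 5 if debug else 3  # 5 and above prints output per mini-batch
```

Because the file is ordinary Python, the same pattern extends to helper functions or classes that compute larger settings.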

Calling rnn.py will execute some task, such as train, forward or search.

The task train will train a model specified by a given network structure. After each training epoch on the provided training data, the current parameters are stored in a model checkpoint file. Besides the training data, a development dataset is used to evaluate the current model, and the evaluation results are stored in a separate file.

The task forward will load a model and a dataset and can do arbitrary computations and then process it and/or store it in arbitrary ways. E.g. run a forward pass of the network, given an evaluation dataset, and store the results in an HDF file. Or calculate and store the log-likelihoods of the target labels. Or perform beam search decoding and store the results. Or accumulate statistics, e.g. computing the priors of the target labels over the dataset.

The task search (TF specific) is used to run the network with the beam-search algorithm. The results are serialized into text form and stored in a plain text file in Python dictionary format.

The following parameters are very common, and are used in most RETURNN config files:

task

The task, such as train or forward.

device

E.g. gpu or cpu. Although RETURNN will automatically detect and use a GPU if available, a specific device can be enforced by setting this parameter.

use_tensorflow

If you set this to True, TensorFlow will be used. Otherwise, the installed backend is used. If both backends are installed (TensorFlow and Theano), RETURNN will use Theano as default for legacy reasons.

train / dev / eval

The dataset parameters are each set to a Python dict with a mandatory entry class. The class entry needs to be set to the class name of the dataset that should be used. An overview of available datasets can be found here. train and dev are used during training, while eval is usually used to define the dataset for the forward or search task.
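A minimal sketch of two such dataset dicts is shown below. It assumes HDF files as input and uses the HDFDataset class with its files option; the file paths are hypothetical, and other dataset classes take different constructor parameters.

```python
# Hypothetical file paths; replace them with your own data.
train = {
    "class": "HDFDataset",  # class name of the dataset to instantiate
    "files": ["data/train.hdf"],
    "seq_ordering": "random",  # common option, see the list of modes in this section
}

dev = {
    "class": "HDFDataset",
    "files": ["data/dev.hdf"],
    "seq_ordering": "sorted",
}
```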

Besides passing the constructor parameters to the specific Dataset, there are some common parameters such as:

seq_ordering: This defines the order of the sequences provided by the dataset. Possible values are:

default: Keep the sequences as is
reverse: Use the default sequences in reversed order
random: Shuffle the data with a predefined fixed seed
random:<seed>: Shuffle the data with the seed given
sorted: Sort by length (only if available), beginning with shortest sequences
sorted_reverse: Sort by length, beginning with longest sequences
laplace:<n_buckets>: Sort by length with n Laplacian buckets (one bucket means going from shortest to longest and back with 1/n of the data).
laplace:.<n_sequences>: Sort by length with n sequences per Laplacian bucket.

Note that not all sequence order modes are available for all datasets, and some datasets may provide additional modes.
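To make the simpler modes concrete, here is a small sketch that orders sequence indices by mode. It is not the dataset implementation: the helper name order_sequences and the default seed value are assumptions, and the laplace modes are left out on purpose.

```python
import random
from typing import List


def order_sequences(seq_lens: List[int], seq_ordering: str, default_seed: int = 1) -> List[int]:
    """Return sequence indices in the order implied by `seq_ordering` (illustrative only)."""
    indices = list(range(len(seq_lens)))
    if seq_ordering == "default":
        return indices
    if seq_ordering == "reverse":
        return indices[::-1]
    if seq_ordering == "random" or seq_ordering.startswith("random:"):
        # "random:<seed>" carries its own seed; plain "random" uses a fixed one.
        seed = int(seq_ordering.split(":", 1)[1]) if ":" in seq_ordering else default_seed
        random.Random(seed).shuffle(indices)
        return indices
    if seq_ordering == "sorted":
        return sorted(indices, key=lambda i: seq_lens[i])
    if seq_ordering == "sorted_reverse":
        return sorted(indices, key=lambda i: seq_lens[i], reverse=True)
    raise ValueError("mode not covered by this sketch: %r" % seq_ordering)


seq_lens = [5, 2, 9, 3]
print(order_sequences(seq_lens, "sorted"))          # [1, 3, 0, 2]
print(order_sequences(seq_lens, "sorted_reverse"))  # [2, 0, 3, 1]
```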

See also Dataset Input/Output.

extern_data

Defines the source/target dimensions of the data as a dictionary of dictionaries describing data streams. The standard source data is called “data” by default, and the standard target data is called “classes” by default.

A common example for an ASR system would be:

extern_data = {
  "data": {"dim": 100, "shape": (None, 100)},
  "classes": {"dim": 5000, "shape": (None,), "sparse": True},
}

In this case the data entry defines 100-dimensional features with a time axis of arbitrary length. classes defines sparse target labels, and the dimension then defines the number of labels. A None entry in shape indicates an axis of dynamic length.

In general, all input parameters to returnn.tensor.Tensor can be provided. The parameters dim and shape should always be used; the other parameters are optional. Note that the parameter available_for_inference is True by default only for data.
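The role of None in shape can be illustrated with a small sketch that reuses the extern_data dict from the example above. The helper batch_shape is hypothetical, and it assumes the default batch-major layout, where shape excludes the batch axis and dynamic axes are padded to the longest sequence in the batch.

```python
from typing import Any, Dict, Tuple

extern_data = {
    "data": {"dim": 100, "shape": (None, 100)},
    "classes": {"dim": 5000, "shape": (None,), "sparse": True},
}


def batch_shape(entry: Dict[str, Any], num_seqs: int, max_seq_len: int) -> Tuple[int, ...]:
    """Fill every dynamic (None) axis with max_seq_len and prepend the batch axis."""
    filled = tuple(max_seq_len if dim is None else dim for dim in entry["shape"])
    return (num_seqs,) + filled


# 8 sequences, the longest having 250 frames:
print(batch_shape(extern_data["data"], 8, 250))     # (8, 250, 100)
print(batch_shape(extern_data["classes"], 8, 250))  # (8, 250)
```

The sparse classes stream has no feature axis: each position holds a single label index below dim.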

model_outputs

Like extern_data, but defines the model outputs, for the forward task.

get_model

Function:

def get_model(*, epoch: int, step: int, **_other_kwargs) -> torch.nn.Module | rf.Module:
    ...

    return model

This must return a model, randomly initialized. Potential loading of existing parameters will be done afterwards.

train_step

Function:

def train_step(*, model: Model, extern_data: TensorDict, **_kwargs) -> None:
    import returnn.frontend as rf

    ...

    rf.get_run_ctx().mark_as_loss(...)

This function is called for every batch, and calculates losses, and registers them via rf.get_run_ctx().mark_as_loss(...). The RETURNN training loop will then take care of optimization, backpropagation, etc.

forward_step

Function:

def forward_step(*, model: Model, extern_data: TensorDict, **_kwargs) -> Dict[str, Tensor]:
    import returnn.frontend as rf

    ...

    rf.get_run_ctx().mark_as_output(...)

This function is called for every batch during the forward task. It can calculate arbitrary outputs, and register them via rf.get_run_ctx().mark_as_output(...).

forward_callback

Instance of ForwardCallbackIface, or function which returns such an instance:

def forward_callback() -> ForwardCallbackIface:
    from returnn.forward_iface import ForwardCallbackIface

    class MyForwardCallback(ForwardCallbackIface):
        def init(self, *, model):
            """
            Run at the beginning.
            """
            ...

        def process_seq(self, *, seq_tag: str, outputs: TensorDict):
            """
            Called for each sequence, or entry in the dataset.
            This does not have the batch dim anymore.
            The values in `outputs` are Numpy arrays.

            :param seq_tag:
            :param outputs:
            """
            ...

        def finish(self):
            """
            Run at the end.
            """
            ...

    return MyForwardCallback()

This instance will be called during the forward task. You can do arbitrary processing of the outputs of the network, e.g. storing them in a custom way, such as writing to an HDF file or a text file, or accumulating statistics.

network

(TF specific. See get_model for a more general approach.) This is a dict which defines the network topology for the TF layers backend. Note that the TF layers backend is only one way to define a network and loss function; you can also use the RETURNN frontend or pure PyTorch code directly (via get_model, train_step, forward_step).

It consists of layer names as strings, mapped to dicts which define the layers. Each layer dict has strings as keys, and the value type depends on the key. The layer dict should contain the key class which defines the class or type of the layer, such as hidden for a feed-forward layer, rec for a recurrent layer (including LSTM) or softmax for the output layer (which doesn’t need to have the softmax activation). Usually it also contains the key n_out which defines the feature dimension of the output of this layer, and the key from which defines the inputs to this layer as a list of other layers. If you omit from, the layer will automatically receive the input data from the dataset. All layer dict keys are passed to the layer class __init__, so you have to refer to the code for all details.

Example of a 3 layer bidirectional LSTM:

network = {
"lstm0_fw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": 1 },
"lstm0_bw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": -1 },

"lstm1_fw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from" : ["lstm0_fw", "lstm0_bw"] },
"lstm1_bw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from" : ["lstm0_fw", "lstm0_bw"] },

"lstm2_fw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from" : ["lstm1_fw", "lstm1_bw"] },
"lstm2_bw" : { "class": "rec", "unit": "lstm", "n_out" : 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from" : ["lstm1_fw", "lstm1_bw"] },

"output" :   { "class" : "softmax", "loss" : "ce", "from" : ["lstm2_fw", "lstm2_bw"] }
}
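Since the config is Python code, the same dict can also be generated by a function instead of being written out by hand. The sketch below rebuilds the network above; the helper name build_blstm_network and its parameters are hypothetical, and only the resulting dict matters to RETURNN.

```python
from typing import Any, Dict, List, Optional


def build_blstm_network(num_layers: int = 3, n_out: int = 500,
                        dropout: float = 0.1, l2: float = 0.01) -> Dict[str, Dict[str, Any]]:
    """Build a stacked bidirectional LSTM network dict (illustrative helper)."""
    assert num_layers >= 1
    network: Dict[str, Dict[str, Any]] = {}
    sources: Optional[List[str]] = None  # the first layer omits "from" and reads the input data
    for i in range(num_layers):
        for suffix, direction in (("fw", 1), ("bw", -1)):
            layer: Dict[str, Any] = {
                "class": "rec", "unit": "lstm", "n_out": n_out,
                "dropout": dropout, "L2": l2, "direction": direction,
            }
            if sources is not None:
                layer["from"] = list(sources)
            network["lstm%i_%s" % (i, suffix)] = layer
        # Both directions of this layer feed the next one.
        sources = ["lstm%i_fw" % i, "lstm%i_bw" % i]
    network["output"] = {"class": "softmax", "loss": "ce", "from": sources}
    return network


network = build_blstm_network()
```

Changing num_layers or n_out then adjusts the whole stack in one place.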

See API or the code itself for documentation of the arguments for each layer class type. The rec layer class in particular supports a wide range of arguments, and several units which can be used, e.g. you can choose between different LSTM implementations, or GRU, or standard RNN, etc. See returnn.tf.layers.rec.RecLayer. See also TensorFlow LSTM Benchmark.

See Network Structure for more on how to define the network, and losses.

batch_size

The maximum total number of frames in a mini-batch. A mini-batch has at least a time-dimension and a batch-dimension (or sequence-dimension), and depending on dense or sparse, also a feature-dimension. batch_size is the upper limit for max(seq_lens) * num_seqs during creation of the mini-batches.

max_seqs

The maximum number of sequences in one mini-batch.
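How batch_size and max_seqs interact can be sketched with a greedy batching loop. This is a simplified illustration, not RETURNN's batching code: the function make_batches is hypothetical, it takes sequences in the given order, and it assumes that a single sequence longer than batch_size still gets a batch of its own.

```python
from typing import List


def make_batches(seq_lens: List[int], batch_size: int, max_seqs: int) -> List[List[int]]:
    """Group sequence indices so that max(seq_lens) * num_seqs <= batch_size
    and num_seqs <= max_seqs hold for every batch (illustrative only)."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_max_len = 0
    for idx, seq_len in enumerate(seq_lens):
        new_max_len = max(current_max_len, seq_len)
        num_seqs = len(current) + 1
        fits = new_max_len * num_seqs <= batch_size and num_seqs <= max_seqs
        if current and not fits:
            # Close the current batch and start a new one with this sequence.
            batches.append(current)
            current, new_max_len = [], seq_len
        current.append(idx)
        current_max_len = new_max_len
    if current:
        batches.append(current)
    return batches


print(make_batches([100, 200, 300, 50, 50, 50], batch_size=600, max_seqs=3))
# [[0, 1], [2, 3], [4, 5]]
```

Note that the padded size counts: adding one long sequence raises max(seq_lens) for the whole batch, which is why sorting by length tends to give fuller batches.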

learning_rate

The learning rate during training, e.g. 0.01.

optimizer

The optimizer to use during training, e.g. adam or sgd. Can also be a dict to provide additional parameters.

model

Defines the model file where RETURNN will save all model params after an epoch of training. For each epoch, the epoch number is appended to the filename. When running forward or search, the specified model will be loaded. The epoch can then be selected with the parameter load_epoch.

num_epochs

The number of epochs to train.

log_verbosity

An integer. Common values are 3 or 4. Starting with 5, you will get an output per mini-batch.

There are many more parameters, and more details to many of the listed ones. Details on the parameters can be found in the parameter reference. As the reference is still incomplete, please watch out for additional parameters that can be found in the code.

All configuration params can also be passed as command line parameters. The generic form is ++param value, but more options are available. Please see the code for some usage.
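As an illustration of the generic form only, the sketch below collects ++param value pairs from an argument list, as they would appear in a call like ./rnn.py my_config.py ++learning_rate 0.001 ++task forward (my_config.py being a hypothetical file name). The function parse_overrides is not RETURNN's parser; it ignores the other available options and leaves all values as strings.

```python
from typing import Dict, List


def parse_overrides(args: List[str]) -> Dict[str, str]:
    """Collect "++param value" pairs from a list of command line arguments (illustrative only)."""
    overrides: Dict[str, str] = {}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("++") and i + 1 < len(args):
            overrides[arg[2:]] = args[i + 1]  # value is kept as a plain string
            i += 2
        else:
            i += 1  # anything else is skipped in this sketch
    return overrides


print(parse_overrides(["++learning_rate", "0.001", "++task", "forward"]))
# {'learning_rate': '0.001', 'task': 'forward'}
```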