Basic Usage
Install RETURNN; see Installation.
Now rnn.py is the main entry point. Usage:
./rnn.py <config-file> [other-params]
where config-file is a config file for RETURNN.
See here for an example,
and many more examples from the demos.
The configuration syntax can be in three different forms:
- executable Python code (determined by a “#!” at the beginning of the file)
- a JSON file (determined by a “{” at the beginning of the file)
- a simple line-based file with key value pairs
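The detection rule above can be pictured with a small helper. This is only an illustrative sketch, not RETURNN's actual config loader; the function name detect_config_format and the returned labels are made up for this example.

```python
def detect_config_format(text: str) -> str:
    """Guess the config syntax from the first characters of the file content."""
    if text.startswith("#!"):
        # Executable Python code.
        return "python"
    if text.startswith("{"):
        # A JSON file.
        return "json"
    # Anything else is treated as a simple line-based "key value" file.
    return "line-based"
```

For example, a file whose content starts with “#!” is classified as Python, and a file starting with a plain key such as task is classified as line-based.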
Config files using the Python code syntax are the de-facto standard for all current examples and setups. The parameters can be set by defining global variables, but it is possible to use any form of Python code such as functions and classes to construct your network or fill in global variables based on more complex decisions. The Python syntax config files may also contain additional code such as layer or dataset definitions.
When called, rnn.py will execute some task, such as train, forward or search.
The task train will train a model specified by a given network structure.
After training each epoch on the provided training data, the current parameters will be stored to a model checkpoint file.
Besides the training data, a development dataset is used to evaluate the current model, and store the evaluation
results in a separate file.
The task forward will load a model and a dataset, run arbitrary computations, and then process and/or store the results in arbitrary ways.
E.g. run a forward pass of the network, given an evaluation dataset, and store the results in
an HDF file.
Or calculate and store the log-likelihoods of the target labels.
Or perform beam search decoding and store the results.
Or accumulate statistics, e.g. computing the priors of the target labels over the dataset.
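As an illustration of the last case, label priors can be accumulated one sequence at a time. The sketch below is plain Python and does not use any RETURNN API; the class name PriorAccumulator and the labels argument are hypothetical. The method names only mirror the init / process_seq / finish pattern of the forward_callback parameter described further down.

```python
from collections import Counter
from typing import Dict, Iterable


class PriorAccumulator:
    """Counts how often each target label occurs over a whole dataset."""

    def init(self) -> None:
        # Run once at the beginning.
        self.counts = Counter()
        self.total = 0

    def process_seq(self, *, seq_tag: str, labels: Iterable[int]) -> None:
        # Called once per sequence with the label indices of that sequence.
        for label in labels:
            self.counts[int(label)] += 1
            self.total += 1

    def finish(self) -> Dict[int, float]:
        # Run once at the end: turn the counts into relative frequencies.
        return {label: n / self.total for label, n in sorted(self.counts.items())}
```

Calling init() once, process_seq(...) for every sequence and finish() at the end yields a dict that maps each label index to its relative frequency.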
The task search (TF specific) is used to run the network with the beam-search algorithm.
The results are serialized into text form and stored in a plain text file in Python dictionary format.
The following parameters are very common, and are used in most RETURNN config files:
- task
The task, such as train or forward.
- device
E.g. gpu or cpu. Although RETURNN will automatically detect and use a GPU if available, a specific device can be enforced by setting this parameter.
- use_tensorflow
If you set this to True, TensorFlow will be used. Otherwise, the installed backend is used. If both backends are installed (TensorFlow and Theano), RETURNN will use Theano as default for legacy reasons.
- train / dev / eval
The dataset parameters are set to a Python dict with a mandatory entry class. The class attribute needs to be set to the class name of the dataset that should be used. An overview of the available datasets can be found here. train and dev are used during training, while eval is usually used to define the dataset for the forward or search task.
Besides passing the constructor parameters to the specific Dataset, there are some common parameters such as:
seq_ordering: This defines the order of the sequences provided by the dataset. Possible values are:
  - default: Keep the sequences as is
  - reverse: Use the default sequences in reversed order
  - random: Shuffle the data with a predefined fixed seed
  - random:<seed>: Shuffle the data with the seed given
  - sorted: Sort by length (only if available), beginning with shortest sequences
  - sorted_reverse: Sort by length, beginning with longest sequences
  - laplace:<n_buckets>: Sort by length with n laplacian buckets (one bucket means going from shortest to longest and back with 1/n of the data).
  - laplace:.<n_sequences>: Sort by length with n sequences per laplacian bucket.
Note that not all sequence order modes are available for all datasets, and some datasets may provide additional modes.
See also Dataset Input/Output.
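To make the seq_ordering modes more concrete, here is a plain-Python sketch that orders sequence indices by the listed rules. It is not RETURNN's dataset code: the function order_seqs and the fixed_seed default are hypothetical, the laplace branch is only one possible reading of the description above (shuffle, cut into buckets, sort the buckets alternately ascending and descending), and the laplace:.<n_sequences> form is not handled.

```python
import random
from typing import List


def order_seqs(seq_lens: List[int], mode: str = "default", fixed_seed: int = 1) -> List[int]:
    """Return sequence indices in the order selected by `mode`."""
    indices = list(range(len(seq_lens)))

    def by_len(i: int) -> int:
        return seq_lens[i]

    if mode == "default":
        return indices
    if mode == "reverse":
        return indices[::-1]
    if mode == "random" or mode.startswith("random:"):
        # "random" uses a fixed seed, "random:<seed>" uses the given one.
        seed = int(mode.split(":", 1)[1]) if ":" in mode else fixed_seed
        random.Random(seed).shuffle(indices)
        return indices
    if mode == "sorted":
        return sorted(indices, key=by_len)
    if mode == "sorted_reverse":
        return sorted(indices, key=by_len, reverse=True)
    if mode.startswith("laplace:"):
        # Assumed reading of "laplace:<n_buckets>", see the note above.
        n_buckets = int(mode.split(":", 1)[1])
        random.Random(fixed_seed).shuffle(indices)
        size = -(-len(indices) // n_buckets)  # ceiling division
        ordered: List[int] = []
        for b in range(n_buckets):
            bucket = sorted(indices[b * size:(b + 1) * size], key=by_len)
            ordered.extend(bucket if b % 2 == 0 else bucket[::-1])
        return ordered
    raise ValueError(f"unsupported seq_ordering: {mode!r}")
```

With sequence lengths [5, 2, 9, 1], the mode sorted yields the indices of the shortest sequences first, and sorted_reverse yields the longest first.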
- extern_data
Defines the source/target dimensions of the data as a dictionary of dictionaries describing data streams. The standard source data is called “data” by default, and the standard target data is called “classes” by default.
A common example for an ASR system would be:
extern_data = {
    "data": {"dim": 100, "shape": (None, 100)},
    "classes": {"dim": 5000, "shape": (None,), "sparse": True},
}
In this case the data entry defines 100-dimensional features with a time axis of arbitrary length. classes defines sparse target labels, and the dimension then defines the number of labels. The shape entries None indicate a dynamic length of an axis.
In general, all input parameters to returnn.tensor.Tensor can be provided. The parameters dim and shape should always be used, the other parameters are optional. Note that only for data the parameter available_for_inference is True by default.
- model_outputs
Like extern_data, but defines the model outputs, for the forward task.
- get_model
Function:
def get_model(*, epoch: int, step: int, **_other_kwargs) -> torch.nn.Module | rf.Module:
    ...
    return model
This must return a model, randomly initialized. Potential loading of existing parameters will be done afterwards.
- train_step
Function:
def train_step(*, model: Model, extern_data: TensorDict, **_kwargs) -> None:
    import returnn.frontend as rf
    ...
    rf.get_run_ctx().mark_as_loss(...)
This function is called for every batch. It calculates the losses and registers them via rf.get_run_ctx().mark_as_loss(...). The RETURNN training loop will then take care of optimization, backpropagation, etc.
- forward_step
Function:
def forward_step(*, model: Model, extern_data: TensorDict, **_kwargs) -> Dict[str, Tensor]:
    import returnn.frontend as rf
    ...
    rf.get_run_ctx().mark_as_output(...)
This function is called for every batch during the forward task. It can calculate arbitrary outputs, and register them via rf.get_run_ctx().mark_as_output(...).
- forward_callback
Instance of ForwardCallbackIface, or function which returns such an instance:
def forward_callback() -> ForwardCallbackIface:
    from returnn.forward_iface import ForwardCallbackIface

    class MyForwardCallback(ForwardCallbackIface):
        def init(self, *, model):
            """
            Run at the beginning.
            """
            ...

        def process_seq(self, *, seq_tag: str, outputs: TensorDict):
            """
            Called for each sequence, or entry in the dataset.
            This does not have the batch dim anymore.
            The values in `outputs` are Numpy arrays.

            :param seq_tag:
            :param outputs:
            """
            ...

        def finish(self):
            """
            Run at the end.
            """
            ...

    return MyForwardCallback()
This instance will be called during the forward task. You can do arbitrary processing of the outputs of the network, e.g. storing them in a custom way, such as writing to an HDF file or to a text file, or accumulating statistics.
- network
(TF specific. See get_model for a more general approach.) This is a dict which defines the network topology for the TF layers backend. Note that the TF layers backend is only one possibility to define a network and loss function; you can also use the RETURNN frontend or pure PyTorch code directly (via get_model, train_step, forward_step).
It consists of layer names as strings, mapped to dicts which define the layers. The layer dict consists of keys as strings, and the value type depends on the key. The layer dict should contain the key class, which defines the class or type of the layer, such as hidden for a feed-forward layer, rec for a recurrent layer (including LSTM) or softmax for the output layer (doesn’t need to have the softmax activation). Usually it also contains the key n_out, which defines the feature dimension of the output of this layer, and the key from, which defines the inputs to this layer as a list of other layers. If you omit from, it will automatically pass in the input data from the dataset. All layer dict keys are passed to the layer class __init__, so you have to refer to the code for all details.
Example of a 3 layer bidirectional LSTM:
network = {
    "lstm0_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": 1},
    "lstm0_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": -1},
    "lstm1_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from": ["lstm0_fw", "lstm0_bw"]},
    "lstm1_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from": ["lstm0_fw", "lstm0_bw"]},
    "lstm2_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from": ["lstm1_fw", "lstm1_bw"]},
    "lstm2_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from": ["lstm1_fw", "lstm1_bw"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["lstm2_fw", "lstm2_bw"]},
}
See API or the code itself for documentation of the arguments for each layer class type. The rec layer class in particular supports a wide range of arguments, and several units which can be used, e.g. you can choose between different LSTM implementations, or GRU, or standard RNN, etc. See returnn.tf.layers.rec.RecLayer. See also TensorFlow LSTM Benchmark.
See Network Structure for more on how to define the network, and losses.
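Because Python-syntax configs may contain arbitrary code, a repetitive network dict like the bidirectional LSTM example can also be generated in a loop. The following is a sketch under that assumption; the helper name build_blstm_network and its arguments are hypothetical, and the layer options are copied from the literal example.

```python
def build_blstm_network(num_layers: int = 3, n_out: int = 500) -> dict:
    """Build a bidirectional LSTM network dict (expects num_layers >= 1)."""
    network = {}
    prev = None  # the first layer has no "from" and reads the dataset input
    for i in range(num_layers):
        for suffix, direction in (("fw", 1), ("bw", -1)):
            layer = {
                "class": "rec", "unit": "lstm", "n_out": n_out,
                "dropout": 0.1, "L2": 0.01, "direction": direction,
            }
            if prev is not None:
                layer["from"] = list(prev)
            network[f"lstm{i}_{suffix}"] = layer
        # Both directions of this layer feed the next layer.
        prev = [f"lstm{i}_fw", f"lstm{i}_bw"]
    network["output"] = {"class": "softmax", "loss": "ce", "from": prev}
    return network


network = build_blstm_network()
```

Changing num_layers or n_out then adjusts the whole topology in one place instead of editing every layer dict by hand.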
- batch_size
The total number of frames in one mini-batch. A mini-batch has at least a time-dimension and a batch-dimension (or sequence-dimension), and depending on dense or sparse, also a feature-dimension. batch_size is the upper limit for max(seq_lens) * num_seqs during creation of the mini-batches.
- max_seqs
The maximum number of sequences in one mini-batch.
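How batch_size and max_seqs interact can be sketched with a greedy grouping of consecutive sequences. This is an illustration of the two limits only, not RETURNN's actual batching code; the function make_batches is hypothetical, and it assumes that a single sequence longer than batch_size still gets a batch of its own.

```python
from typing import List


def make_batches(seq_lens: List[int], batch_size: int, max_seqs: int) -> List[List[int]]:
    """Greedily group consecutive sequence indices into mini-batches."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_max_len = 0
    for idx, seq_len in enumerate(seq_lens):
        new_max_len = max(current_max_len, seq_len)
        # Padded size of the batch if this sequence were added.
        too_many_frames = new_max_len * (len(current) + 1) > batch_size
        too_many_seqs = len(current) + 1 > max_seqs
        if current and (too_many_frames or too_many_seqs):
            # Close the current batch and start a new one with this sequence.
            batches.append(current)
            current, new_max_len = [], seq_len
        current.append(idx)
        current_max_len = new_max_len
    if current:
        batches.append(current)
    return batches
```

For example, with sequence lengths [10, 20, 30, 5, 5, 5], batch_size=60 and max_seqs=3, the third sequence starts a new batch because 30 * 3 = 90 frames would exceed the limit.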
- learning_rate
The learning rate during training, e.g. 0.01.
- optimizer
The optimizer to use during training, e.g. adam or sgd. Can also be a dict to provide additional parameters.
- model
Defines the model file where RETURNN will save all model params after an epoch of training. For each epoch, it will suffix the filename by the epoch number. When running forward or search, the specified model will be loaded. The epoch can then be selected with the parameter load_epoch.
- num_epochs
The number of epochs to train.
- log_verbosity
An integer. Common values are 3 or 4. Starting with 5, you will get an output per mini-batch.
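Put together, the common parameters above might appear in a Python-syntax config roughly as follows. This is a hedged sketch, not a tested setup: all file paths and numeric values are placeholders, the HDFDataset entries with a files list and the dict form of optimizer with a "class" key are assumptions for illustration, and the model definition (network, or get_model and train_step) is left out.

```python
#!rnn.py
# Hypothetical minimal config sketch; paths and values are placeholders.

task = "train"
device = "gpu"

# Datasets: dicts with a mandatory "class" entry plus constructor parameters.
train = {"class": "HDFDataset", "files": ["data/train.hdf"], "seq_ordering": "random"}
dev = {"class": "HDFDataset", "files": ["data/dev.hdf"], "seq_ordering": "sorted"}

# Data streams: dense 100-dim features and sparse targets with 5000 labels.
extern_data = {
    "data": {"dim": 100, "shape": (None, 100)},
    "classes": {"dim": 5000, "shape": (None,), "sparse": True},
}

batch_size = 5000  # upper limit for max(seq_lens) * num_seqs per mini-batch
max_seqs = 40  # upper limit for the number of sequences per mini-batch
learning_rate = 0.01
optimizer = {"class": "adam"}  # assumed dict form; a plain name is the simple form
model = "models/example"  # the epoch number is appended per checkpoint
num_epochs = 20
log_verbosity = 4
```

Such a file would be passed to rnn.py as the config-file argument.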
There are many more parameters, and more details to many of the listed ones. Details on the parameters can be found in the parameter reference. As the reference is still incomplete, please watch out for additional parameters that can be found in the code.
All configuration params can also be passed as command line parameters.
The generic form is ++param value, but more options are available.
Please see the code for some usage.
See also RETURNN frontend.
See also General Settings.
See Technological Overview for more details and an overview of how it all works.
See Training for more about training.