Technological Overview
RETURNN is a machine learning toolkit that can be used as a standalone application or as a Python framework for training and running sequential neural network architectures.
For an overview of the core concepts behind RETURNN, see the slides of our Interspeech 2020 tutorial about machine learning frameworks including RETURNN.
The main tasks of RETURNN are:
Network construction, i.e. definition of the computation graph
Dataset loading with predefined and extendable returnn.datasets.Dataset objects
Automatic management of layer outputs (such as tensor axes and time dimensions) with a Data object
Support of dynamic training schemes that allow for network structure and parameter changes during training
Managing the losses and optimizer functions
Learning rate scheduling based on training scores
RETURNN supports two calculation backends: TensorFlow and Theano. It is recommended to stick to the TensorFlow backend, as Theano is deprecated.
RETURNN is mostly used as a tool where rnn.py is the main entry point
(see Basic Usage)
but you can also use it as a framework / Python module to use in your own Python code
(see RETURNN as Framework).
To get an idea about how it works, it helps to follow roughly the execution path
starting in returnn.__main__, esp. in returnn.__main__.main().
In all cases, the code itself should be checked for details and comments.
For recent development on RETURNN, see Recent development of RETURNN. If you want to work on the RETURNN code, e.g. for an extension, please see Extending RETURNN.
Structure
Many components are implemented separately for both Theano and TensorFlow:
The engine for high-level logic, although a bit is shared: returnn.theano.engine for Theano and returnn.tf.engine for TensorFlow. For TensorFlow, the engine contains the high-level methods for training, forward pass, and other executed tasks. It keeps track of the network, devices, models and the updater function, and is the main connection between all these components. returnn.tf.engine also contains the returnn.tf.engine.Runner, which is responsible for managing the TensorFlow session.
Network topology construction, which constructs the computation graph for training or just forwarding: returnn.theano.network, returnn.tf.network.
Network model update code for training, i.e. SGD etc.: returnn.theano.updater, returnn.tf.updater.
All the individual layer implementations: returnn.theano.layers for Theano and returnn.tf.layers for TensorFlow. This also means that Theano and TensorFlow don’t support the same layers, and even parameters can be different.
Some utilities: returnn.theano.util and returnn.tf.util, which contains the returnn.tensor.Tensor class.
Multi-GPU logic: returnn.theano.device, returnn.theano.engine_task for Theano; returnn.tf.distributed, returnn.tf.horovod for TensorFlow.
All the rest is shared for all backends, which mostly is:
The main entry point: returnn.__main__.
Config handling: returnn.config.
Logging: returnn.log.
Utilities: returnn.util.
Dataset reading: returnn.datasets, including all the different dataset implementations HDFDataset, SprintDataset, LmDataset, GeneratingDataset, MetaDataset, etc.
Learning rate scheduling logic such as Newbob: returnn.learning_rate_control.
Pretrain network structure construction: returnn.pretrain.
The native op code, which generates code for ops for both CUDA and CPU and shares a common base: returnn.native_op, where TensorFlow-specific code is in returnn.tf.native_op.
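To give a feeling for what score-based learning rate scheduling means, here is a toy rule in plain Python. It is only a sketch and not the formula used in returnn.learning_rate_control; the function name, the parameter names and the default values are all made up for this illustration.

```python
def next_learning_rate(lr, prev_score, cur_score, threshold=0.01, decay=0.5):
    """Toy rule: decay the learning rate when the score stops improving.

    Scores are errors or losses (lower is better). All names and defaults
    here are hypothetical and only serve the illustration.
    """
    if prev_score is None or prev_score == 0:
        return lr  # nothing meaningful to compare against
    relative_improvement = (prev_score - cur_score) / abs(prev_score)
    if relative_improvement < threshold:
        return lr * decay  # improvement too small: reduce the learning rate
    return lr


lr = 1.0
prev = None
history = []
for score in [2.0, 1.5, 1.495, 1.2]:  # made-up dev scores, one per epoch
    lr = next_learning_rate(lr, prev, score)
    history.append(lr)
    prev = score

print(history)
```

The learning rate is halved once, after the third epoch, where the score barely improves.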
Execution guide
Using RETURNN as a tool, execution consists of calling returnn/rnn.py path/to/my_file.config.
The program then proceeds as follows:
returnn.__main__.main() will parse command line arguments and read in a config (returnn.config.Config).
Then logging (returnn.log, returnn.log.Log) is initialized, based on verbosity and other settings.
Then it initializes the datasets (train, dev, eval in config), i.e. returnn.datasets.Dataset instances. See Dataset Input/Output and Datasets.
Theano-only: returnn.theano.device.Device instances.
The engine, i.e. a returnn.tf.engine.Engine instance.
Depending on the task option, some engine initialization which also initializes the network computation graph, see Network Construction.
Then, depending on the task option, it might start engine.train, engine.forward etc. (returnn.tf.engine.Engine.train()), see Training.
Execution tasks
Every execution of RETURNN performs one of the following tasks:
train: Trains the network with the given dataset. It requires at least a valid train dataset. If eval, dev or eval_datasets are specified, they are evaluated at the end of each epoch. Further information can be found in returnn.tf.engine.Engine.train().
eval: Evaluates on eval, dev or eval_datasets if specified. It requires load_epoch or epoch for loading the weights of the network.
search: Performs beam search on the dataset as specified by search_data. The network's weights are loaded according to load_epoch or epoch. The beam size can be specified with beam_size. For further information, look in returnn.tf.engine.Engine.search().
nop: This task is used to check everything not related to the network and the dataset. So datasets and the network are not initialized at all.
nop_init_net_train: Initializes the network and the training dataset train but doesn’t start training.
initialize_model: Similar to nop_init_net_train, but it saves a checkpoint at the end.
cleanup_old_models: Cleans up models if we have done some learning rate control. With cleanup_old_models more options can be specified.
compute_priors: Computes the priors of network outputs for the training dataset.
analyze: Analyzes the training dataset for the given network. Calculates statistics such as loss, perplexity, ce, frame error, seq length, and prob histograms per batch and for one whole epoch (accumulated).
…
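As an illustration of how a task is selected, here is a config fragment for the search task, using the options named above. It is a sketch and not a complete config: the dataset path, the epoch number and the beam size are placeholder values, and a real config needs further options such as the network definition.

```python
# Fragment of a RETURNN config file (the config is Python code).
task = "search"

# Dataset to run beam search on. Class and path are placeholders.
search_data = {"class": "HDFDataset", "files": ["/path/to/eval.hdf"]}

# Which checkpoint to load the network weights from (placeholder value).
load_epoch = 40

# Number of hypotheses kept per step during beam search (placeholder value).
beam_size = 12
```

Switching to another task mainly means changing task and providing the options that task requires, e.g. a train dataset for task = "train".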
Network Construction
The network structure which defines the model topology is defined by the config network option,
which is a dict, where each entry is a layer specification, which itself is a dict containing
the kwargs for the specific layer class. E.g.:
network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "data"},
    "fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "fw1"},
    "output": {"class": "softmax", "loss": "ce", "from": "fw2"}
}
The "class" key will get extracted from the layer arguments and the specific layer class will be used.
For Theano, the base layer classes are
returnn.theano.layers.base.Container and returnn.theano.layers.base.Layer;
for TensorFlow, it is returnn.tf.layers.base.LayerBase.
In the example above, "linear" would select the returnn.tf.layers.basic.LinearLayer class,
and LinearLayer.__init__ accepts arguments like activation.
In the given example, all the remaining arguments will get handled by the base layer.
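The lookup described above can be sketched in a few lines of plain Python. This is a toy model of the idea, not RETURNN's actual code: the two classes only mimic the roles of LayerBase and LinearLayer, and the registry LAYER_CLASSES and the helper make_layer are hypothetical names.

```python
class LayerBase:
    """Toy stand-in for a base layer; handles the common arguments."""

    def __init__(self, name, sources, n_out=None, dropout=0.0):
        self.name = name
        self.sources = sources
        self.n_out = n_out
        self.dropout = dropout


class LinearLayer(LayerBase):
    """Toy stand-in for a linear layer; handles ``activation`` itself."""

    layer_class = "linear"

    def __init__(self, activation=None, **kwargs):
        super().__init__(**kwargs)  # remaining arguments go to the base layer
        self.activation = activation


# Hypothetical registry mapping the "class" string to a Python class.
LAYER_CLASSES = {cls.layer_class: cls for cls in [LinearLayer]}


def make_layer(name, layer_desc):
    """Extract "class" from the layer dict and pass the rest as kwargs."""
    desc = dict(layer_desc)  # copy, so the config dict stays untouched
    layer_cls = LAYER_CLASSES[desc.pop("class")]
    sources = desc.pop("from", "data")  # no "from": input is the dataset data
    return layer_cls(name=name, sources=sources, **desc)


fw1 = make_layer(
    "fw1",
    {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "data"},
)
print(type(fw1).__name__, fw1.activation, fw1.n_out, fw1.dropout)
```

Here activation is consumed by the specific layer class, while n_out and dropout fall through to the base class.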
The construction itself can be found for TensorFlow in returnn.tf.network.TFNetwork.construct_from_dict(),
which starts from the output layers and goes over the sources of each layer, which are defined by "from".
If a layer does not define "from", it will automatically get the input from the dataset data.
The network itself is stored in a returnn.tf.network.TFNetwork.
The network, layers, and the dataset make heavy use of returnn.tensor.Tensor,
see Tensor and Dim.
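The order in which layers get constructed can be illustrated with a small dependency walk. This is a simplified sketch and not the implementation of construct_from_dict(): it starts from a single layer named "output", the function construction_order is a hypothetical helper, and treating every source named "data" or starting with "data:" as dataset input is an assumption of this sketch.

```python
def construction_order(network, output_layer="output"):
    """Return layer names in the order a depth-first walk would build them."""
    order = []
    visiting = set()

    def construct(name):
        if name in order:
            return  # already constructed
        if name == "data" or name.startswith("data:"):
            return  # dataset input, not a layer
        if name in visiting:
            raise ValueError("cyclic dependency at layer %r" % name)
        visiting.add(name)
        sources = network[name].get("from", "data")  # default: dataset data
        if isinstance(sources, str):
            sources = [sources]
        for src in sources:
            construct(src)  # sources must exist before the layer itself
        visiting.discard(name)
        order.append(name)

    construct(output_layer)
    return order


network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "data"},
    "fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "fw1"},
    "output": {"class": "softmax", "loss": "ce", "from": "fw2"},
}
print(construction_order(network))  # ['fw1', 'fw2', 'output']
```

A side effect of starting from the output is that layers which the output does not depend on are never visited in this sketch.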
Here is a 2 layer unidirectional LSTM network:
network = {
    "lstm1": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": "data"},
    "lstm2": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": "lstm1"},
    "output": {"class": "softmax", "loss": "ce", "from": "lstm2"}
}
In TensorFlow, that would use the layer class returnn.tf.layers.rec.RecLayer
which will handle the argument unit.
See Network Structure for more about the network construction and layer declarations.
See also the next section specifically about recurrency.
Recurrency
Recurrency := Anything which is defined by step-by-step execution, where the current step depends on the previous step, such as RNN, beam search, etc.
This is all covered by returnn.tf.layers.rec.RecLayer,
which is a generic wrapper around tf.while_loop.
It covers:
Definition of stochastic variables (the output classes themselves but also latent variables) for either beam search or training (e.g. using ground truth values)
Automatic optimizations
See Recurrency for more details on how this works.
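The definition of recurrency given above can be made concrete with a minimal loop in plain Python, where a Python for loop stands in for tf.while_loop. This only illustrates the step-by-step dependency and is not how RecLayer is implemented; the function run_recurrence and the update rule are made up for the example.

```python
def run_recurrence(inputs, initial_state=0.0, decay=0.5):
    """Run state_t = decay * state_{t-1} + x_t over all time steps."""
    state = initial_state
    outputs = []
    for x in inputs:  # one iteration per time step
        state = decay * state + x  # current step depends on previous state
        outputs.append(state)
    return outputs


outputs = run_recurrence([1.0, 2.0, 3.0])
print(outputs)  # [1.0, 2.5, 4.25]
```

Because each state needs the previous one, the steps cannot be computed independently, which is what distinguishes this from a layer that processes all time frames at once.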
Training
The engine loops over the epochs and the individual batches / steps, and loads and saves the model.
The specific implementation is different in Theano and TensorFlow.
See the code for more details, i.e. returnn.theano.engine,
returnn.theano.engine_task for Theano
and returnn.tf.engine for TensorFlow.
See Training for an overview of relevant training aspects.
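The overall shape of such a loop can be sketched as follows. This is a toy example in plain Python and not the engine code: the model is a single scalar w fitted with plain SGD, and the checkpoints dict only stands in for saving the model after each epoch. All names are made up for the illustration.

```python
def train(batches, num_epochs, learning_rate=0.1):
    """Toy training loop: epochs outside, batches / steps inside."""
    w = 0.0  # the whole "model" is one scalar parameter
    checkpoints = {}
    for epoch in range(1, num_epochs + 1):
        epoch_loss = 0.0
        for batch in batches:
            target = sum(batch) / len(batch)
            epoch_loss += (w - target) ** 2  # squared error before the update
            w -= learning_rate * 2.0 * (w - target)  # SGD step on that loss
        # Stand-in for saving the model at the end of the epoch.
        checkpoints[epoch] = {"w": w, "score": epoch_loss / len(batches)}
    return checkpoints


checkpoints = train(batches=[[1.0, 3.0], [2.0, 2.0]], num_epochs=3)
for epoch, state in sorted(checkpoints.items()):
    print(epoch, state)
```

The per-epoch score stored here is the kind of value that score-based learning rate scheduling would look at.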