Technological Overview#
RETURNN is a machine learning toolkit that can be used as a standalone application or as a Python framework for training and running sequential neural network architectures.
For an overview of the core concepts behind RETURNN, see the slides of our Interspeech 2020 tutorial about machine learning frameworks including RETURNN.
The main tasks of RETURNN are:

- Network construction, i.e. definition of the computation graph
- Dataset loading with predefined and extendable returnn.datasets.Dataset objects
- Automatic management of layer outputs (such as tensor axes and time dimensions) with a Data object
- Support of dynamic training schemes that allow for network structure and parameter changes during training
- Managing the losses and optimizer functions
- Learning rate scheduling based on training scores
RETURNN supports two calculation backends: TensorFlow and Theano. It is recommended to stick to the TensorFlow backend, as Theano is deprecated.
RETURNN is mostly used as a tool, where rnn.py is the main entry point (see Basic Usage), but you can also use it as a framework / Python module in your own Python code (see RETURNN as Framework).
To get an idea of how it works, it helps to roughly follow the execution path starting in returnn.__main__, especially in returnn.__main__.main().
In all cases, the code itself should be checked for details and comments.
For recent development on RETURNN, see Recent development of RETURNN. If you want to work on the RETURNN code, e.g. for an extension, please see Extending RETURNN.
Structure#
Many components are implemented separately for both Theano and TensorFlow:

- The engine for high-level logic, although a bit is shared: returnn.theano.engine for Theano and returnn.tf.engine for TensorFlow. For TensorFlow, the engine contains the high-level methods for training, forward pass, and other executed tasks. It keeps track of the network, devices, models and the updater function, and is the main connection between all these components. returnn.tf.engine also contains the returnn.tf.engine.Runner, which is responsible for managing the TensorFlow session.
- Network topology construction, which constructs the computation graph for training or just forwarding: returnn.theano.network, returnn.tf.network.
- Network model update code for training, i.e. SGD etc.: returnn.theano.updater, returnn.tf.updater.
- All the individual layer implementations: returnn.theano.layers for Theano and returnn.tf.layers for TensorFlow. This also means that Theano and TensorFlow don’t support the same layers, and even parameters can be different.
- Some utilities: returnn.theano.util and returnn.tf.util, which contains the returnn.tensor.Tensor class.
- Multi-GPU logic: returnn.theano.device and returnn.theano.engine_task for Theano; returnn.tf.distributed and returnn.tf.horovod for TensorFlow.
All the rest is shared for all backends, which mostly is:

- The main entry point: returnn.__main__.
- Config handling: returnn.config.
- Logging: returnn.log.
- Utilities: returnn.util.
- Dataset reading: returnn.datasets, including all the different dataset implementations (HDFDataset, SprintDataset, LmDataset, GeneratingDataset, MetaDataset, etc.).
- Learning rate scheduling logic such as Newbob: returnn.learning_rate_control.
- Pretrain network structure construction: returnn.pretrain.
- The native op code, which generates code for ops for both CUDA and CPU and shares a common base: returnn.native_op, where TensorFlow-specific code is in returnn.tf.native_op.
Execution guide#
Using RETURNN as a tool, execution consists of calling returnn/rnn.py path/to/my_file.config.
The program then proceeds as follows:
- returnn.__main__.main() parses the command line arguments and reads in a config (returnn.config.Config).
- Then logging (returnn.log, returnn.log.Log) is initialized, based on verbosity and other settings.
- Then it initializes the datasets (train, dev, eval in the config), i.e. returnn.datasets.Dataset instances. See Dataset Input/Output and Datasets.
- Theano-only: returnn.theano.device.Device instances are created.
- The engine is created, i.e. a returnn.tf.engine.Engine instance.
- Depending on the task option, some engine initialization follows, which also initializes the network computation graph; see Network Construction.
- Then, depending on the task option, it might start engine.train, engine.forward etc. (returnn.tf.engine.Engine.train()); see Training.
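To make the tool invocation above concrete, here is a minimal sketch of what such a config file might look like. RETURNN config files are plain Python, read by the entry point as globals; the option names (task, train, dev, network, etc.) are the ones described in this document, but the file names, paths, and dimensions below are invented for illustration:

```python
#!rnn.py
# Minimal, illustrative RETURNN config sketch (all concrete values are made up).

task = "train"          # which execution task to run (see "Execution tasks")
device = "gpu"

# Dataset options; the dict contents depend on the chosen Dataset class.
train = {"class": "HDFDataset", "files": ["train.hdf"]}  # hypothetical file
dev = {"class": "HDFDataset", "files": ["dev.hdf"]}      # hypothetical file

# Network topology (see "Network Construction" below).
network = {
    "fw1": {"class": "linear", "activation": "relu", "n_out": 500, "from": "data"},
    "output": {"class": "softmax", "loss": "ce", "from": "fw1"},
}

num_epochs = 100
learning_rate = 0.001
model = "/tmp/model"    # checkpoint path prefix (hypothetical)
```

With such a file, the execution path described above is triggered by running returnn/rnn.py with the config path as argument.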
Execution tasks#
Every execution of RETURNN aims to achieve one of the following tasks:

- train: Trains the network with the given dataset. It requires at least a valid train dataset. If eval, dev or eval_datasets are specified, they are evaluated at the end of each epoch. Further information can be found in returnn.tf.engine.Engine.train().
- eval: Evaluates on eval, dev or eval_datasets if specified. It requires load_epoch or epoch for loading the weights of the network.
- search: Performs beam search on the dataset specified by search_data. The network’s weights are loaded according to load_epoch or epoch. The beam size can be specified with beam_size. For further information, see returnn.tf.engine.Engine.search().
- nop: This task is used to sanity-check everything not related to the network and the dataset. Datasets and the network are not initialized at all.
- nop_init_net_train: Initializes the network and the training dataset train, but does not start training.
- initialize_model: Similar to nop_init_net_train, but saves a checkpoint at the end.
- cleanup_old_models: Cleans up models if learning rate control was used. More options can be specified via cleanup_old_models.
- compute_priors: Computes the priors of network outputs for the training dataset.
- analyze: Analyzes the training dataset for the given network. Calculates statistics such as loss, perplexity, cross entropy, frame error, sequence length, and probability histograms per batch and for one whole epoch (accumulated).
…
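As a hedged sketch of how these tasks combine with the options mentioned above, a config fragment selecting the search task might look like this (the dataset spec and values are invented for illustration; only the option names come from this document):

```python
# Illustrative config fragment for the "search" task (values are made up).
task = "search"
load_epoch = 80     # which checkpoint epoch to load
beam_size = 12      # beam size for the beam search
search_data = {"class": "HDFDataset", "files": ["test.hdf"]}  # hypothetical
```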
Network Construction#
The network structure, which defines the model topology, is specified by the network config option.
It is a dict where each entry is a layer specification, which itself is a dict containing
the kwargs for the specific layer class. E.g.:
network = {
"fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "data"},
"fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": "fw1"},
"output": {"class": "softmax", "loss": "ce", "from": "fw2"}
}
The "class" key is extracted from the layer arguments, and the corresponding layer class is used.
For Theano, the base layer classes are returnn.theano.layers.base.Container and returnn.theano.layers.base.Layer;
for TensorFlow, it is returnn.tf.layers.base.LayerBase.
E.g. the example above would use the returnn.tf.layers.basic.LinearLayer class,
and LinearLayer.__init__ accepts arguments like activation.
In the given example, all the remaining arguments will get handled by the base layer.
The construction itself can be found, for TensorFlow, in returnn.tf.network.TFNetwork.construct_from_dict(),
which starts from the output layers and goes over the sources of each layer, as defined by "from".
If a layer does not define "from", it automatically gets its input from the dataset data.
The network itself is stored in a returnn.tf.network.TFNetwork instance.
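To illustrate the "from" default described above, here is a minimal sketch (layer names and dimensions invented) where one layer omits "from" and one names its source explicitly:

```python
# Sketch: "fw1" omits "from", so it implicitly reads the dataset input
# (equivalent to "from": "data"); "output" names its source explicitly.
network = {
    "fw1": {"class": "linear", "activation": "tanh", "n_out": 200},
    "output": {"class": "softmax", "loss": "ce", "from": "fw1"},
}
```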
The network, layers, and the dataset make heavy use of returnn.tensor.Tensor
,
see Tensor and Dim.
Here is a 2-layer unidirectional LSTM network:
network = {
"lstm1": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": "data"},
"lstm2": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": "lstm1"},
"output": {"class": "softmax", "loss": "ce", "from": "lstm2"}
}
In TensorFlow, that would use the layer class returnn.tf.layers.rec.RecLayer, which will handle the argument unit.
See Network Structure for more about the network construction and layer declarations.
See also the next section specifically about recurrency.
Recurrency#
Recurrency := anything which is defined by step-by-step execution, where the current step depends on the previous step, such as RNNs, beam search, etc.
This is all covered by returnn.tf.layers.rec.RecLayer, which is a generic wrapper around tf.while_loop.
It covers:

- Definition of stochastic variables (the output classes themselves but also latent variables) for either beam search or training (e.g. using ground truth values)
- Automatic optimizations
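As a hedged sketch of the scheme described here (layer names, dimensions, and the exact subnetwork layout are invented for illustration), a RecLayer with an explicit per-step subnetwork and a stochastic variable might look like:

```python
# Sketch of a RecLayer with a per-step subnetwork ("unit"); all values illustrative.
# "prev:output" refers to the previous step's value; the "choice" layer is the
# stochastic variable: ground truth during training, beam search during decoding.
network = {
    "output": {
        "class": "rec",
        "from": "data",
        "target": "classes",
        "unit": {
            "s": {"class": "rec", "unit": "lstm", "n_out": 500,
                  "from": ["data:source", "prev:output"]},
            "output_prob": {"class": "softmax", "loss": "ce",
                            "target": "classes", "from": "s"},
            "output": {"class": "choice", "target": "classes",
                       "beam_size": 12, "from": "output_prob"},
        },
    }
}
```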
See Recurrency for more details on how this works.
Training#
The engine will loop over the epochs and the individual batches / steps, and loads and saves the model.
The specific implementation is different in Theano and TensorFlow.
See the code for more details, i.e. returnn.theano.engine and returnn.theano.engine_task for Theano, and returnn.tf.engine for TensorFlow.
See Training for an overview of relevant training aspects.