Technological overview

RETURNN is mostly used as a tool, where rnn.py (rnn) is the main entry point, but you can also use it as a framework / Python module in your own Python code. To get an idea of how it works, it helps to roughly follow the execution path starting in rnn, esp. in rnn.main(). In all cases, the code itself should be checked for details and comments.
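
For the second use case, this is a rough sketch of using RETURNN as a Python module; rnn.main() is the same entry point that the rnn.py tool uses, and the config file name here is just a placeholder.

# Minimal sketch: calling the RETURNN entry point from your own Python code.
# This mirrors running "./rnn.py my_setup.config" on the command line;
# "my_setup.config" is a placeholder for your own config file.
import rnn

rnn.main(["rnn.py", "my_setup.config"])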

So far, there are two calculation backends: Theano and TensorFlow. Theano was the first backend, thus Theano-specific code files are currently not prefixed, while TensorFlow-specific files are prefixed with TF. The following parts are implemented separately for Theano and TensorFlow:

  • The engine for high-level logic, although a bit is shared. Engine, EngineTask for Theano and TFEngine for TensorFlow.
  • Network topology construction which constructs the computation graph for training or just forwarding. Network, TFNetwork.
  • Network model update code for training, i.e. SGD etc. Updater, TFUpdater.
  • All the individual layer implementations. NetworkLayer, NetworkBaseLayer, NetworkHiddenLayer, NetworkRecurrentLayer etc. for Theano and TFNetworkLayer, TFNetworkRecLayer for TensorFlow. This also means that Theano and TensorFlow don’t support the same layers, and even the parameters can differ.
  • Some utilities TheanoUtil and TFUtil.
  • Multi-GPU logic. Device, EngineTask for Theano and not yet implemented for TensorFlow.

All the rest is shared across backends, which mostly comprises:

  • The main entry point rnn.
  • Config handling Config.
  • Logging Log.
  • Utilities Util.
  • Dataset reading Dataset, including all the different dataset implementations HDFDataset, SprintDataset, LmDataset, GeneratingDataset, MetaDataset, etc. (see the sketch following this list).
  • Learning rate scheduling logic such as Newbob LearningRateControl.
  • Pretrain network structure construction Pretrain.
  • The native op code, which generates code for ops for both CUDA and CPU, shares a common base: NativeOp, where the TensorFlow-specific code is in TFNativeOp.
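
As a brief illustration of the dataset handling, datasets are typically selected in the config via a dict whose "class" key names one of the Dataset implementations listed above. This is a hedged sketch; the HDF file names are placeholders.

# Hedged sketch: selecting datasets in the config via the "class" key.
# The file names are placeholders.
train = {"class": "HDFDataset", "files": ["train.hdf"]}
dev = {"class": "HDFDataset", "files": ["dev.hdf"]}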

Execution guide

  • rnn.main() will parse command line arguments and read in a config.
  • Then logging Log is initialized, based on verbosity and other settings.
  • Then it initializes the datasets (train, dev, eval in config), i.e. Dataset instances.
  • Theano-only: Device instances.
  • The engine, i.e. an Engine or TFEngine instance.
  • Depending on the task option, some engine initialization, which also constructs the network computation graph (see Network structure construction below).
  • Then, depending on the task option (see the config sketch below), it might start engine.train, engine.forward etc. (Engine.Engine.train() or TFEngine.Engine.train()); see Training below.
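
To make this concrete, here is a minimal, hedged config sketch with the options the execution guide refers to. The values and file names are placeholders, and the exact set of required options depends on your setup; check the config handling (Config) and the documentation.

use_tensorflow = True    # select the TensorFlow backend (omit for Theano)
task = "train"           # e.g. "train", "forward", "eval"

# Datasets (train, dev), as sketched above.
train = {"class": "HDFDataset", "files": ["train.hdf"]}
dev = {"class": "HDFDataset", "files": ["dev.hdf"]}
num_inputs = 40          # input feature dimension (placeholder)
num_outputs = 2000       # number of output classes (placeholder)

# Network topology, see Network structure construction below.
network = {
    "output": {"class": "softmax", "loss": "ce"}
}

batch_size = 5000
learning_rate = 0.01
learning_rate_control = "newbob"   # Newbob learning rate scheduling
model = "/tmp/my_model"            # file prefix for saved models
num_epochs = 20
log_verbosity = 4                  # controls the Log output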

Network structure construction

The network structure, which defines the model topology, is set by the network config option. This is a dict where each entry is a layer specification, which itself is a dict containing the kwargs for the specific layer class. E.g.:

network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500},
    "fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": ["fw1"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["fw2"]}
}

The "class" key will get extracted from the layer arguments and the corresponding layer class will be used. For Theano, the base layer classes are NetworkBaseLayer.Container and NetworkBaseLayer.Layer; for TensorFlow, it is TFNetworkLayer.LayerBase. In the example above, "class": "linear" selects the TFNetworkLayer.LinearLayer class, and LinearLayer.__init__ will accept arguments like activation. All the remaining arguments in the given example will get handled by the base layer.
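
Conceptually, the handling of a single layer spec looks roughly like the following; this is a hedged sketch, not the actual RETURNN code.

# Conceptual sketch (not the actual RETURNN code) of how one layer spec is split up.
layer_spec = {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500}

kwargs = dict(layer_spec)
layer_class_name = kwargs.pop("class")   # "linear" -> TFNetworkLayer.LinearLayer
# The remaining kwargs are passed to the layer class: LinearLayer.__init__ consumes
# e.g. "activation", while arguments like "n_out" and "dropout" are handled by
# the base layer class (TFNetworkLayer.LayerBase for TensorFlow).
print(layer_class_name, kwargs)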

The construction itself can be found for TensorFlow in TFNetwork.TFNetwork.construct_from_dict(), which starts from the output layer(s) and goes over the sources of each layer, which are defined by "from". If a layer does not define "from", it will automatically get its input from the dataset data.
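
The following is a simplified, hedged sketch of that recursive construction logic, not the actual construct_from_dict() code.

# Simplified sketch (not the actual RETURNN code) of the recursive construction:
# starting from "output", the sources ("from") of each layer are constructed first;
# a layer without "from" reads the dataset input "data".
def construct(name, net_dict, constructed):
    if name in constructed:
        return constructed[name]
    spec = dict(net_dict[name])
    sources = spec.pop("from", ["data"])
    for src in sources:
        if src in net_dict:
            construct(src, net_dict, constructed)
    constructed[name] = spec   # in RETURNN, the layer class instance is created here
    return constructed[name]

network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500},
    "fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": ["fw1"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["fw2"]}
}
constructed = {}
construct("output", network, constructed)
print(list(constructed))   # ['fw1', 'fw2', 'output']: sources get constructed first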

Here is a 2-layer unidirectional LSTM network:

network = {
    "lstm1": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500},
    "lstm2": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": ["lstm1"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["lstm2"]}
}

In TensorFlow, that would use the layer class TFNetworkRecLayer.RecLayer, which will handle the argument unit.

And here is a 3-layer bidirectional LSTM network:

network = {
    "lstm0_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": 1},
    "lstm0_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": -1},

    "lstm1_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from": ["lstm0_fw", "lstm0_bw"]},
    "lstm1_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from": ["lstm0_fw", "lstm0_bw"]},

    "lstm2_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from": ["lstm1_fw", "lstm1_bw"]},
    "lstm2_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from": ["lstm1_fw", "lstm1_bw"]},

    "output": {"class": "softmax", "loss": "ce", "from": ["lstm2_fw", "lstm2_bw"]}
}

Training

The engine will loop over the epochs and the individual batches / steps, and it loads and saves the model. The specific implementation differs between Theano and TensorFlow. See the code for more details, i.e. Engine, EngineTask for Theano and TFEngine for TensorFlow.
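
As a purely conceptual sketch of that outer loop (not the actual Engine/TFEngine code; all names below are placeholders):

# Conceptual sketch of the training loop described above; not the actual
# Engine/TFEngine code, all functions here are trivial placeholders.
def train_loop(num_epochs, steps_per_epoch, run_step, save_model):
    for epoch in range(1, num_epochs + 1):
        for step in range(steps_per_epoch):  # loop over batches / steps
            run_step(epoch, step)            # forward pass, loss, parameter update (Updater / TFUpdater)
        save_model(epoch)                    # the engine saves the model after the epoch

train_loop(num_epochs=2, steps_per_epoch=3,
           run_step=lambda epoch, step: print("epoch %i, step %i" % (epoch, step)),
           save_model=lambda epoch: print("save model after epoch %i" % epoch))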