Technological overview

RETURNN is mostly used as a tool, where rnn.py (the rnn module) is the main entry point, but you can also use it as a framework / Python module in your own Python code. To get an idea of how it works, it helps to roughly follow the execution path starting in rnn, especially in rnn.main(). In all cases, the code itself should be checked for details and comments.
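
For example, driving RETURNN from your own Python code rather than from the command line could look roughly like this (a minimal sketch; demo.config is a hypothetical config file, and the exact rnn.main() signature should be checked in the code itself):

import rnn  # RETURNN's main module

# Hand over a command line as if rnn.py had been invoked as a tool.
# rnn.main() parses the arguments, loads the config, initializes the
# engine and the datasets, and runs the configured task (e.g. training).
rnn.main(["rnn.py", "demo.config"])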

So far, there are two calculation backends: Theano and TensorFlow. Theano was the first backend, which is why Theano-specific code files are currently not prefixed, while TensorFlow-specific files are prefixed with TF. The backend-specific parts, implemented separately for Theano and TensorFlow, are mostly the engine, the network and layer construction, and the model training / update logic.

All the rest is shared for all backends, which mostly covers the main entry point rnn, the config handling, logging, and the datasets.

Execution guide

Network structure construction

The network structure, which defines the model topology, is specified by the network config option. This is a dict where each entry is a layer specification, which itself is a dict containing the kwargs for the specific layer class. E.g.:

network = {
    "fw1": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500},
    "fw2": {"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500, "from": ["fw1"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["fw2"]}
}

The "class" key will get extracted from the layer arguments, and the specific layer class will be used. For Theano, the base layer classes are NetworkBaseLayer.Container and NetworkBaseLayer.Layer; for TensorFlow, it is TFNetworkLayer.LayerBase. E.g. "fw1" above would use the TFNetworkLayer.LinearLayer class, and LinearLayer.__init__ accepts arguments like activation. In the given example, all the remaining arguments will get handled by the base layer.
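
Conceptually, the dispatch works roughly like this (a simplified sketch with a hypothetical registry and a toy layer class, not the actual RETURNN code):

# Hypothetical registry; the real lookup lives in TFNetworkLayer.
class LinearLayer:
    def __init__(self, activation=None, dropout=0.0, n_out=None, **kwargs):
        self.activation = activation  # handled by the specific layer class
        self.dropout, self.n_out = dropout, n_out  # handled by the base layer in RETURNN

layer_class_registry = {"linear": LinearLayer}

def make_layer(layer_desc):
    desc = dict(layer_desc)  # copy, since "class" gets popped
    layer_class = layer_class_registry[desc.pop("class")]
    return layer_class(**desc)  # remaining entries become kwargs

fw1 = make_layer({"class": "linear", "activation": "relu", "dropout": 0.1, "n_out": 500})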

The construction itself can be found for TensorFlow in TFNetwork.TFNetwork.construct_from_dict(), which starts from the output layers and goes over the sources of each layer, as defined by "from". If a layer does not define "from", it will automatically get its input from the dataset data.
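
In simplified form, the traversal looks like this (a sketch of the construction order only, not the actual TFNetwork code; it can be run against the network dict from the first example above):

# Start at the output layer and recurse through the "from" sources.
def construct(net_dict, name, constructed):
    if name in constructed or name == "data":  # "data" = dataset input
        return
    desc = net_dict[name]
    # Layers without "from" implicitly read the dataset input "data".
    for src in desc.get("from", ["data"]):
        construct(net_dict, src, constructed)
    constructed[name] = desc  # all sources exist now; build the layer here

layers = {}
construct(network, "output", layers)
print(list(layers))  # construction order: ['fw1', 'fw2', 'output']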

Here is a 2-layer unidirectional LSTM network:

network = {
    "lstm1": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500},
    "lstm2": {"class": "rec", "unit": "lstm", "dropout": 0.1, "n_out": 500, "from": ["lstm1"]},
    "output": {"class": "softmax", "loss": "ce", "from": ["lstm2"]}
}

In TensorFlow, that would use the layer class TFNetworkRecLayer.RecLayer, which handles the argument unit.

And here is a 3-layer bidirectional LSTM network:

network = {
    "lstm0_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": 1},
    "lstm0_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": -1},

    "lstm1_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from": ["lstm0_fw", "lstm0_bw"]},
    "lstm1_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from": ["lstm0_fw", "lstm0_bw"]},

    "lstm2_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": 1, "from": ["lstm1_fw", "lstm1_bw"]},
    "lstm2_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "dropout": 0.1, "L2": 0.01, "direction": -1, "from": ["lstm1_fw", "lstm1_bw"]},

    "output": {"class": "softmax", "loss": "ce", "from": ["lstm2_fw", "lstm2_bw"]}
}
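
The direction argument sets the time direction of the recurrence: 1 processes the input sequence forward, -1 backward. Each level then concatenates the outputs of the forward and backward LSTMs of the previous level via from, which is what makes the network bidirectional. The L2 option adds L2 regularization on the layer parameters.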

Training

The engine loops over the epochs and the individual batches / steps, and loads and saves the model. The specific implementation differs between Theano and TensorFlow. See the code for more details, i.e. Engine and EngineTask for Theano, and TFEngine for TensorFlow.
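
Schematically, the loop implemented by the engine looks like this (a sketch with hypothetical method names, not the actual Engine / TFEngine API):

# All method names here are hypothetical stand-ins for the engine logic.
def train(engine, first_epoch, final_epoch):
    engine.load_model(epoch=first_epoch - 1)  # resume from a checkpoint, or init params
    for epoch in range(first_epoch, final_epoch + 1):
        for batch in engine.iterate_batches(epoch):
            engine.train_step(batch)  # one update step on one batch
        engine.save_model(epoch)  # checkpoint after each epoch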