Recurrent Units

These are the units that can be used inside a returnn.tf.layers.rec.RecLayer. Common units are:

  • BasicLSTM (the cell), via official TF, pure TF implementation
  • LSTMBlock (the cell), via tf.contrib.rnn; only TF <= 1
  • LSTMBlockFused, via tf.contrib.rnn; should be much faster than BasicLSTM; only TF <= 1
  • CudnnLSTM, via tf.contrib.cudnn_rnn; still experimental
  • NativeLSTM, our own native LSTM; should be faster than LSTMBlockFused, and similar to or faster than CudnnLSTM
  • NativeLstm2, our improved native LSTM; should be the fastest and most powerful

Note that the native implementations cannot be used inside a recurrent subnetwork, as they process the whole sequence at once. A performance comparison of the different LSTM layers is available here.
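
As a minimal sketch, a network config entry using NativeLstm2 inside a RecLayer might look like the following (layer names and dimensions are made up for illustration):

    network = {
        # "nativelstm2" selects NativeLstm2 as the rec unit; bidirectionality, chunking etc. omitted.
        "lstm": {"class": "rec", "unit": "nativelstm2", "n_out": 512, "from": "data"},
        "output": {"class": "softmax", "loss": "ce", "from": "lstm"},
    }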

BasicLSTMCell

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.BasicLSTMCell(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None, name=None, dtype=None, **kwargs)[source]

DEPRECATED: Please use tf.compat.v1.nn.rnn_cell.LSTMCell instead.

Basic LSTM recurrent network cell.

The implementation is based on http://arxiv.org/abs/1409.2329.

We add forget_bias (default: 1) to the biases of the forget gate in order to reduce the scale of forgetting at the beginning of training.

It does not support cell clipping, a projection layer, or peep-hole connections: it is the basic baseline.

For advanced models, please use the full tf.compat.v1.nn.rnn_cell.LSTMCell that follows.

Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU, or tf.contrib.rnn.LSTMBlockCell and tf.contrib.rnn.LSTMBlockFusedCell for better performance on CPU.

Initialize the basic LSTM cell.

Args:

  num_units: int, The number of units in the LSTM cell.
  forget_bias: float, The bias added to forget gates (see above). Must be set to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.
  state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state. If False, they are concatenated along the column axis. The latter behavior will soon be deprecated.
  activation: Activation function of the inner states. Default: tanh. It can also be a string that is one of the Keras activation function names.
  reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
  name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.
  dtype: Default dtype of the layer (default of None means use the type of the first input). Required when build is called before call.
  **kwargs: Dict, keyword named properties for common layer attributes, like trainable etc., when constructing the cell from configs of get_config(). When restoring from CudnnLSTM-trained checkpoints, must use CudnnCompatibleLSTMCell instead.
state_size[source]

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

output_size[source]

Integer or TensorShape: size of outputs produced by this cell.

build(input_shape)[source]
call(inputs, state)[source]

Long short-term memory cell (LSTM).

Args:

  inputs: 2-D tensor with shape [batch_size, input_size].
  state: An LSTMStateTuple of state tensors, each shaped [batch_size, num_units], if state_is_tuple has been set to True. Otherwise, a Tensor shaped [batch_size, 2 * num_units].

Returns:

  A pair containing the new hidden state, and the new state (either an LSTMStateTuple or a concatenated state, depending on state_is_tuple).
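
A rough usage sketch of a single step (TF1-style API; shapes are arbitrary):

    import tensorflow as tf

    cell = tf.compat.v1.nn.rnn_cell.BasicLSTMCell(num_units=128)
    inputs = tf.zeros([32, 40])                               # [batch_size, input_size]
    state = cell.zero_state(batch_size=32, dtype=tf.float32)  # LSTMStateTuple(c, h)
    output, new_state = cell(inputs, state)                   # one time step
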
get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:
Python dictionary.

BasicRNNCell

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.BasicRNNCell(num_units, activation=None, reuse=None, name=None, dtype=None, **kwargs)[source]

The most basic RNN cell.

Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnRNNTanh for better performance on GPU.

Args:

  num_units: int, The number of units in the RNN cell.
  activation: Nonlinearity to use. Default: tanh. It could also be a string that is within the Keras activation function names.
  reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
  name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.
  dtype: Default dtype of the layer (default of None means use the type of the first input). Required when build is called before call.
  **kwargs: Dict, keyword named properties for common layer attributes, like trainable etc., when constructing the cell from configs of get_config().
state_size[source]

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

output_size[source]

Integer or TensorShape: size of outputs produced by this cell.

build(input_shape)[source]
call(inputs, state)[source]

Most basic RNN: output = new_state = act(W * input + U * state + B).
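
A plain NumPy sketch of exactly this update (shapes are arbitrary; weights are zero-initialized just for illustration):

    import numpy as np

    batch, n_in, n_units = 2, 3, 4
    x = np.zeros((batch, n_in))          # input at the current step
    h = np.zeros((batch, n_units))       # previous state
    W = np.zeros((n_in, n_units))        # input-to-hidden weights
    U = np.zeros((n_units, n_units))     # hidden-to-hidden (recurrent) weights
    b = np.zeros((n_units,))             # bias

    h_new = np.tanh(x @ W + h @ U + b)   # output = new_state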

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:
Python dictionary.

BlocksparseLSTMCell

class returnn.tf.layers.rec.BlocksparseLSTMCell(*args, **kwargs)[source]

Standard LSTM but uses OpenAI blocksparse kernels to support bigger matrices.

Refs:

  https://blog.openai.com/block-sparse-gpu-kernels/
  https://github.com/openai/blocksparse

It uses our own wrapper, see TFNativeOp.init_blocksparse().

call(*args, **kwargs)[source]
Parameters:
  • args – passed to super
  • kwargs – passed to super
Return type:

tf.Tensor|tuple[tf.Tensor]

load_params_from_native_lstm(values_dict, session)[source]
Parameters:
  • session (tf.compat.v1.Session) –
  • values_dict (dict[str,numpy.ndarray]) –

BlocksparseMultiplicativeMultistepLSTMCell

class returnn.tf.layers.rec.BlocksparseMultiplicativeMultistepLSTMCell(*args, **kwargs)[source]

Multiplicative LSTM with multiple steps, as in the OpenAI blocksparse paper. Uses OpenAI blocksparse kernels to support bigger matrices.

Refs:

  https://github.com/openai/blocksparse

call(*args, **kwargs)[source]
Return type:tf.Tensor

GRUCell

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.GRUCell(num_units, activation=None, reuse=None, kernel_initializer=None, bias_initializer=None, name=None, dtype=None, **kwargs)[source]

Gated Recurrent Unit cell.

Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnGRU for better performance on GPU, or tf.contrib.rnn.GRUBlockCellV2 for better performance on CPU.

Args:

  num_units: int, The number of units in the GRU cell.
  activation: Nonlinearity to use. Default: tanh.
  reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
  kernel_initializer: (optional) The initializer to use for the weight and projection matrices.
  bias_initializer: (optional) The initializer to use for the bias.
  name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.
  dtype: Default dtype of the layer (default of None means use the type of the first input). Required when build is called before call.
  **kwargs: Dict, keyword named properties for common layer attributes, like trainable etc., when constructing the cell from configs of get_config().

References:

  Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation:
    [Cho et al., 2014] (https://arxiv.org/abs/1406.1078)

state_size[source]

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

output_size[source]

Integer or TensorShape: size of outputs produced by this cell.

build(input_shape)[source]
call(inputs, state)[source]

Gated recurrent unit (GRU) with nunits cells.
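
Unlike the LSTM cells above, the GRU state is a single tensor rather than a tuple; a rough sketch (TF1-style API, arbitrary shapes):

    import tensorflow as tf

    cell = tf.compat.v1.nn.rnn_cell.GRUCell(num_units=128)
    inputs = tf.zeros([32, 40])                               # [batch_size, input_size]
    state = cell.zero_state(batch_size=32, dtype=tf.float32)  # single tensor [batch_size, num_units]
    output, new_state = cell(inputs, state)                   # for a GRU, output == new_state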

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:
Python dictionary.

LayerNormVariantsLSTMCell

class returnn.tf.layers.rec.LayerNormVariantsLSTMCell(num_units, norm_gain=1.0, norm_shift=0.0, forget_bias=0.0, activation=<function tanh>, is_training=None, dropout=0.0, dropout_h=0.0, dropout_seed=None, with_concat=False, global_norm=True, global_norm_joined=False, per_gate_norm=False, cell_norm=True, cell_norm_in_output=True, hidden_norm=False, variance_epsilon=1e-12)[source]

LSTM unit with layer normalization and recurrent dropout

This LSTM cell can apply different variants of layer normalization:

1. Layer normalization as in the original paper (https://arxiv.org/abs/1607.06450).
   This can be applied by using all default params
   (global_norm=True, cell_norm=True, cell_norm_in_output=True).

2. Layer normalization for RNMT+ (https://arxiv.org/abs/1804.09849).
   This can be applied by using all default params except:
     - global_norm = False
     - per_gate_norm = True
     - cell_norm_in_output = False

3. The TF official LayerNormBasicLSTMCell
   (https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LayerNormBasicLSTMCell).
   This can be reproduced by using all default params except:
     - global_norm = False
     - per_gate_norm = True

4. The Sockeye LSTM layer normalization implementations
   (https://github.com/awslabs/sockeye/blob/master/sockeye/rnn.py).

   LayerNormLSTMCell can be reproduced by using all default params except:
     - with_concat = False (just efficiency, no difference in the model)

   LayerNormPerGateLSTMCell can be reproduced by using all default params except:
     - (with_concat = False)
     - global_norm = False
     - per_gate_norm = True

Recurrent dropout is based on https://arxiv.org/abs/1603.05118.

Prohibited LN combinations:
  - global_norm and global_norm_joined both enabled
  - per_gate_norm with global_norm or global_norm_joined

Parameters:
  • num_units (int) – number of lstm units
  • norm_gain (float) – layer normalization gain value
  • norm_shift (float) – layer normalization shift (bias) value
  • forget_bias (float) – the bias added to forget gates
  • activation – Activation function to be applied in the lstm cell
  • is_training (bool) – if True then we are in the training phase
  • dropout (float) – dropout rate, applied on cell-in (j)
  • dropout_h (float) – dropout rate, applied on hidden state (h) when it enters the LSTM (variational dropout)
  • dropout_seed (int) – used to create random seeds
  • with_concat (bool) – if True then the input and the previous hidden state are concatenated for the computation. This is just about computation performance.
  • global_norm (bool) – if True then layer normalization is applied for the forward and recurrent outputs (separately).
  • global_norm_joined (bool) – if True, then layer norm is applied on LSTM in (forward and recurrent output together)
  • per_gate_norm (bool) – if True then layer normalization is applied per lstm gate
  • cell_norm (bool) – if True then layer normalization is applied to the LSTM new cell output
  • cell_norm_in_output (bool) – if True, the normalized cell is also used in the output
  • hidden_norm (bool) – if True then layer normalization is applied to the LSTM new hidden state output
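
For example, variant 2 from the list above (RNMT+-style layer norm) would be selected roughly like this (num_units is illustrative):

    from returnn.tf.layers.rec import LayerNormVariantsLSTMCell

    # All defaults except the three flags that define the RNMT+ variant.
    cell = LayerNormVariantsLSTMCell(
        num_units=512,
        global_norm=False,
        per_gate_norm=True,
        cell_norm_in_output=False)
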
output_size[source]
Return type:int
state_size[source]
Return type:rnn_cell.LSTMStateTuple
get_input_transformed(inputs, batch_dim=None)[source]
Parameters:
  • inputs (tf.Tensor) –
  • batch_dim (tf.Tensor|None) –
Return type:

tf.Tensor

LayerRNNCell

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.LayerRNNCell(trainable=True, name=None, dtype=None, **kwargs)[source]

Subclass of RNNCells that act like proper tf.Layer objects.

For backwards compatibility purposes, most RNNCell instances allow their call methods to instantiate variables via tf.compat.v1.get_variable. The underlying variable scope thus keeps track of any variables, and returns cached versions on subsequent calls. This is atypical of tf.layers objects, which separate this part of layer building into a build method that is only called once.

Here we provide a subclass for RNNCell objects that act exactly as Layer objects do. They must provide a build method, and their call methods do not access variables via tf.compat.v1.get_variable.

MultiRNNCell

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.MultiRNNCell(cells, state_is_tuple=True)[source]

RNN cell composed sequentially of multiple simple cells.

Example:

    num_units = [128, 64]
    cells = [BasicLSTMCell(num_units=n) for n in num_units]
    stacked_rnn_cell = MultiRNNCell(cells)

Create a RNN cell composed sequentially of a number of RNNCells.

Args:

  cells: list of RNNCells that will be composed in this order.
  state_is_tuple: If True, accepted and returned states are n-tuples, where n = len(cells). If False, the states are all concatenated along the column axis. This latter behavior will soon be deprecated.

Raises:

  ValueError: if cells is empty (not allowed), or at least one of the cells returns a state tuple but the flag state_is_tuple is False.
state_size[source]

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

output_size[source]

Integer or TensorShape: size of outputs produced by this cell.

zero_state(batch_size, dtype)[source]

Return zero-filled state tensor(s).

Args:

  batch_size: int, float, or unit Tensor representing the batch size.
  dtype: the data type to use for the state.

Returns:

  If state_size is an int or TensorShape, then the return value is an N-D tensor of shape [batch_size, state_size] filled with zeros.

  If state_size is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of 2-D tensors with the shapes [batch_size, s] for each s in state_size.
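
A small sketch of the nested structure (TF1-style API; sizes are arbitrary):

    import tensorflow as tf

    cells = [tf.compat.v1.nn.rnn_cell.BasicLSTMCell(n) for n in [128, 64]]
    stacked = tf.compat.v1.nn.rnn_cell.MultiRNNCell(cells)
    state = stacked.zero_state(batch_size=32, dtype=tf.float32)
    # state is a 2-tuple with one LSTMStateTuple per cell:
    # state[0].c and state[0].h have shape [32, 128]; state[1].c and state[1].h have shape [32, 64].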

trainable_weights[source]

List of all trainable weights tracked by this layer.

Trainable weights are updated via gradient descent during training.

Returns:
A list of trainable variables.
non_trainable_weights[source]

List of all non-trainable weights tracked by this layer.

Non-trainable weights are not updated during training. They are expected to be updated manually in call().

Returns:
A list of non-trainable variables.
call(inputs, state)[source]

Run this multi-layer cell on inputs, starting from state.

NativeLstmCell

class returnn.tf.native_op.NativeLstmCell(forget_bias=0.0, **kwargs)[source]

Native LSTM.

Parameters:forget_bias (float) –
classmethod map_layer_inputs_to_op(z, rec_weights, i, initial_state=None)[source]

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:
  • z (tf.Tensor) – Z: inputs: shape (time,batch,n_hidden*4)
  • rec_weights (tf.Tensor) – V_h / W_re: shape (n_hidden,n_hidden*4)
  • i (tf.Tensor) – index: shape (time,batch)
  • initial_state (tf.Tensor|None) – shape (batch,n_hidden)
Return type:

(tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)

NativeLstmLowMemCell

class returnn.tf.native_op.NativeLstmLowMemCell(**kwargs)[source]

Native LSTM, low mem variant.

does_input_projection = True[source]
does_direction_handling = True[source]
map_layer_inputs_to_op(x, weights, b, i, initial_state=None)[source]

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:
  • x (tf.Tensor) – inputs: shape (time,batch,n_input_dim)
  • weights (tf.Tensor) – shape (n_input_dim+n_hidden,n_hidden*4)
  • b (tf.Tensor) – shape (n_hidden*4,)
  • i (tf.Tensor) – index: shape (time,batch)
  • initial_state (tf.Tensor|None) – shape (batch,n_hidden)
Return type:
  tuple[tf.Tensor]

RHNCell

class returnn.tf.layers.rec.RHNCell(num_units, is_training=None, depth=5, dropout=0.0, dropout_seed=None, transform_bias=None, batch_size=None)[source]

Recurrent Highway Layer, with optional dropout for the recurrent state (fixed over all frames; some call this variational dropout).

References:
https://github.com/julian121266/RecurrentHighwayNetworks/ https://arxiv.org/abs/1607.03474
Parameters:
  • num_units (int) –
  • is_training (bool|tf.Tensor|None) –
  • depth (int) –
  • dropout (float) –
  • dropout_seed (int) –
  • transform_bias (float|None) –
  • batch_size (int|tf.Tensor|None) –
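
A minimal construction sketch (num_units and depth are illustrative):

    from returnn.tf.layers.rec import RHNCell

    # Recurrent highway cell with 5 micro-steps per input frame.
    cell = RHNCell(num_units=512, depth=5)
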
output_size[source]
Return type:int
state_size[source]
Return type:int
get_input_transformed(x, batch_dim=None)[source]
Parameters:
  • x (tf.Tensor) – (time, batch, dim)
  • batch_dim (tf.Tensor|None) –
Returns:

(time, batch, num_units * 2)

Return type:

tf.Tensor

call(inputs, state)[source]
Parameters:
  • inputs (tf.Tensor) –
  • state (tf.Tensor) –
Returns:

(output, state)

Return type:

(tf.Tensor, tf.Tensor)

RNNCell

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.RNNCell(trainable=True, name=None, dtype=None, **kwargs)[source]

Abstract object representing an RNN cell.

Every RNNCell must have the properties below and implement call with the signature (output, next_state) = call(input, state). The optional third input argument, scope, is allowed for backwards compatibility purposes, but should be left off for new subclasses.

This definition of cell differs from the definition used in the literature. In the literature, ‘cell’ refers to an object with a single scalar output. This definition refers to a horizontal array of such units.

An RNN cell, in the most abstract setting, is anything that has a state and performs some operation that takes a matrix of inputs. This operation results in an output matrix with self.output_size columns. If self.state_size is an integer, this operation also results in a new state matrix with self.state_size columns. If self.state_size is a (possibly nested tuple of) TensorShape object(s), then it should return a matching structure of Tensors having shape [batch_size].concatenate(s) for each s in self.state_size.
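
A toy sketch of this contract, i.e. a custom cell that implements state_size, output_size and call (the class and its behavior are made up purely for illustration; it assumes input_size == num_units):

    import tensorflow as tf

    class ToyCell(tf.compat.v1.nn.rnn_cell.RNNCell):
        """Toy cell: output and new state are the tanh-squashed sum of input and state."""

        def __init__(self, num_units, **kwargs):
            super(ToyCell, self).__init__(**kwargs)
            self._num_units = num_units

        @property
        def state_size(self):
            return self._num_units

        @property
        def output_size(self):
            return self._num_units

        def call(self, inputs, state):
            # inputs: [batch_size, num_units], state: [batch_size, num_units]
            output = tf.tanh(inputs + state)
            return output, output  # (output, next_state)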

state_size[source]

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

output_size[source]

Integer or TensorShape: size of outputs produced by this cell.

build(_)[source]

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

Args:

  input_shape: Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
get_initial_state(inputs=None, batch_size=None, dtype=None)[source]
zero_state(batch_size, dtype)[source]

Return zero-filled state tensor(s).

Args:

  batch_size: int, float, or unit Tensor representing the batch size.
  dtype: the data type to use for the state.

Returns:

  If state_size is an int or TensorShape, then the return value is an N-D tensor of shape [batch_size, state_size] filled with zeros.

  If state_size is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of 2-D tensors with the shapes [batch_size, s] for each s in state_size.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:
Python dictionary.

LSTMCell

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.LSTMCell(num_units, use_peepholes=False, cell_clip=None, initializer=None, num_proj=None, proj_clip=None, num_unit_shards=None, num_proj_shards=None, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None, name=None, dtype=None, **kwargs)[source]

Long short-term memory unit (LSTM) recurrent network cell.

The default non-peephole implementation is based on (Gers et al., 1999). The peephole implementation is based on (Sak et al., 2014).

The class uses optional peep-hole connections, optional cell clipping, and an optional projection layer.
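
A construction sketch using these options (the values are illustrative):

    import tensorflow as tf

    cell = tf.compat.v1.nn.rnn_cell.LSTMCell(
        num_units=512,
        use_peepholes=True,  # diagonal peephole connections
        cell_clip=10.0,      # clip the cell state before the output activation
        num_proj=256)        # project the output down to 256 dimensions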

Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU, or tf.contrib.rnn.LSTMBlockCell and tf.contrib.rnn.LSTMBlockFusedCell for better performance on CPU.

References:

  Long short-term memory recurrent neural network architectures for large scale acoustic modeling:
    [Sak et al., 2014]
  Learning to forget:
    [Gers et al., 1999] (http://digital-library.theiet.org/content/conferences/10.1049/cp_19991218)
  Long Short-Term Memory:
    [Hochreiter et al., 1997] (https://www.mitpressjournals.org/doi/abs/10.1162/neco.1997.9.8.1735) ([pdf](http://ml.jku.at/publications/older/3504.pdf))

Initialize the parameters for an LSTM cell.

Args:

  num_units: int, The number of units in the LSTM cell.
  use_peepholes: bool, set True to enable diagonal/peephole connections.
  cell_clip: (optional) A float value, if provided the cell state is clipped by this value prior to the cell output activation.
  initializer: (optional) The initializer to use for the weight and projection matrices.
  num_proj: (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed.
  proj_clip: (optional) A float value. If num_proj > 0 and proj_clip is provided, then the projected values are clipped elementwise to within [-proj_clip, proj_clip].
  num_unit_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
  num_proj_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
  forget_bias: Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training. Must be set manually to 0.0 when restoring from CudnnLSTM-trained checkpoints.
  state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state. If False, they are concatenated along the column axis. This latter behavior will soon be deprecated.
  activation: Activation function of the inner states. Default: tanh. It could also be a string that is within the Keras activation function names.
  reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
  name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.
  dtype: Default dtype of the layer (default of None means use the type of the first input). Required when build is called before call.
  **kwargs: Dict, keyword named properties for common layer attributes, like trainable etc., when constructing the cell from configs of get_config(). When restoring from CudnnLSTM-trained checkpoints, use CudnnCompatibleLSTMCell instead.
state_size[source]

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

output_size[source]

Integer or TensorShape: size of outputs produced by this cell.

build(input_shape)[source]
call(inputs, state)[source]

Run one step of LSTM.

Args:

  inputs: input Tensor, must be 2-D, [batch, input_size].
  state: if state_is_tuple is False, this must be a state Tensor, 2-D, [batch, state_size]. If state_is_tuple is True, this must be a tuple of state Tensors, both 2-D, with column sizes c_state and m_state.

Returns:

  A tuple containing:

  • A 2-D, [batch, output_dim], Tensor representing the output of the LSTM after reading inputs when previous state was state. Here output_dim is num_proj if num_proj was set, num_units otherwise.

  • Tensor(s) representing the new state of LSTM after reading inputs when the previous state was state. Same type and shape(s) as state.

Raises:

  ValueError: If input size cannot be inferred from inputs via static shape inference.
get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:
Python dictionary.

TwoDNativeLstmCell

class returnn.tf.native_op.TwoDNativeLstmCell(pooling, **kwargs)[source]

Native 2D LSTM.

does_input_projection = True[source]
classmethod map_layer_inputs_to_op(X, V_h, V_v, W, i, previous_state=None, previous_output=None, iteration=None)[source]

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:
  • X (tf.Tensor) – inputs: shape (timeT,timeS,batch,n_hidden*5)
  • V_h (tf.Tensor) – W_re: shape (n_hidden,n_hidden*5)
  • V_v (tf.Tensor) – W_re: shape (n_hidden,n_hidden*5)
  • W (tf.Tensor) –
  • i (tf.Tensor) – index: shape (time,batch)
  • previous_state (tf.Tensor) –
  • previous_output (tf.Tensor) –
  • iteration (tf.Tensor) –
Return type:
  (tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)

ZoneoutLSTMCell

class returnn.tf.layers.rec.ZoneoutLSTMCell(num_units, zoneout_factor_cell=0.0, zoneout_factor_output=0.0)[source]

Wrapper for the TF LSTM cell to create a Zoneout LSTM cell. This code is an adapted version of Rayhane Mama's Tacotron-2 implementation.

Refs:

  https://arxiv.org/abs/1606.01305
  https://github.com/Rayhane-mamah/Tacotron-2

Initializer that allows setting different zoneout values for cell and hidden states.

Parameters:
  • num_units (int) – number of hidden units
  • zoneout_factor_cell (float) – cell zoneout factor
  • zoneout_factor_output (float) – output zoneout factor
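
A minimal construction sketch (the zoneout factors are illustrative):

    from returnn.tf.layers.rec import ZoneoutLSTMCell

    cell = ZoneoutLSTMCell(num_units=512, zoneout_factor_cell=0.1, zoneout_factor_output=0.1)
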
state_size[source]
Return type:int
output_size[source]
Return type:int