Recurrent Units#

These are the units that can be used in a returnn.tf.layers.rec.RecLayer type of layer. Common units are:

  • BasicLSTM (the cell), via official TF, pure TF implementation

  • LSTMBlock (the cell), via tf.contrib.rnn; TF 1 only

  • LSTMBlockFused, via tf.contrib.rnn; should be much faster than BasicLSTM; TF 1 only

  • CudnnLSTM, via tf.contrib.cudnn_rnn; still experimental

  • NativeLSTM, our own native LSTM; should be faster than LSTMBlockFused, and similar to or faster than CudnnLSTM

  • NativeLstm2, our improved native LSTM; should be the fastest and most powerful

Note that the native implementations cannot be used inside a recurrent subnetwork, as they process the whole sequence at once. A performance comparison of the different LSTM layers is available here.
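
In a RETURNN network config, the unit is selected via the unit option of a rec layer. The following is a minimal sketch only; the layer names, dimensions, and the softmax output layer are illustrative and not taken from this page.

```python
# Minimal sketch of selecting an LSTM unit in a RETURNN network config.
# Layer names and sizes here are placeholders.
network = {
    "lstm_fwd": {
        "class": "rec",           # returnn.tf.layers.rec.RecLayer
        "unit": "NativeLstm2",    # any of the unit names listed above
        "direction": 1,           # 1 = forward, -1 = backward
        "n_out": 512,
        "from": "data",
    },
    "output": {"class": "softmax", "loss": "ce", "from": "lstm_fwd"},
}
```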

BasicLSTMCell#

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.BasicLSTMCell(*args, **kwargs)[source]#

DEPRECATED: Please use tf.compat.v1.nn.rnn_cell.LSTMCell instead.

Basic LSTM recurrent network cell.

The implementation is based on: http://arxiv.org/abs/1409.2329.

We add forget_bias (default: 1) to the biases of the forget gate in order to reduce the scale of forgetting in the beginning of the training.

It does not allow cell clipping or a projection layer, and does not use peep-hole connections: it is the basic baseline.

For advanced models, please use the full tf.compat.v1.nn.rnn_cell.LSTMCell that follows.

Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU, or tf.contrib.rnn.LSTMBlockCell and tf.contrib.rnn.LSTMBlockFusedCell for better performance on CPU.

Initialize the basic LSTM cell.

Args:

num_units: int, The number of units in the LSTM cell.

forget_bias: float, The bias added to forget gates (see above). Must be set to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.

state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state. If False, they are concatenated along the column axis. The latter behavior will soon be deprecated.

activation: Activation function of the inner states. Default: tanh. It can also be a string naming a Keras activation function.

reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.

name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

dtype: Default dtype of the layer (default of None means use the type of the first input). Required when build is called before call.

**kwargs: Dict, keyword named properties for common layer attributes, like trainable etc., when constructing the cell from configs of get_config(). When restoring from CudnnLSTM-trained checkpoints, must use CudnnCompatibleLSTMCell instead.
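
As a usage illustration only (not taken from this page), the cell can be driven by tf.compat.v1.nn.dynamic_rnn; the input sizes below are made up.

```python
import tensorflow as tf

# TF 2.x: fall back to graph-mode v1 behavior for the legacy cell API.
tf.compat.v1.disable_eager_execution()

# (batch, time, feature) placeholder; the feature size 40 is arbitrary.
inputs = tf.compat.v1.placeholder(tf.float32, [None, None, 40])

cell = tf.compat.v1.nn.rnn_cell.BasicLSTMCell(num_units=128, forget_bias=1.0)
outputs, final_state = tf.compat.v1.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
# outputs: (batch, time, 128); final_state is an LSTMStateTuple(c, h), each (batch, 128).
```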

property state_size[source]#

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

property output_size[source]#

Integer or TensorShape: size of outputs produced by this cell.

build(input_shape)[source]#

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

Args:
input_shape: Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs, state)[source]#

Long short-term memory cell (LSTM).

Args:

inputs: 2-D tensor with shape [batch_size, input_size].

state: An LSTMStateTuple of state tensors, each shaped [batch_size, num_units], if state_is_tuple has been set to True. Otherwise, a Tensor shaped [batch_size, 2 * num_units].

Returns:

A pair containing the new hidden state, and the new state (either an LSTMStateTuple or a concatenated state, depending on state_is_tuple).

get_config()[source]#

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

BasicRNNCell#

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.BasicRNNCell(*args, **kwargs)[source]#

The most basic RNN cell.

Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnRNNTanh for better performance on GPU.

Args:

num_units: int, The number of units in the RNN cell.

activation: Nonlinearity to use. Default: tanh. It can also be a string naming a Keras activation function.

reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.

name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

dtype: Default dtype of the layer (default of None means use the type of the first input). Required when build is called before call.

**kwargs: Dict, keyword named properties for common layer attributes, like trainable etc., when constructing the cell from configs of get_config().

property state_size[source]#

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

property output_size[source]#

Integer or TensorShape: size of outputs produced by this cell.

build(input_shape)[source]#

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

Args:
input_shape: Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs, state)[source]#

Most basic RNN: output = new_state = act(W * input + U * state + B).
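
As a worked illustration only (not library code), a single step of this update can be written directly; the fused-kernel detail of the real cell is ignored here, and all shapes are made up.

```python
import numpy as np

def basic_rnn_step(x, h, W, U, b, act=np.tanh):
    """One step of the basic RNN: output = new_state = act(W * input + U * state + B)."""
    new_h = act(x @ W + h @ U + b)
    return new_h, new_h  # output and new state are the same tensor

# Toy shapes: batch=2, input dim=3, num_units=4.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3))
h = np.zeros((2, 4))
W, U, b = rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), np.zeros(4)
out, new_h = basic_rnn_step(x, h, W, U, b)
```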

get_config()[source]#

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

BlocksparseLSTMCell#

class returnn.tf.layers.rec.BlocksparseLSTMCell(*args, **kwargs)[source]#

Standard LSTM but uses OpenAI blocksparse kernels to support bigger matrices.

Refs:

It uses our own wrapper, see TFNativeOp.init_blocksparse().

Parameters:

num_units (int) –

call(*args, **kwargs)[source]#
Parameters:
  • args – passed to super

  • kwargs – passed to super

Return type:

tf.Tensor|tuple[tf.Tensor]

load_params_from_native_lstm(values_dict, session)[source]#
Parameters:
  • session (tf.compat.v1.Session) –

  • values_dict (dict[str,numpy.ndarray]) –

BlocksparseMultiplicativeMultistepLSTMCell#

class returnn.tf.layers.rec.BlocksparseMultiplicativeMultistepLSTMCell(*args, **kwargs)[source]#

Multiplicative LSTM with multiple steps, as in the OpenAI blocksparse paper. Uses OpenAI blocksparse kernels to support bigger matrices.

Refs:

Parameters:

num_units (int) –

call(*args, **kwargs)[source]#
Return type:

tf.Tensor

GRUCell#

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.GRUCell(*args, **kwargs)[source]#

Gated Recurrent Unit cell.

Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnGRU for better performance on GPU, or tf.contrib.rnn.GRUBlockCellV2 for better performance on CPU.

Args:

num_units: int, The number of units in the GRU cell.

activation: Nonlinearity to use. Default: tanh.

reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.

kernel_initializer: (optional) The initializer to use for the weight and projection matrices.

bias_initializer: (optional) The initializer to use for the bias.

name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

dtype: Default dtype of the layer (default of None means use the type of the first input). Required when build is called before call.

**kwargs: Dict, keyword named properties for common layer attributes, like trainable etc., when constructing the cell from configs of get_config().

References:

Learning Phrase Representations using RNN Encoder Decoder for Statistical Machine Translation (Cho et al., 2014).
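
A usage illustration only (not from this page); note that, unlike the LSTM cells, the GRU state is a single tensor rather than a (c, h) tuple. Sizes below are arbitrary.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()
inputs = tf.compat.v1.placeholder(tf.float32, [None, None, 40])  # (batch, time, feature)

cell = tf.compat.v1.nn.rnn_cell.GRUCell(num_units=256)
outputs, final_state = tf.compat.v1.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
# outputs: (batch, time, 256); final_state: a single tensor of shape (batch, 256).
```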

property state_size[source]#

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

property output_size[source]#

Integer or TensorShape: size of outputs produced by this cell.

build(input_shape)[source]#

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

Args:
input_shape: Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs, state)[source]#

Gated recurrent unit (GRU) with nunits cells.

get_config()[source]#

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

LayerNormVariantsLSTMCell#

class returnn.tf.layers.rec.LayerNormVariantsLSTMCell(*args, **kwargs)[source]#

LSTM unit with layer normalization and recurrent dropout

This LSTM cell can apply different variants of layer normalization:

1. Layer normalization as in the original paper (https://arxiv.org/abs/1607.06450). This can be applied by using all default params (global_norm=True, cell_norm=True, cell_norm_in_output=True).

2. Layer normalization for RNMT+ (https://arxiv.org/abs/1804.09849). This can be applied by using all default params except: global_norm=False, per_gate_norm=True, cell_norm_in_output=False.

3. The TF official LayerNormBasicLSTMCell (https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LayerNormBasicLSTMCell). This can be reproduced by using all default params except: global_norm=False, per_gate_norm=True.

4. The Sockeye LSTM layer normalization implementations (https://github.com/awslabs/sockeye/blob/master/sockeye/rnn.py).

   LayerNormLSTMCell can be reproduced by using all default params except: with_concat=False (just efficiency, no difference in the model).

   LayerNormPerGateLSTMCell can be reproduced by using all default params except: (with_concat=False,) global_norm=False, per_gate_norm=True.

Recurrent dropout is based on https://arxiv.org/abs/1603.05118.

Prohibited LN combinations:

  • global_norm and global_norm_joined both enabled

  • per_gate_norm with global_norm or global_norm_joined

Parameters:
  • num_units (int) – number of lstm units

  • norm_gain (float) – layer normalization gain value

  • norm_shift (float) – layer normalization shift (bias) value

  • forget_bias (float) – the bias added to forget gates

  • activation – Activation function to be applied in the lstm cell

  • is_training (bool) – if True then we are in the training phase

  • dropout (float) – dropout rate, applied on cell-in (j)

  • dropout_h (float) – dropout rate, applied on hidden state (h) when it enters the LSTM (variational dropout)

  • dropout_seed (int) – used to create random seeds

  • with_concat (bool) – if True, the input and the previous hidden state are concatenated for the computation. This is only about computation performance.

  • global_norm (bool) – if True then layer normalization is applied for the forward and recurrent outputs (separately).

  • global_norm_joined (bool) – if True, then layer norm is applied on LSTM in (forward and recurrent output together)

  • per_gate_norm (bool) – if True then layer normalization is applied per lstm gate

  • cell_norm (bool) – if True then layer normalization is applied to the LSTM new cell output

  • cell_norm_in_output (bool) – if True, the normalized cell is also used in the output

  • hidden_norm (bool) – if True then layer normalization is applied to the LSTM new hidden state output
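
For instance, variant 2 above (RNMT+-style normalization) could be selected roughly as follows in a RecLayer config. The unit name, the forwarding of unit_opts to the cell constructor, and all sizes are assumptions for illustration, not values from this page.

```python
# Hedged sketch: RNMT+-style layer norm (variant 2 above) via a RecLayer.
# The unit name and the unit_opts forwarding are assumptions.
network = {
    "lstm_ln": {
        "class": "rec",
        "unit": "LayerNormVariantsLSTM",
        "unit_opts": {
            "global_norm": False,
            "per_gate_norm": True,
            "cell_norm_in_output": False,
        },
        "n_out": 512,
        "from": "data",
    },
}
```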

property output_size[source]#
Return type:

int

property state_size[source]#
Return type:

rnn_cell.LSTMStateTuple

get_input_transformed(inputs, batch_dim=None)[source]#
Parameters:
  • inputs (tf.Tensor) –

  • batch_dim (tf.Tensor|None) –

Return type:

tf.Tensor

LayerRNNCell#

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.LayerRNNCell(*args, **kwargs)[source]#

Subclass of RNNCells that act like proper tf.Layer objects.

For backwards compatibility purposes, most RNNCell instances allow their call methods to instantiate variables via tf.compat.v1.get_variable. The underlying variable scope thus keeps track of any variables and returns cached versions. This is atypical of tf.layer objects, which separate this part of layer building into a build method that is only called once.

Here we provide a subclass for RNNCell objects that act exactly as Layer objects do. They must provide a build method, and their call methods must not access variables via tf.compat.v1.get_variable.

MultiRNNCell#

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.MultiRNNCell(*args, **kwargs)[source]#

RNN cell composed sequentially of multiple simple cells.

Example:

```python
num_units = [128, 64]
cells = [BasicLSTMCell(num_units=n) for n in num_units]
stacked_rnn_cell = MultiRNNCell(cells)
```

Create a RNN cell composed sequentially of a number of RNNCells.

Args:

cells: list of RNNCells that will be composed in this order.

state_is_tuple: If True, accepted and returned states are n-tuples, where n = len(cells). If False, the states are all concatenated along the column axis. This latter behavior will soon be deprecated.

Raises:

ValueError: if cells is empty (not allowed), or at least one of the cells returns a state tuple but the flag state_is_tuple is False.

property state_size[source]#

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

property output_size[source]#

Integer or TensorShape: size of outputs produced by this cell.

zero_state(batch_size, dtype)[source]#

Return zero-filled state tensor(s).

Args:

batch_size: int, float, or unit Tensor representing the batch size.

dtype: the data type to use for the state.

Returns:

If state_size is an int or TensorShape, then the return value is an N-D tensor of shape [batch_size, state_size] filled with zeros.

If state_size is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of 2-D tensors with the shapes [batch_size, s] for each s in state_size.

property trainable_weights[source]#

List of all trainable weights tracked by this layer.

Trainable weights are updated via gradient descent during training.

Returns:

A list of trainable variables.

property non_trainable_weights[source]#

List of all non-trainable weights tracked by this layer.

Non-trainable weights are not updated during training. They are expected to be updated manually in call().

Returns:

A list of non-trainable variables.

call(inputs, state)[source]#

Run this multi-layer cell on inputs, starting from state.

NativeLstmCell#

class returnn.tf.native_op.NativeLstmCell(forget_bias=0.0, **kwargs)[source]#

Native LSTM.

Parameters:

forget_bias (float) –

classmethod map_layer_inputs_to_op(z, rec_weights, i, initial_state=None)[source]#

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:
  • z (tf.Tensor) – Z: inputs: shape (time,batch,n_hidden*4)

  • rec_weights (tf.Tensor) – V_h / W_re: shape (n_hidden,n_hidden*4)

  • i (tf.Tensor) – index: shape (time,batch)

  • initial_state (tf.Tensor|None) – shape (batch,n_hidden)

Return type:

(tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)

NativeLstmLowMemCell#

class returnn.tf.native_op.NativeLstmLowMemCell(**kwargs)[source]#

Native LSTM, low mem variant.

Parameters:
  • n_hidden (int) –

  • n_input_dim (int) –

  • n_input_dim_parts (int|list[int]) –

  • input_is_sparse (bool) –

  • step (int) – what direction and step to use

does_input_projection = True[source]#
does_direction_handling = True[source]#
map_layer_inputs_to_op(x, weights, b, i, initial_state=None)[source]#

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:
  • x (tf.Tensor) – inputs: shape (time,batch,n_input_dim)

  • weights (tf.Tensor) – shape (n_input_dim+n_hidden,n_hidden*4)

  • b (tf.Tensor) – shape (n_hidden*4,)

  • i (tf.Tensor) – index: shape (time,batch)

  • initial_state (tf.Tensor|None) – shape (batch,n_hidden)

Return type:

tuple[tf.Tensor]

RHNCell#

class returnn.tf.layers.rec.RHNCell(*args, **kwargs)[source]#

Recurrent Highway Layer, with optional dropout for the recurrent state (fixed over all frames; some call this variational dropout).

References:

https://github.com/julian121266/RecurrentHighwayNetworks/ https://arxiv.org/abs/1607.03474

Parameters:
  • num_units (int) –

  • is_training (bool|tf.Tensor|None) –

  • depth (int) –

  • dropout (float) –

  • dropout_seed (int) –

  • transform_bias (float|None) –

  • batch_size (int|tf.Tensor|None) –
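
As a hedged illustration only: inside a RecLayer, this cell would typically be selected via the unit option, with extra constructor arguments passed through unit_opts. The unit name "rhn", the option forwarding, and all numbers below are assumptions, not values from this page.

```python
# Hedged sketch of a Recurrent Highway Network layer in a RETURNN config.
network = {
    "rhn_layer": {
        "class": "rec",
        "unit": "rhn",                              # assumed unit name for RHNCell
        "unit_opts": {"depth": 5, "dropout": 0.3},  # illustrative values
        "n_out": 512,
        "from": "data",
    },
}
```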

property output_size[source]#
Return type:

int

property state_size[source]#
Return type:

int

get_input_transformed(x, batch_dim=None)[source]#
Parameters:
  • x (tf.Tensor) – (time, batch, dim)

  • batch_dim (tf.Tensor|None) –

Returns:

(time, batch, num_units * 2)

Return type:

tf.Tensor

call(inputs, state)[source]#
Parameters:
  • inputs (tf.Tensor) –

  • state (tf.Tensor) –

Returns:

(output, state)

Return type:

(tf.Tensor, tf.Tensor)

RNNCell#

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.RNNCell(*args, **kwargs)[source]#

Abstract object representing an RNN cell.

Every RNNCell must have the properties below and implement call with the signature (output, next_state) = call(input, state). The optional third input argument, scope, is allowed for backwards compatibility purposes; but should be left off for new subclasses.

This definition of cell differs from the definition used in the literature. In the literature, ‘cell’ refers to an object with a single scalar output. This definition refers to a horizontal array of such units.

An RNN cell, in the most abstract setting, is anything that has a state and performs some operation that takes a matrix of inputs. This operation results in an output matrix with self.output_size columns. If self.state_size is an integer, this operation also results in a new state matrix with self.state_size columns. If self.state_size is a (possibly nested tuple of) TensorShape object(s), then it should return a matching structure of Tensors having shape [batch_size].concatenate(s) for each s in self.state_size.
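
A minimal sketch of that contract, using a toy cell that is not part of TF or RETURNN; the public alias tf.compat.v1.nn.rnn_cell.RNNCell is used here as the base class, and the cell assumes the input feature dimension equals num_units.

```python
import tensorflow as tf

class ScaledSumCell(tf.compat.v1.nn.rnn_cell.RNNCell):
    """Toy cell: output = new_state = scale * (input + state)."""

    def __init__(self, num_units, scale=0.5, **kwargs):
        super().__init__(**kwargs)
        self._num_units = num_units
        self._scale = scale

    @property
    def state_size(self):
        return self._num_units  # a single state vector of size num_units

    @property
    def output_size(self):
        return self._num_units

    def call(self, inputs, state):
        new_state = self._scale * (inputs + state)
        return new_state, new_state  # (output, next_state)
```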

property state_size[source]#

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

property output_size[source]#

Integer or TensorShape: size of outputs produced by this cell.

build(_)[source]#

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

Args:
input_shape: Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

get_initial_state(inputs=None, batch_size=None, dtype=None)[source]#
zero_state(batch_size, dtype)[source]#

Return zero-filled state tensor(s).

Args:

batch_size: int, float, or unit Tensor representing the batch size.

dtype: the data type to use for the state.

Returns:

If state_size is an int or TensorShape, then the return value is an N-D tensor of shape [batch_size, state_size] filled with zeros.

If state_size is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of 2-D tensors with the shapes [batch_size, s] for each s in state_size.

get_config()[source]#

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

LSTMCell#

class tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.LSTMCell(*args, **kwargs)[source]#

Long short-term memory unit (LSTM) recurrent network cell.

The default non-peephole implementation is based on (Gers et al., 1999). The peephole implementation is based on (Sak et al., 2014).

The class uses optional peep-hole connections, optional cell clipping, and an optional projection layer.

Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU, or tf.contrib.rnn.LSTMBlockCell and tf.contrib.rnn.LSTMBlockFusedCell for better performance on CPU.

References:

Long short-term memory recurrent neural network architectures for large scale acoustic modeling: [Sak et al., 2014]

Learning to forget: [Gers et al., 1999] (http://digital-library.theiet.org/content/conferences/10.1049/cp_19991218) ([pdf](https://arxiv.org/pdf/1409.2329.pdf))

Long Short-Term Memory: [Hochreiter et al., 1997] (https://www.mitpressjournals.org/doi/abs/10.1162/neco.1997.9.8.1735) ([pdf](http://ml.jku.at/publications/older/3504.pdf))

Initialize the parameters for an LSTM cell.

Args:

num_units: int, The number of units in the LSTM cell.

use_peepholes: bool, set True to enable diagonal/peephole connections.

cell_clip: (optional) A float value; if provided, the cell state is clipped by this value prior to the cell output activation.

initializer: (optional) The initializer to use for the weight and projection matrices.

num_proj: (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed.

proj_clip: (optional) A float value. If num_proj > 0 and proj_clip is provided, then the projected values are clipped elementwise to within [-proj_clip, proj_clip].

num_unit_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.

num_proj_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.

forget_bias: Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training. Must be set manually to 0.0 when restoring from CudnnLSTM-trained checkpoints.

state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state. If False, they are concatenated along the column axis. This latter behavior will soon be deprecated.

activation: Activation function of the inner states. Default: tanh. It can also be a string naming a Keras activation function.

reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.

name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

dtype: Default dtype of the layer (default of None means use the type of the first input). Required when build is called before call.

**kwargs: Dict, keyword named properties for common layer attributes, like trainable etc., when constructing the cell from configs of get_config(). When restoring from CudnnLSTM-trained checkpoints, use CudnnCompatibleLSTMCell instead.
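
A usage illustration only (not from this page) of the peephole and projection options; all sizes are made up.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()
inputs = tf.compat.v1.placeholder(tf.float32, [None, None, 80])  # (batch, time, feature)

# Full LSTMCell with peepholes, cell clipping and a projection layer.
cell = tf.compat.v1.nn.rnn_cell.LSTMCell(
    num_units=1024,
    use_peepholes=True,
    cell_clip=50.0,
    num_proj=512,      # output (and recurrent h) dimensionality becomes 512
)
outputs, final_state = tf.compat.v1.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
# outputs: (batch, time, 512); final_state.c: (batch, 1024); final_state.h: (batch, 512)
```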

property state_size[source]#

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

property output_size[source]#

Integer or TensorShape: size of outputs produced by this cell.

build(input_shape)[source]#

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

Args:
input_shape: Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs, state)[source]#

Run one step of LSTM.

Args:

inputs: input Tensor, must be 2-D, [batch, input_size].

state: if state_is_tuple is False, this must be a state Tensor, 2-D, [batch, state_size]. If state_is_tuple is True, this must be a tuple of state Tensors, both 2-D, with column sizes c_state and m_state.

Returns:

A tuple containing:

  • A 2-D, [batch, output_dim], Tensor representing the output of the LSTM after reading inputs when previous state was state. Here output_dim is num_proj if num_proj was set, num_units otherwise.

  • Tensor(s) representing the new state of LSTM after reading inputs when the previous state was state. Same type and shape(s) as state.

Raises:
ValueError: If input size cannot be inferred from inputs via static shape inference.

get_config()[source]#

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.

TwoDNativeLstmCell#

class returnn.tf.native_op.TwoDNativeLstmCell(pooling, **kwargs)[source]#

Native 2D LSTM.

Parameters:
  • n_hidden (int) –

  • n_input_dim (int) –

  • n_input_dim_parts (int|list[int]) –

  • input_is_sparse (bool) –

  • step (int) – what direction and step to use

does_input_projection = True[source]#
classmethod map_layer_inputs_to_op(X, V_h, V_v, W, i, previous_state=None, previous_output=None, iteration=None)[source]#

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:
  • X (tf.Tensor) – inputs: shape (timeT,timeS,batch,n_hidden*5)

  • V_h (tf.Tensor) – W_re: shape (n_hidden,n_hidden*5)

  • V_v (tf.Tensor) – W_re: shape (n_hidden,n_hidden*5)

  • W (tf.Tensor) –

  • i (tf.Tensor) – index: shape (time,batch)

  • previous_state (tf.Tensor) –

  • previous_output (tf.Tensor) –

  • iteration (tf.Tensor) –

Return type:

(tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)

VanillaLSTMCell#

class returnn.tf.layers.rec.VanillaLSTMCell(*args, **kwargs)[source]#

Just a vanilla LSTM cell, which is compatible with our NativeLSTM (v1 and v2).

Parameters:

num_units (int) –

property output_size[source]#
Return type:

int

property state_size[source]#
Return type:

rnn_cell.LSTMStateTuple

get_input_transformed(x, batch_dim=None)[source]#
Parameters:
  • x (tf.Tensor) – (time, batch, dim), or (batch, dim)

  • batch_dim (tf.Tensor|None) –

Returns:

like x, maybe other feature-dim

Return type:

tf.Tensor|tuple[tf.Tensor]

ZoneoutLSTMCell#

class returnn.tf.layers.rec.ZoneoutLSTMCell(*args, **kwargs)[source]#

Wrapper for the TF LSTM cell to create a Zoneout LSTM cell. This code is an adapted version of Rayhane Mama's Tacotron-2 implementation.

Refs:

Initializer with possibility to set different zoneout values for cell/hidden states.

Parameters:
  • num_units – number of hidden units

  • zoneout_factor_cell – cell zoneout factor

  • zoneout_factor_output – output zoneout factor

  • use_zoneout_output – If False, return the direct output of the underlying LSTM, without applying zoneout. So the output is different from h. This is different from the original Zoneout LSTM paper. If True, h is the same as output, and it is the same as the original Zoneout LSTM paper. This was False in our earlier implementation, and up to behavior version 16. Since behavior version 17, the default is True.
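
A hedged config sketch: inside a RecLayer, the cell would typically be selected via the unit option, with the zoneout factors passed through unit_opts. The unit name "zoneoutlstm", the option forwarding, and all numbers below are assumptions, not values from this page.

```python
# Hedged sketch of selecting the Zoneout LSTM inside a RecLayer.
network = {
    "decoder_lstm": {
        "class": "rec",
        "unit": "zoneoutlstm",      # assumed unit name for ZoneoutLSTMCell
        "unit_opts": {
            "zoneout_factor_cell": 0.1,    # illustrative value
            "zoneout_factor_output": 0.1,  # illustrative value
        },
        "n_out": 1024,
        "from": "data",
    },
}
```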

property state_size[source]#
Return type:

int

property output_size[source]#
Return type:

int