NetworkRecurrentLayer

class NetworkRecurrentLayer.Unit(n_units, n_in, n_out, n_re, n_act)[source]

Abstract descriptor class for all kinds of recurrent units. A sketch of the typical size relations follows the parameter list below.

Parameters:
  • n_units – number of cells
  • n_in – cell fan in
  • n_out – cell fan out
  • n_re – recurrent fan in
  • n_act – number of outputs
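As a concrete illustration of how these sizes relate, here is a sketch of a subclass for an LSTM-style cell with four gate blocks and two recurrent activations (output and cell state); the subclass name and the exact factors are assumptions for illustration, not part of the module:

    from NetworkRecurrentLayer import Unit

    class SketchLSTMUnit(Unit):
        """Hypothetical unit illustrating the size arguments for an LSTM-like cell."""
        def __init__(self, n_units, **kwargs):
            super(SketchLSTMUnit, self).__init__(
                n_units=n_units,
                n_in=n_units * 4,   # cell fan in: one block per gate (input, forget, output, cell input)
                n_out=n_units,      # cell fan out
                n_re=n_units * 4,   # recurrent fan in
                n_act=2)            # outputs carried through the recursion (y and c)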
set_parent(parent)[source]
scan(x, z, non_sequences, i, outputs_info, W_re, W_in, b, go_backwards=False, truncate_gradient=-1)[source]

Executes the iteration over the time axis (usually with theano.scan).

Parameters:
  • step – Python function to be executed at each time step
  • x – unmapped input tensor in (time,batch,dim) shape
  • z – same as x but already transformed to self.n_in
  • non_sequences – see theano.scan
  • i – index vector in (time, batch) shape
  • outputs_info – see theano.scan
  • W_re – recurrent weight matrix
  • W_in – input weight matrix
  • b – input bias
  • go_backwards – whether to scan the sequence from 0 to T or from T to 0
  • truncate_gradient – see theano.scan
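For orientation, a minimal sketch of the kind of theano.scan call such a wrapper builds; the function name scan_sketch and the exact argument wiring are illustrative assumptions, not the actual implementation:

    import theano

    def scan_sketch(step_fn, x, z, non_sequences, i, outputs_info,
                    W_re, W_in, b, go_backwards=False, truncate_gradient=-1):
        # z is assumed to be the input already mapped to the unit's fan-in
        # (roughly x.dot(W_in) + b), so the inner step only handles the recurrence.
        outputs, updates = theano.scan(
            fn=step_fn,
            sequences=[i, x, z],                         # iterate over the time axis
            outputs_info=outputs_info,                   # initial recurrent state(s)
            non_sequences=[W_re] + list(non_sequences),  # available in every step
            go_backwards=go_backwards,
            truncate_gradient=truncate_gradient)
        return outputs, updates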

scan_seg(x, z, att, non_sequences, i, outputs_info, W_re, W_in, b, go_backwards=False, truncate_gradient=-1)[source]

Executes the iteration over the time axis (usually with theano.scan).

Parameters:
  • step – Python function to be executed at each time step
  • x – unmapped input tensor in (time,batch,dim) shape
  • z – same as x but already transformed to self.n_in
  • non_sequences – see theano.scan
  • i – index vector in (time, batch) shape
  • outputs_info – see theano.scan
  • W_re – recurrent weight matrix
  • W_in – input weight matrix
  • b – input bias
  • go_backwards – whether to scan the sequence from 0 to T or from T to 0
  • truncate_gradient – see theano.scan

class NetworkRecurrentLayer.VANILLA(n_units, **kwargs)[source]

A simple tanh unit

step(i_t, x_t, z_t, z_p, h_p)[source]

Performs one iteration of the recursion.

Parameters:
  • i_t – index at time step t
  • x_t – raw input at time step t
  • z_t – mapped input at time step t
  • z_p – previous input from time step t-1
  • h_p – previous hidden activation from time step t-1
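As an illustration, a sketch of such a tanh step written with theano.tensor; the variable names follow the docstring, and W_re is assumed to arrive via scan's non_sequences (this is a sketch, not the class's actual implementation):

    import theano.tensor as T

    def step_sketch(i_t, x_t, z_t, z_p, h_p, W_re):
        h_t = T.tanh(z_t + T.dot(h_p, W_re))    # candidate hidden activation
        mask = i_t.dimshuffle(0, 'x')           # (batch,) index -> broadcastable over units
        return mask * h_t + (1 - mask) * h_p    # keep the previous state for padded frames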

class NetworkRecurrentLayer.LSTME(n_units, **kwargs)[source]

A Theano-based LSTM implementation

step(i_t, x_t, z_t, y_p, c_p, *other_args)[source]
class NetworkRecurrentLayer.LSTMS(n_units, **kwargs)[source]

A Theano-based LSTM implementation

step(i_t, x_t, z_t, att_p, y_p, c_p, *other_args)[source]
class NetworkRecurrentLayer.LEAKYLSTM(n_units, **kwargs)[source]

A 1D cell proposed in http://jmlr.org/papers/volume17/14-203/14-203.pdf. The simplified equations can be seen in Table 7, page 36: Type A with gamma_3 == 0. This cell has 3 units instead of the LSTM's 4.

step(i_t, x_t, z_t, y_p, c_p, *other_args)[source]
class NetworkRecurrentLayer.LEAKYLPLSTM(n_units, **kwargs)[source]

A 1D cell proposed in http://jmlr.org/papers/volume17/14-203/14-203.pdf. The simplified equations can be seen in Table 7, page 36: Type A. This cell has 4 units like the LSTM.

step(i_t, x_t, z_t, y_p, c_p, *other_args)[source]
class NetworkRecurrentLayer.PIDLSTM(n_units, **kwargs)[source]

A 1D cell proposed in http://jmlr.org/papers/volume17/14-203/14-203.pdf. The simplified equations can be seen in Table 7, page 36: Type E. This cell works as a dynamic PID filter of the input. The forget gate determines whether it has a PD or PI characteristic, the Proportional gate gates the P/I part, and the Difference gate the D/P part. It can have advantages if there is no subsampling in the layer. This cell has 4 units like the LSTM.
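As background for the control-theory terminology (and not the cell's exact update), a discrete PID filter of an input x_t combines proportional, integral, and difference terms:

    y_t = K_P \, x_t + K_I \sum_{\tau \le t} x_\tau + K_D \, (x_t - x_{t-1})

The gates described above dynamically reweight which of these characteristics dominates.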

step(i_t, x_t, z_t, y_p, c_p, *other_args)[source]
class NetworkRecurrentLayer.LSTMP(n_units, **kwargs)[source]

Very fast custom LSTM implementation

scan(x, z, non_sequences, i, outputs_info, W_re, W_in, b, go_backwards=False, truncate_gradient=-1)[source]
class NetworkRecurrentLayer.LSTMPS(n_units, **kwargs)[source]

Very fast custom LSTM implementation for segment encoding

scan_seg(x, z, non_sequences, i, att, outputs_info, W_re, W_in, b, go_backwards=False, truncate_gradient=-1)[source]
class NetworkRecurrentLayer.LSTMB(n_units, **kwargs)[source]

Very fast custom BLSTM implementation

scan(x, z, non_sequences, i, outputs_info, W_re, W_in, b, go_backwards=False, truncate_gradient=-1)[source]
NetworkRecurrentLayer.BLSTM[source]

alias of LSTMB

class NetworkRecurrentLayer.LSTMC(n_units, **kwargs)[source]

The same implementation as above, but it executes a Theano function (the recurrent transform) in each iteration, which allows for additional dependencies in the recursion of the LSTM.
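Conceptually, the per-step injection looks like the sketch below; the hook name and call signature are assumptions for illustration, not the real recurrent-transform API:

    def step_with_transform(i_t, x_t, z_t, y_p, c_p, recurrent_transform, *non_sequences):
        # The transform sees the previous output y_p (plus whatever it needs from
        # non_sequences, e.g. an attention base) and returns an extra input term.
        extra_z = recurrent_transform(y_p, *non_sequences)  # hypothetical call
        z_t = z_t + extra_z
        # ... the usual LSTM gate computations on z_t, y_p, c_p would follow here ...
        return y_p, c_p  # placeholder return for the sketch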

scan(x, z, non_sequences, i, outputs_info, W_re, W_in, b, go_backwards=False, truncate_gradient=-1)[source]
class NetworkRecurrentLayer.LSTMR(n_units, **kwargs)[source]

Same as LSTMC but without recurrent matrix multiplication

scan(x, z, non_sequences, i, outputs_info, W_re, W_in, b, go_backwards=False, truncate_gradient=-1)[source]
class NetworkRecurrentLayer.GRU(n_units, **kwargs)[source]

Gated recurrent unit as described in http://arxiv.org/abs/1502.02367

step(i_t, x_t, z_t, z_p, h_p)[source]
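For reference, the standard GRU update (one common gate convention) as a plain NumPy sketch; the weight names are illustrative and do not refer to the class's attributes:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(x_t, h_p, W_z, U_z, W_r, U_r, W_h, U_h):
        z = sigmoid(x_t @ W_z + h_p @ U_z)               # update gate
        r = sigmoid(x_t @ W_r + h_p @ U_r)               # reset gate
        h_tilde = np.tanh(x_t @ W_h + (r * h_p) @ U_h)   # candidate state
        return (1.0 - z) * h_p + z * h_tilde             # new hidden state

The SRU below drops the reset gate (r fixed to 1), which removes one matrix product per step.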
class NetworkRecurrentLayer.SRU(n_units, **kwargs)[source]

Same as the GRU, but without the reset gate weights, which allows for faster computation on GPUs

step(i_t, x_t, z_t, z_p, h_p)[source]
class NetworkRecurrentLayer.RecurrentUnitLayer(n_out=None, n_units=None, direction=1, truncation=-1, sampling=1, encoder=None, unit='lstm', n_dec=0, attention='none', recurrent_transform='none', recurrent_transform_attribs='{}', attention_template=128, attention_distance='l2', attention_step='linear', attention_beam=0, attention_norm='exp', attention_momentum='none', attention_sharpening=1.0, attention_nbest=0, attention_store=False, attention_smooth=False, attention_glimpse=1, attention_filters=1, attention_accumulator='sum', attention_loss=0, attention_bn=0, attention_lm='none', attention_ndec=1, attention_memory=0, base=None, aligner=None, lm=False, force_lm=False, droplm=1.0, forward_weights_init=None, bias_random_init_forget_shift=0.0, copy_weights_from_base=False, segment_input=False, join_states=False, sample_segment=None, **kwargs)[source]

Layer class to execute recurrent units. An example network-dictionary entry is sketched after the parameter list below.

Parameters:
  • n_out – number of cells
  • n_units – used when initialized via Network.from_hdf_model_topology
  • direction – process sequence in forward (1) or backward (-1) direction
  • truncation – gradient truncation
  • sampling – scan every nth frame only
  • encoder – list of encoder layers used as initialization for the hidden state
  • unit – cell type (one of ‘lstm’, ‘vanilla’, ‘gru’, ‘sru’)
  • n_dec – absolute number of steps to unfold the network if given as an integer, otherwise a relative number of steps derived from the encoder
  • recurrent_transform – name of recurrent transform
  • recurrent_transform_attribs – dictionary containing parameters for a recurrent transform
  • attention_template
  • attention_distance
  • attention_step
  • attention_beam
  • attention_norm
  • attention_sharpening
  • attention_nbest
  • attention_store
  • attention_align
  • attention_glimpse
  • attention_lm
  • base – list of layers whose outputs are used as the base for attention mechanisms
  • lm – activate RNNLM
  • force_lm – expect previous labels to be given during testing
  • droplm – probability of taking the expected output as the predecessor instead of the real one when lm is enabled
  • bias_random_init_forget_shift – initialize the forget gate bias of LSTM units with this value
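For illustration, such a layer could appear in a network dictionary as follows, assuming the usual convention that layer_class ('rec') is selected via a "class" key; the layer names, the "from" key, and the concrete values are assumptions:

    network = {
        "lstm_fwd": {"class": "rec", "unit": "lstm", "n_out": 512,
                     "direction": 1, "truncation": -1, "from": ["data"]},
        "lstm_bwd": {"class": "rec", "unit": "lstm", "n_out": 512,
                     "direction": -1, "from": ["data"]},
    }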
recurrent = True[source]
layer_class = 'rec'[source]
cost()[source]
Return type: (theano.Variable | None, dict[theano.Variable, theano.Variable] | None)
Returns: cost, known_grads
create_seg_wise_encoder_output(att, aligner=None)[source]
class NetworkRecurrentLayer.RecurrentUpsampleLayer(factor, **kwargs)[source]
layer_class = 'recurrent_upsample'[source]
class NetworkRecurrentLayer.LinearRecurrentLayer(n_out, direction=1, **kwargs)[source]

Inspired by http://arxiv.org/abs/1510.02693. Basically a very simple LSTM.

recurrent = True[source]
layer_class = 'linear_recurrent'[source]