NetworkRecurrentLayer

class NetworkRecurrentLayer.Unit(n_units, n_in, n_out, n_re, n_act)
Abstract descriptor class for all kinds of recurrent units.

Parameters:
 n_units – number of cells
 n_in – cell fan-in
 n_out – cell fan-out
 n_re – recurrent fan-in
 n_act – number of outputs

scan(x, z, non_sequences, i, outputs_info, W_re, W_in, b, go_backwards=False, truncate_gradient=-1)
Executes the iteration over the time axis (usually with theano.scan), applying the unit's step function at each frame.

Parameters:
 x – unmapped input tensor in (time, batch, dim) shape
 z – same as x but already transformed to self.n_in
 non_sequences – see theano.scan
 i – index vector in (time, batch) shape
 outputs_info – see theano.scan
 W_re – recurrent weight matrix
 W_in – input weight matrix
 b – input bias
 go_backwards – whether to scan the sequence from 0 to T or from T to 0
 truncate_gradient – see theano.scan
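For orientation, a minimal sketch of how such a recursion can be driven by theano.scan; this is an illustration, not the layer's actual implementation, and the step function, shapes, and variable names are assumptions:

    import numpy
    import theano
    import theano.tensor as T

    n = 3  # illustrative number of cells
    x = T.tensor3('x')  # (time, batch, dim); dim assumed already mapped to n
    W_re = theano.shared(numpy.zeros((n, n), dtype=theano.config.floatX),
                         name='W_re')  # recurrent weight matrix

    def step(z_t, h_p):
        # one recursion step: mapped input plus recurrent contribution
        return T.tanh(z_t + T.dot(h_p, W_re))

    h0 = T.zeros((x.shape[1], n))  # initial hidden state, one row per batch
    h, _ = theano.scan(step,
                       sequences=[x],
                       outputs_info=[h0],
                       go_backwards=False,    # scan from 0 to T
                       truncate_gradient=-1)  # no gradient truncation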

scan_seg(x, z, att, non_sequences, i, outputs_info, W_re, W_in, b, go_backwards=False, truncate_gradient=-1)
Executes the iteration over the time axis (usually with theano.scan), applying the unit's step function at each frame. Parameters are the same as for scan above, with an additional attention input att.

class NetworkRecurrentLayer.VANILLA(n_units, **kwargs)
A simple tanh unit.

step(i_t, x_t, z_t, z_p, h_p)
Performs one iteration of the recursion.

Parameters:
 i_t – index at time step t
 x_t – raw input at time step t
 z_t – mapped input at time step t
 z_p – previous input from time step t-1
 h_p – previous hidden activation from time step t-1
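A minimal sketch of what such a tanh step computes, assuming the mapped input z_t and the recurrent contribution z_p are simply summed and squashed (the actual implementation may differ, e.g. in how the index i_t masks padded frames):

    import theano.tensor as T

    def step(i_t, x_t, z_t, z_p, h_p):
        # z_t: mapped input at time t; z_p: recurrent contribution from t-1
        return T.tanh(z_t + z_p)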


class NetworkRecurrentLayer.LEAKYLSTM(n_units, **kwargs)
A 1D cell proposed in http://jmlr.org/papers/volume17/14-203/14-203.pdf; the simplified equations are given in Table 7, page 36. Type A with gamma_3 == 0. This cell has 3 units instead of 4 like the LSTM.

class NetworkRecurrentLayer.LEAKYLPLSTM(n_units, **kwargs)
A 1D cell proposed in http://jmlr.org/papers/volume17/14-203/14-203.pdf; the simplified equations are given in Table 7, page 36. Type A. This cell has 4 units like the LSTM.

class NetworkRecurrentLayer.PIDLSTM(n_units, **kwargs)
A 1D cell proposed in http://jmlr.org/papers/volume17/14-203/14-203.pdf; the simplified equations are given in Table 7, page 36. Type E. This cell works as a dynamic PID filter of the input: the forget gate determines whether it has a PD or PI characteristic, the proportional gate gates the P/I part, and the difference gate gates the D/P part. It can have advantages if there is no subsampling in the layer. This cell has 4 units like the LSTM.

class NetworkRecurrentLayer.LSTMPS(n_units, **kwargs)
Very fast custom LSTM implementation for segment encoding.

class NetworkRecurrentLayer.LSTMC(n_units, **kwargs)
The same implementation as above, but it executes a theano function (recurrent transform) in each iteration. This allows for additional dependencies in the recursion of the LSTM.

class NetworkRecurrentLayer.LSTMR(n_units, **kwargs)
Same as LSTMC, but without the recurrent matrix multiplication.

class NetworkRecurrentLayer.GRU(n_units, **kwargs)
Gated recurrent unit as described in http://arxiv.org/abs/1502.02367.
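For reference, a sketch of one step under the standard GRU formulation; the packing of z_t and the weight names U_u, U_r, U_c are illustrative assumptions, not attributes of this class:

    import theano.tensor as T

    n = 4  # illustrative number of cells

    def gru_step(z_t, h_p, U_u, U_r, U_c):
        # z_t packs the input projections for the update gate, the reset
        # gate, and the candidate, each of width n; h_p is the previous state
        u_t = T.nnet.sigmoid(z_t[:, :n] + T.dot(h_p, U_u))       # update gate
        r_t = T.nnet.sigmoid(z_t[:, n:2 * n] + T.dot(h_p, U_r))  # reset gate
        c_t = T.tanh(z_t[:, 2 * n:] + T.dot(r_t * h_p, U_c))     # candidate
        return u_t * h_p + (1 - u_t) * c_t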

class NetworkRecurrentLayer.SRU(n_units, **kwargs)
Same as GRU, but without reset weights, which allows for faster computation on GPUs.
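Dropping the reset gate means the candidate activation uses the previous state directly, so the recurrent products for the update gate and the candidate can be fused into a single matrix product. A sketch under the same illustrative conventions as the GRU example above:

    import theano.tensor as T

    n = 4  # illustrative number of cells

    def sru_step(z_t, h_p, U_u, U_c):
        # no reset gate: the candidate sees h_p directly, so both recurrent
        # products can be fused into one dot with a concatenated weight matrix
        u_t = T.nnet.sigmoid(z_t[:, :n] + T.dot(h_p, U_u))  # update gate
        c_t = T.tanh(z_t[:, n:] + T.dot(h_p, U_c))          # candidate
        return u_t * h_p + (1 - u_t) * c_t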

class NetworkRecurrentLayer.RecurrentUnitLayer(n_out=None, n_units=None, direction=1, truncation=-1, sampling=1, encoder=None, unit='lstm', n_dec=0, attention='none', recurrent_transform='none', recurrent_transform_attribs='{}', attention_template=128, attention_distance='l2', attention_step='linear', attention_beam=0, attention_norm='exp', attention_momentum='none', attention_sharpening=1.0, attention_nbest=0, attention_store=False, attention_smooth=False, attention_glimpse=1, attention_filters=1, attention_accumulator='sum', attention_loss=0, attention_bn=0, attention_lm='none', attention_ndec=1, attention_memory=0, attention_alnpts=0, attention_epoch=1, attention_segstep=0.01, attention_offset=0.95, attention_method='epoch', attention_scale=10, base=None, aligner=None, lm=False, force_lm=False, droplm=1.0, forward_weights_init=None, bias_random_init_forget_shift=0.0, copy_weights_from_base=False, segment_input=False, join_states=False, sample_segment=None, **kwargs)
Layer class to execute recurrent units.
Parameters:
 n_out – number of cells
 n_units – used when initialized via Network.from_hdf_model_topology
 direction – process sequence in forward (1) or backward (-1) direction
 truncation – gradient truncation
 sampling – scan every nth frame only
 encoder – list of encoder layers used as initialization for the hidden state
 unit – cell type (one of ‘lstm’, ‘vanilla’, ‘gru’, ‘sru’)
 n_dec – if an integer, the absolute number of steps to unfold the network; otherwise the relative number of steps from the encoder
 recurrent_transform – name of recurrent transform
 recurrent_transform_attribs – dictionary containing parameters for a recurrent transform
 attention_template –
 attention_distance –
 attention_step –
 attention_beam –
 attention_norm –
 attention_sharpening –
 attention_nbest –
 attention_store –
 attention_align –
 attention_glimpse –
 attention_lm –
 base – list of layers whose outputs serve as the base for attention mechanisms
 lm – activate RNNLM
 force_lm – expect previous labels to be given during testing
 droplm – probability of taking the expected output as predecessor instead of the real one when the LM is enabled
 bias_random_init_forget_shift – initialize the forget gate bias of LSTM units with this value
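As a usage illustration, a hypothetical network-description entry for this layer; the dict-based config format and the layer name "rec" are assumptions about the surrounding framework, and all values are placeholders:

    # hypothetical config: a bidirectional LSTM pair feeding a softmax output
    network = {
        "fwd": {"class": "rec", "unit": "lstm", "n_out": 512,
                "direction": 1, "from": ["data"]},
        "bwd": {"class": "rec", "unit": "lstm", "n_out": 512,
                "direction": -1, "from": ["data"]},  # backward direction
        "output": {"class": "softmax", "from": ["fwd", "bwd"]},
    }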