TFNetworkNeuralTransducer

class TFNetworkNeuralTransducer.NeuralTransducerLayer(transducer_hidden_units, n_out, transducer_max_width, input_block_size, embedding_size, e_symbol_index, use_prev_state_as_start=False, **kwargs)

Creates a neural transducer based on the paper “A Neural Transducer”: https://arxiv.org/abs/1511.04868.
NOTE: Requires that the loss be neural_transducer_loss.
NOTE: When training with a BiLSTM as input, set an appropriate gradient clipping parameter.

Initialize the Neural Transducer.
:param int transducer_hidden_units: Number of hidden units the transducer should have.
:param int n_out: The size of the output layer, i.e. the size of the vocabulary including the <E> symbol.
:param int transducer_max_width: The maximum number of outputs in one NT block (including the final <E> symbol).
:param int input_block_size: Number of inputs to use for each NT block.
:param int embedding_size: Embedding dimension size.
:param int e_symbol_index: Index of the <E> symbol that is used in the NT block. 0 <= e_symbol_index < n_out.
:param bool use_prev_state_as_start: Whether to use the last state of the previous recurrent layer as the initial state of the transducer. NOTE: For this to work, the following must hold: previous_layer.hidden_units = previous_layer.n_out = transducer.transducer_hidden_units
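As a rough illustration of how these arguments fit together, the sketch below writes a layer entry in the style of a network dictionary and checks the constraints stated above. Only the argument names come from the signature; the layer name "encoder", all numeric values, and the "from" and "loss" entries are assumptions made for this example. In particular, the loss name is assumed to be the class_name listed for NeuralTransducerLoss on this page.

```python
# Hypothetical values throughout; only the argument names come from the signature above.
n_out = 30  # vocabulary size including the <E> symbol

transducer_layer = {
    "class": "neural_transducer",      # layer_class of NeuralTransducerLayer
    "from": ["encoder"],               # assumed name of the preceding recurrent layer
    "transducer_hidden_units": 256,
    "n_out": n_out,
    "transducer_max_width": 5,         # max outputs per block, including the final <E>
    "input_block_size": 4,             # inputs consumed per block
    "embedding_size": 64,
    "e_symbol_index": n_out - 1,       # must satisfy 0 <= e_symbol_index < n_out
    "use_prev_state_as_start": False,
    "loss": "neural_transducer",       # assumed: class_name of NeuralTransducerLoss
}


def check_transducer_options(opts):
    """Sanity-check the constraints stated in the parameter descriptions."""
    assert 0 <= opts["e_symbol_index"] < opts["n_out"], "e_symbol_index out of range"
    assert opts["transducer_max_width"] >= 1, "each block must at least emit <E>"
    assert opts["input_block_size"] >= 1, "a block needs at least one input"
    return True


check_transducer_options(transducer_layer)
```

If use_prev_state_as_start were set to True, the preceding layer would additionally need n_out equal to transducer_hidden_units, as the note above states.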

layer_class = 'neural_transducer'
build_full_transducer(self, transducer_hidden_units, embeddings, num_outputs, input_block_size, transducer_max_width, encoder_outputs, trans_hidden_init)

Builds the complete transducer.
:param int transducer_hidden_units: Number of hidden units the transducer should have.
:param tf.Variable embeddings: Variable with the reference to the embeddings.
:param int num_outputs: The size of the output layer, i.e. the size of the vocabulary including the <E> symbol.
:param int input_block_size: Number of inputs to use for each NT block.
:param int transducer_max_width: The maximum number of outputs in one NT block (including the final <E> symbol).
:param tf.Tensor encoder_outputs: The outputs of the encoder, of shape [max_time, batch_size, encoder_hidden].
:param tf.Tensor trans_hidden_init: The initial state of the transducer. Needs to be of shape [2, batch_size, transducer_hidden_units]. trans_hidden_init[0] is the c vector of the LSTM, trans_hidden_init[1] the hidden vector.
:return: A reference to the tf.Tensor containing the logits.
:rtype: tf.Tensor
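The block-related shapes can be worked through with plain arithmetic. The sketch below assumes, following the paper, that the encoder output is split into ceil(max_time / input_block_size) blocks and that every block contributes transducer_max_width output steps. The helper names are hypothetical and are not part of this module.

```python
import math


def amount_of_blocks(max_time, input_block_size):
    """Number of NT blocks needed to cover max_time encoder frames (assumed: ceiling division)."""
    return math.ceil(max_time / input_block_size)


def logits_time_steps(max_time, input_block_size, transducer_max_width):
    """Total number of output steps if every block emits transducer_max_width steps."""
    return amount_of_blocks(max_time, input_block_size) * transducer_max_width


def init_state_shape(batch_size, transducer_hidden_units):
    """Shape expected for trans_hidden_init: index 0 is the LSTM c vector, index 1 the hidden vector."""
    return (2, batch_size, transducer_hidden_units)


print(amount_of_blocks(10, 4))       # 3
print(logits_time_steps(10, 4, 5))   # 15
print(init_state_shape(8, 256))      # (2, 8, 256)
```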

round_vector_to_closest_input_block(self, vector, input_block_size, transducer_max_width)

Rounds up the provided vector so that every entry is a multiple of input_block_size.
:param tf.Tensor vector: A vector.
:param int input_block_size: Input block size as specified in the __init__ function.
:return: A vector of the same shape as ‘vector’.
:rtype: tf.Tensor
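The rounding itself can be sketched in pure Python. This is only an illustration of "round every entry up to a multiple of input_block_size": the actual method operates on a tf.Tensor and also takes transducer_max_width, whose role is not described here, so that argument is left out. The function name is hypothetical.

```python
def round_up_to_block(lengths, input_block_size):
    """Round each entry up to the nearest multiple of input_block_size."""
    # -(-n // k) is ceiling division for non-negative integers.
    return [-(-n // input_block_size) * input_block_size for n in lengths]


print(round_up_to_block([1, 4, 5, 8, 9], 4))  # [4, 4, 8, 8, 12]
```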

classmethod get_out_data_from_opts(n_out, **kwargs)

Gets a Data template (i.e. shape etc is set but not the placeholder) for our __init__ args. The purpose of having this as a separate classmethod is to be able to infer the shape information without having to construct the layer. This function should not create any nodes in the computation graph.

Parameters: kwargs – all the same kwargs as for self.__init__()
Returns: Data template (placeholder not set)
Return type: Data
class TFNetworkNeuralTransducer.NeuralTransducerLoss(debug=False, max_variance=999999.9, **kwargs)

The loss function that should be used with the NeuralTransducer layer. This loss function has the alignment algorithm from the original paper built in.

Initialize the Neural Transducer loss.
:param bool debug: Whether to output debug info such as alignments, argmax, variance, etc.
:param float max_variance: If a time step (in CE) has too high a variance within the batch, then the gradient for that time step will be ignored. Set this value lower if you have outliers that disrupt training.
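The effect of max_variance can be pictured with a small pure-Python sketch. It assumes that the variance is taken over the per-sequence cross-entropy values at each time step and that a step is dropped when its variance exceeds the threshold; neither detail is confirmed by this page, and the function name is hypothetical.

```python
from statistics import pvariance


def variance_mask(ce_per_step, max_variance=999999.9):
    """ce_per_step[t][b] holds the cross-entropy of sequence b at time step t.

    Returns one boolean per time step: True keeps the step, False ignores it.
    """
    return [pvariance(step) <= max_variance for step in ce_per_step]


ce = [
    [0.9, 1.1, 1.0],    # low spread across the batch
    [0.5, 0.4, 40.0],   # one outlier dominates this step
]
print(variance_mask(ce, max_variance=10.0))  # [True, False]
```

With the default threshold both steps would be kept, which is why lowering the value is suggested when outliers disrupt training.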

class_name = 'neural_transducer'
recurrent = True
class Alignment(transducer_hidden_units, E_SYMBOL)

Class to manage the alignment generation in the NT.

Alignment initialization.
:param int transducer_hidden_units: Number of hidden units that the transducer should have.
:param int E_SYMBOL: The index of the <e> symbol.

insert_alignment(self, index, block_index, transducer_outputs, targets, transducer_amount_outputs, new_transducer_state)

Inserts alignment properties for a new block.
:param int index: The index of y~ corresponding to the last target index.
:param int block_index: The new block index.
:param np.ndarray transducer_outputs: The computed transducer outputs. Shape [transducer_amount_outputs, 1, n_out].
:param np.ndarray targets: The complete target array, should be of shape [total_target_length].
:param int transducer_amount_outputs: The number of outputs that the transducer created in this block.
:param np.ndarray new_transducer_state: The new transducer state of shape [2, 1, transducer_hidden_units].

init(self, **kwargs)
Parameters:
  • output (Data) – generated output
  • output_with_activation (OutputWithActivation|None) –
  • target (Data) – reference target from dataset
  • layer (LayerBase|None) –
get_value(self)
Returns: self.reduce_func(loss), which is usually a scalar, since by default it does tf.reduce_sum. float32 value. It should not be normalized over frames, as this will be calculated in TFEngine.Runner._collect_eval_info().
Return type: tf.Tensor|None
get_alignment_from_logits(self, logits, targets, amount_of_blocks, transducer_max_width)

Finds the alignment of the target sequence to the actual output.
:param logits: Logits from the transducer, of size [transducer_max_width * amount_of_blocks, 1, vocab_size].
:param targets: The target sequence of shape [time] where each entry is an index.
:param amount_of_blocks: Number of blocks in the Neural Transducer.
:param transducer_max_width: The max width of one transducer block.
:return: A list of indices where <e>’s need to be inserted into the target sequence, shape: [max_time, 1] (see paper), and a boolean mask for use with a loss function, of shape [max_time, 1].
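To make the return value more concrete, the sketch below shows what inserting <e> symbols at given positions and building a padding mask could look like. It does not perform the alignment search over the logits. The index convention (an <e> is inserted before the target at each listed position, and a position equal to the target length appends it), the padding value, and both function names are assumptions for illustration only.

```python
from collections import Counter


def insert_end_symbols(targets, insert_positions, e_symbol):
    """Insert e_symbol before targets[p] for every p in insert_positions.

    A position equal to len(targets) appends the symbol at the end. A position
    may occur more than once (a block that emits nothing but <e>).
    """
    pending = Counter(insert_positions)
    out = []
    for i, label in enumerate(targets):
        out.extend([e_symbol] * pending[i])
        out.append(label)
    out.extend([e_symbol] * pending[len(targets)])
    return out


def pad_with_mask(sequence, max_time, pad_value=0):
    """Pad to max_time and return (padded, mask), with mask True on real entries."""
    n_pad = max_time - len(sequence)
    assert n_pad >= 0, "sequence longer than max_time"
    return sequence + [pad_value] * n_pad, [True] * len(sequence) + [False] * n_pad


E = 9  # hypothetical <e> index
aligned = insert_end_symbols([3, 5, 7], [1, 1, 3], E)
print(aligned)  # [3, 9, 9, 5, 7, 9]
padded, mask = pad_with_mask(aligned, 8)
print(padded)   # [3, 9, 9, 5, 7, 9, 0, 0]
print(mask)     # [True, True, True, True, True, True, False, False]
```

Here three blocks are assumed: the first emits one label, the second emits only <e>, and the third emits the remaining two labels, so each block ends with exactly one <e>.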

get_alignment_from_logits_manager(self, logits, targets, logit_lengths, targets_lengths)

Get the modified targets & mask.
:param logits: Logits of shape [max_time, batch_size, vocab_size].
:param targets: Targets of shape [max_time, batch_size]. Each entry denotes the index of the correct target.
:return: Modified targets of shape [max_time, batch_size, vocab_size] & mask of shape [max_time, batch_size].

classmethod get_auto_output_layer_dim(target_dim)
Parameters: target_dim (int) –
Returns: normally just the same as target_dim. E.g. for CTC, we would add 1 for the blank label.
Return type: int
get_error(self)
Returns: frame error rate as a scalar value with the default self.reduce_func (see also self.get_value)
Return type: tf.Tensor