TFNativeOp

TF implementation of NativeOp. Wrappers for most relevant NativeOp ops.

class TFNativeOp.OpDescription(in_info, out_info, c_fw_code, c_bw_code=None, c_extra_support_code=None, code_version=None, cpu_support=True, grad_input_map=None, name=None)[source]

Meta-info about an op, used by OpMaker.

Parameters:
  • in_info (list[dict(str)]) –

    each dict describes one input var. attribs in the dict:

    int ndim: the number of dimensions.
    tuple shape: the shape; can contain None for specific dimensions.

    optional attribs:

    str dtype: “float32” by default.
    bool need_contiguous: False by default.
    int want_inplace: -1 by default. Tries to optimize by destroying the input; the value is the output index to reuse it for. “dummy_out” is a special value which will add another output.
    bool is_inplace: False by default. Whether the optimization was applied.
    str gradient: can be “disconnected”. See grad().
    bool bw_input: True by default. Add this param to the bw input.

    other attribs are just ignored.

  • out_info (list[dict(str)]) –

    like in_info. slightly different behavior for:

    shape: we also allow refs to the in_info in the form (in-idx,dim). See infer_shape().
    need_contiguous/want_inplace: used for bw, in case bw_input == True.
  • c_fw_code (str) – C code for forward pass
  • c_extra_support_code (str|dict[str]) – C support code (for c_support_code)
  • c_bw_code (str|None) – C code for backward pass (for gradient)
  • code_version (tuple[int]) – will be returned by c_code_cache_version.
  • cpu_support (bool) –
  • grad_input_map (tuple[int]|callable) – selection of grad inputs. by default, we get all inputs + all outputs + all grad outputs.
  • name (str) – name
classmethod from_gen_base(gen_base)[source]
Parameters:gen_base (NativeOp.NativeOpGenBase|Type[NativeOp.NativeOpGenBase]) –
Return type:OpDescription
is_grad_defined[source]
Return type:bool
grad(self)[source]
Return type:OpDescription|None
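
To make the in_info / out_info format more concrete, here is a minimal sketch with descriptors for a made-up op, together with a helper that resolves (in-idx,dim) references in an output shape. The helper `resolve_out_shape` and the example shapes are hypothetical; the sketch only mimics what infer_shape() is described to do and is not the library's implementation.

```python
# Hypothetical descriptors in the in_info / out_info format described above.
in_info = [
    {"ndim": 2, "shape": (None, None), "need_contiguous": True},
    {"ndim": 1, "shape": (None,), "dtype": "float32"},
]
out_info = [
    # Each (in-idx, dim) entry refers to dimension `dim` of input `in-idx`.
    {"ndim": 2, "shape": ((0, 0), (0, 1))},
]


def resolve_out_shape(out_shape, input_shapes):
    """Replace (in-idx, dim) references by concrete input dimensions.

    Hypothetical helper for illustration; plain ints are passed through.
    """
    resolved = []
    for entry in out_shape:
        if isinstance(entry, tuple):
            in_idx, dim = entry
            resolved.append(input_shapes[in_idx][dim])
        else:
            resolved.append(entry)
    return tuple(resolved)


input_shapes = [(7, 3), (3,)]  # concrete shapes of the two inputs
y_shape = resolve_out_shape(out_info[0]["shape"], input_shapes)
print(y_shape)  # (7, 3)
```

Expressing output shapes as references to input dimensions lets the shape be computed before the op runs, without hard-coding any sizes in the description.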
class TFNativeOp.OpMaker(description, compiler_opts=None, search_for_runtime_blas=True, search_for_numpy_blas=True, search_for_system_blas=True, blas_lib=None)[source]

https://www.tensorflow.org/guide/extend/op

Parameters:
  • description (OpDescription) –
  • compiler_opts (dict[str]|None) – passed on to OpCodeCompiler as kwargs
with_cuda = None[source]
tf_blas_gemm_workaround = False[source]
global_lock = <unlocked _thread.RLock object owner=0 count=0>[source]
mod_cache = {}[source]
op_cache = {}[source]
classmethod cuda_blas_gemm_so_filename()[source]
Return type:str
op_name[source]
Return type:str
cache_key[source]
Return type:str
support_native_op_cpp_filename[source]
Return type:str
make_op(self, grad_func=None)[source]
Parameters:grad_func (None|(tf.Operation,*tf.Tensor)->tf.Tensor) –
Returns:op
TFNativeOp.load_dump_file(filename)[source]

See dump_to_file() in NativeOp.cpp.

Parameters:filename (str) –
Return type:numpy.ndarray
TFNativeOp.make_op(cls, **kwargs)[source]
Parameters:
Returns:

op

Return type:

(tf.Tensor) -> tuple[tf.Tensor]

TFNativeOp.make_lstm_op(**kwargs)[source]

See NativeLstmCell for usage.

Returns:op
Return type:(tf.Tensor) -> tuple[tf.Tensor]
class TFNativeOp.RecSeqCellOp(n_hidden, n_input_dim=None, n_input_dim_parts=None, input_is_sparse=False, step=None)[source]

In TF terminology, this is a “fused” cell, i.e. the op loops over time. A similar cell is e.g. tf.contrib.rnn.LSTMBlockFusedCell.

Parameters:
  • n_hidden (int) –
  • n_input_dim (int) –
  • n_input_dim_parts (int|list[int]) –
  • input_is_sparse (bool) –
  • step (int) – what direction and step to use
does_input_projection = False[source]
does_direction_handling = False[source]
state_size[source]
Return type:int|tuple[int]
class TFNativeOp.NativeLstmCell(**kwargs)[source]

Native LSTM.

classmethod map_layer_inputs_to_op(z, rec_weights, i, initial_state=None)[source]

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:
  • z (tf.Tensor) – Z: inputs: shape (time,batch,n_hidden*4)
  • rec_weights (tf.Tensor) – V_h / W_re: shape (n_hidden,n_hidden*4)
  • i (tf.Tensor) – index: shape (time,batch)
  • initial_state (tf.Tensor|None) – shape (batch,n_hidden)
Return type:

(tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)

class TFNativeOp.NativeLstmLowMemCell(**kwargs)[source]

Native LSTM, low mem variant.

does_input_projection = True[source]
does_direction_handling = True[source]
map_layer_inputs_to_op(self, x, weights, b, i, initial_state=None)[source]

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:
  • x (tf.Tensor) – inputs: shape (time,batch,n_input_dim)
  • weights (tf.Tensor) – shape (n_input_dim+n_hidden,n_hidden*4)
  • b (tf.Tensor) – shape (n_hidden*4,)
  • i (tf.Tensor) – index: shape (time,batch)
  • initial_state (tf.Tensor|None) – shape (batch,n_hidden)
Return type:tuple[tf.Tensor]

class TFNativeOp.NativeLstm2(rec_weight_dropout=0.0, **kwargs)[source]

Native LSTM 2. See NativeOp.NativeLstm2.

Parameters:rec_weight_dropout (float) – weight dropout in the recurrent matrix, https://openreview.net/pdf?id=SyyGPP0TZ
does_input_projection = False[source]
does_direction_handling = True[source]
state_size[source]
Return type:int|tuple[int]
class TFNativeOp.TwoDNativeLstmCell(pooling, **kwargs)[source]

Native 2D LSTM.

does_input_projection = True[source]
classmethod map_layer_inputs_to_op(X, V_h, V_v, W, i, previous_state=None, previous_output=None, iteration=None)[source]

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:
  • X (tf.Tensor) – inputs: shape (timeT,timeS,batch,n_hidden*5)
  • V_h (tf.Tensor) – W_re: shape (n_hidden,n_hidden*5)
  • V_v (tf.Tensor) – W_re: shape (n_hidden,n_hidden*5)
  • i (tf.Tensor) – index: shape (time,batch)
Return type:(tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)

TFNativeOp.chunk(x, index, chunk_size, chunk_step)[source]
Parameters:
  • x (tf.Tensor) – (time,batch,dim)
  • index (tf.Tensor) –
  • chunk_size (int|tf.Tensor) –
  • chunk_step (int|tf.Tensor) –
Returns:

out, oindex. out is of shape (chunk_size, n_batch * n_chunks, n_dim), oindex of shape (chunk_size, n_batch * n_chunks).

Return type:

(tf.Tensor,tf.Tensor)
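
The chunking idea can be sketched in plain Python for a single sequence. This is only an illustration: the function name `chunk_sequence`, the formula for the number of chunks, and the (n_chunks, chunk_size, dim) list layout are assumptions. The native op works on batched tensors and returns the (chunk_size, n_batch * n_chunks, n_dim) layout described above.

```python
def chunk_sequence(frames, chunk_size, chunk_step):
    """Split one sequence (list of frames) into overlapping, zero-padded chunks.

    Returns (chunks, index): chunks[c][t] is frame t of chunk c, and
    index[c][t] is 1.0 for a real frame and 0.0 for padding.
    """
    n_time = len(frames)
    n_dim = len(frames[0])
    # Assumed chunk count: just enough chunks to cover every frame.
    n_chunks = max(n_time - chunk_size + chunk_step - 1, 0) // chunk_step + 1
    chunks, index = [], []
    for c in range(n_chunks):
        start = c * chunk_step
        chunk, mask = [], []
        for t in range(start, start + chunk_size):
            if t < n_time:
                chunk.append(list(frames[t]))
                mask.append(1.0)
            else:
                # Past the end of the sequence: pad with zeros.
                chunk.append([0.0] * n_dim)
                mask.append(0.0)
        chunks.append(chunk)
        index.append(mask)
    return chunks, index


frames = [[float(t)] for t in range(5)]  # time=5, dim=1
chunks, index = chunk_sequence(frames, chunk_size=3, chunk_step=2)
print(chunks)  # [[[0.0], [1.0], [2.0]], [[2.0], [3.0], [4.0]]]
```

With chunk_step smaller than chunk_size, consecutive chunks overlap; frame 2 appears in both chunks here.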

TFNativeOp.unchunk(x, index, chunk_size, chunk_step, n_time, n_batch)[source]
Parameters:
  • x (tf.Tensor) – output e.g. from chunk()
  • index (tf.Tensor) –
  • chunk_size (int|tf.Tensor) –
  • chunk_step (int|tf.Tensor) –
  • n_time (tf.Tensor) –
  • n_batch (tf.Tensor) –
Returns:

out, oindex, ofactors

Return type:

(tf.Tensor,tf.Tensor,tf.Tensor)

TFNativeOp.make_fast_baum_welch_op(**kwargs)[source]
Returns:op
Return type:(tf.Tensor) -> tuple[tf.Tensor]
TFNativeOp.fast_baum_welch(am_scores, edges, weights, start_end_states, float_idx, state_buffer=None)[source]
Parameters:
  • am_scores (tf.Tensor) – (time, batch, dim), in -log space
  • edges (tf.Tensor) – (4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
  • weights (tf.Tensor) – (num_edges,), weights of the edges
  • start_end_states (tf.Tensor) – (2, batch), (start,end) state idx in automaton. there is only one single automaton.
  • float_idx (tf.Tensor) – (time, batch) -> 0 or 1 (index mask, via seq lens)
  • state_buffer (tf.Tensor) – (2, num_states)
Returns:

(fwdbwd, obs_scores), fwdbwd is (time, batch, dim), obs_scores is (time, batch), in -log space

Return type:

(tf.Tensor, tf.Tensor)
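
The float_idx argument is a time-major 0/1 mask derived from the sequence lengths. A minimal plain-Python sketch of that mask follows; the helper name `seq_lens_to_float_idx` is hypothetical, and in a TF graph one would build the same mask with tensor ops.

```python
def seq_lens_to_float_idx(seq_lens, n_time=None):
    """Build a (time, batch) mask: 1.0 inside each sequence, 0.0 in padding."""
    if n_time is None:
        n_time = max(seq_lens)
    return [
        [1.0 if t < length else 0.0 for length in seq_lens]
        for t in range(n_time)
    ]


float_idx = seq_lens_to_float_idx([3, 1])
print(float_idx)  # [[1.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
```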

TFNativeOp.fast_baum_welch_by_sprint_automata(am_scores, float_idx, tags, sprint_opts, tdp_scale=1.0)[source]
Parameters:
  • am_scores (tf.Tensor) – (time, batch, dim), in -log space
  • float_idx (tf.Tensor) – (time, batch) -> 0 or 1 (index mask, via seq lens)
  • tags (tf.Tensor) – (batch,) -> seq name (str)
  • tdp_scale (float) – weights are multiplied by this
  • sprint_opts (dict[str]) –
Returns:

(fwdbwd, obs_scores), fwdbwd is (time, batch, dim), obs_scores is (time, batch), in -log space

Return type:

(tf.Tensor, tf.Tensor)

TFNativeOp.tf_fast_bw_fsa_staircase(seq_lens, **opts)[source]
Parameters:
Returns:

edges, weights, start_end_states

Return type:

(tf.Tensor, tf.Tensor, tf.Tensor)

TFNativeOp.get_ctc_fsa_fast_bw(targets, seq_lens, blank_idx)[source]

See NativeOp.GetCtcFsaFastBwOp. Generates an FSA with CTC topology. The output format is compatible with fast_baum_welch().

Parameters:
  • targets (tf.Tensor) – shape (batch,time), int32
  • seq_lens (tf.Tensor) – shape (batch), int32
  • blank_idx (int) –
Returns:

edges, weights, start_end_states. edges is (4,num_edges), int32, edges of the graph (from,to,emission_idx,sequence_idx). weights is (num_edges,), float32, all zero. start_end_states is (2,batch), int32, (start,end) state idx in FSA.

Return type:

(tf.Tensor,tf.Tensor,tf.Tensor)
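
To illustrate what an FSA with CTC topology in this edge format can look like, here is a plain-Python sketch for a single target sequence. The state numbering, the trick of copying arcs into one extra end state, and the helper names `ctc_fsa_single` and `count_paths` are all assumptions made for this example; the native op may lay out states and edges differently.

```python
def ctc_fsa_single(targets, blank_idx, seq_idx=0):
    """Build a CTC-topology FSA for one target sequence.

    Returns (edges, weights, (start, end)), where each edge is
    (from, to, emission_idx, sequence_idx). State numbering is an assumption.
    """
    # Extended label sequence: blank, t1, blank, t2, ..., tN, blank.
    ext = [blank_idx]
    for label in targets:
        ext += [label, blank_idx]
    n_ext = len(ext)
    # State i + 1 means "ext[i] was emitted last"; state 0 is the start.
    start, end = 0, n_ext + 1
    arcs = []
    # Begin with the leading blank or directly with the first label.
    for i in range(min(2, n_ext)):
        arcs.append((start, i + 1, ext[i]))
    for i in range(n_ext):
        arcs.append((i + 1, i + 1, ext[i]))  # repeat the current symbol
        if i + 1 < n_ext:
            arcs.append((i + 1, i + 2, ext[i + 1]))  # advance by one
        # Skip the blank between two different labels.
        if i + 2 < n_ext and ext[i] != blank_idx and ext[i + 2] != ext[i]:
            arcs.append((i + 1, i + 3, ext[i + 2]))
    # Accepting states are the last label and the trailing blank. Their
    # incoming arcs are copied to one end state so (start, end) is unique.
    accepting = {n_ext - 1, n_ext} if n_ext > 1 else {n_ext}
    arcs += [(src, end, emission) for src, dst, emission in arcs if dst in accepting]
    edges = [(src, dst, emission, seq_idx) for src, dst, emission in arcs]
    weights = [0.0] * len(edges)  # all zero, as in the description above
    return edges, weights, (start, end)


def count_paths(edges, start, end, n_steps):
    """Count paths of exactly n_steps edges from start to end."""
    counts = {start: 1}
    for _ in range(n_steps):
        next_counts = {}
        for src, dst, _emission, _seq in edges:
            if src in counts:
                next_counts[dst] = next_counts.get(dst, 0) + counts[src]
        counts = next_counts
    return counts.get(end, 0)


edges, weights, (start, end) = ctc_fsa_single([1], blank_idx=0)
# For target [1] and 2 frames, the valid alignments are "1 1", "_ 1", "1 _".
print(count_paths(edges, start, end, n_steps=2))  # 3
```

Each path of length T from the start to the end state corresponds to one valid CTC alignment over T frames, which is what a forward-backward pass over the FSA sums over.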

TFNativeOp.fast_baum_welch_staircase(am_scores, seq_lens, **opts)[source]
Parameters:
  • am_scores (tf.Tensor) – (time, batch, dim), in -log space
  • seq_lens (tf.Tensor) – (batch,) -> values in [1, …, dim-1]
  • opts – passed to Fsa.fast_bw_fsa_staircase()
Returns:

(fwdbwd, obs_scores), fwdbwd is (time, batch, dim), obs_scores is (time, batch), in -log space

Return type:

(tf.Tensor, tf.Tensor)

TFNativeOp.ctc_loss(logits, logits_seq_lens, logits_time_major, targets, targets_seq_lens)[source]

Similar to tf.nn.ctc_loss(). We use our fast_baum_welch(). Also see FastBaumWelchLoss.

Parameters:
  • logits (tf.Tensor) – (time,batch,dim) or (batch,time,dim). unnormalized (before softmax)
  • logits_seq_lens (tf.Tensor) – shape (batch,) of int32|int64
  • logits_time_major (bool) –
  • targets (tf.Tensor) – batch-major, [batch,time]
  • targets_seq_lens (tf.Tensor) – (batch,)
Returns:

loss, shape (batch,)

Return type:

tf.Tensor
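
As a reference for what the loss computes, the following plain-Python sketch evaluates the CTC negative log-likelihood for one sequence with the standard forward recursion. It is not how the native op works (that goes through fast_baum_welch() on an FSA); the function names and the single-sequence, list-based interface are assumptions for illustration.

```python
import math


def log_softmax(row):
    """Normalize one frame of logits to log-probabilities."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def logsumexp(values):
    """Numerically stable log(sum(exp(values))), tolerating -inf entries."""
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))


def ctc_loss_single(logits, targets, blank_idx):
    """CTC loss for one sequence; logits is a list of frames (time, dim)."""
    log_probs = [log_softmax(frame) for frame in logits]
    # Extended label sequence: blank, t1, blank, ..., tN, blank.
    ext = [blank_idx]
    for label in targets:
        ext += [label, blank_idx]
    n = len(ext)
    alpha = [-math.inf] * n
    alpha[0] = log_probs[0][ext[0]]
    if n > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new_alpha = []
        for s in range(n):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != blank_idx and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha.append(logsumexp(candidates) + log_probs[t][ext[s]])
        alpha = new_alpha
    # Valid paths end in the last label or the trailing blank.
    return -logsumexp(alpha[-2:])


# Two frames, two classes with uniform probabilities, target [1]:
# 3 of the 4 possible frame labelings are valid, so the loss is -log(0.75).
loss = ctc_loss_single([[0.0, 0.0], [0.0, 0.0]], [1], blank_idx=0)
print(loss)  # about 0.2877
```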

TFNativeOp.fast_viterbi(am_scores, am_seq_len, edges, weights, start_end_states)[source]
Parameters:
  • am_scores (tf.Tensor) – (time, batch, dim), in +log space (unlike fast_baum_welch)
  • am_seq_len (tf.Tensor) – (batch,), int32
  • edges (tf.Tensor) – (4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
  • weights (tf.Tensor) – (num_edges,), weights of the edges
  • start_end_states (tf.Tensor) – (2, batch), (start,end) state idx in automaton. there is only one single automaton.
Returns:

(alignment, scores), alignment is (time, batch), scores is (batch,), in +log space

Return type:

(tf.Tensor, tf.Tensor)

TFNativeOp.ctc_loss_viterbi(logits, logits_seq_lens, logits_time_major, targets, targets_seq_lens)[source]

Similar to ctc_loss(). However, instead of using the full sum, we use the best path (i.e. Viterbi instead of Baum-Welch). We use our fast_viterbi().

Parameters:
  • logits (tf.Tensor) – (time,batch,dim) or (batch,time,dim). unnormalized (before softmax)
  • logits_seq_lens (tf.Tensor) – shape (batch,) of int32|int64
  • logits_time_major (bool) –
  • targets (tf.Tensor) – batch-major, [batch,time]
  • targets_seq_lens (tf.Tensor) – (batch,)
Returns:

loss, shape (batch,)

Return type:

tf.Tensor

TFNativeOp.edit_distance(a, a_len, b, b_len)[source]

Wraps NativeOp.EditDistanceOp.

Parameters:
  • a (tf.Tensor) – (batch,time1), int32
  • a_len (tf.Tensor) – (batch,), int32
  • b (tf.Tensor) – (batch,time2), int32
  • b_len (tf.Tensor) – (batch,), int32
Returns:

(batch,) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor
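
For reference, the un-normalized edit distance (Levenshtein distance) can be computed in plain Python as sketched below. The function names are chosen for this example, and the batched variant simply trims each row to its length; the native op operates on padded int32 tensors instead.

```python
def edit_distance_single(a, b):
    """Un-normalized Levenshtein distance between two label sequences."""
    # dist[i][j] is the distance between a[:i] and b[:j].
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        dist[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            substitution = dist[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[-1][-1]


def edit_distance_batch(a, a_len, b, b_len):
    """Apply edit_distance_single per batch entry, ignoring padding."""
    return [
        edit_distance_single(seq_a[:len_a], seq_b[:len_b])
        for seq_a, len_a, seq_b, len_b in zip(a, a_len, b, b_len)
    ]


distances = edit_distance_batch(
    a=[[1, 2, 3], [4, 5, 0]], a_len=[3, 2],
    b=[[1, 3], [4, 6]], b_len=[2, 2],
)
print(distances)  # [1, 1]
```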

TFNativeOp.optimal_completion_edit_distance(a, a_len, b, b_len)[source]

Wraps NativeOp.OptimalCompletionEditDistanceOp.

Parameters:
  • a (tf.Tensor) – (batch,time1), int32. prefix
  • a_len (tf.Tensor) – (batch,), int32
  • b (tf.Tensor) – (batch,time2), int32
  • b_len (tf.Tensor) – (batch,), int32
Returns:

(batch,) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor
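
The sketch below assumes the usual definition of the optimal completion edit distance: the smallest edit distance between the prefix a and any prefix of b, i.e. the distance a can still reach if it is completed optimally. That definition and the function names are assumptions; the code is a brute-force reference for a single sequence pair, not the native implementation.

```python
from functools import lru_cache


def edit_distance_single(a, b):
    """Un-normalized Levenshtein distance, via memoized recursion."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def dist(i, j):
        # Distance between a[:i] and b[:j].
        if i == 0:
            return j
        if j == 0:
            return i
        return min(
            dist(i - 1, j - 1) + (a[i - 1] != b[j - 1]),
            dist(i - 1, j) + 1,
            dist(i, j - 1) + 1,
        )

    return dist(len(a), len(b))


def optimal_completion_edit_distance_single(a, b):
    """Minimum edit distance between prefix a and any prefix of b."""
    return min(edit_distance_single(a, b[:j]) for j in range(len(b) + 1))


# The prefix [1, 2] matches the start of b exactly, so nothing is lost yet,
# although the plain edit distance to the full b is 2.
print(optimal_completion_edit_distance_single([1, 2], [1, 2, 3, 4]))  # 0
print(edit_distance_single([1, 2], [1, 2, 3, 4]))  # 2
```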

TFNativeOp.optimal_completion_edit_distance_per_successor(a, a_len, b, b_len, successors)[source]

Wraps NativeOp.OptimalCompletionEditDistancePerSuccessorOp.

Parameters:
  • a (tf.Tensor) – (batch,time1), int32. prefix
  • a_len (tf.Tensor) – (batch,), int32
  • b (tf.Tensor) – (batch,time2), int32
  • b_len (tf.Tensor) – (batch,), int32
  • successors (tf.Tensor|int) – (n_labels,), int32. scalar means tf.range(successors)
Returns:

(batch,n_labels) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor

TFNativeOp.next_edit_distance_row(last_row, a, a_n, a_ended, b, b_len)[source]

Wraps NativeOp.NextEditDistanceRowOp.

Parameters:
  • last_row (tf.Tensor) – 2d (batch,b_time + 1), int32. last edit distances
  • a (tf.Tensor) – symbols. 1d (batch,), int32. current.
  • a_n (tf.Tensor) – scalar, int32. current position
  • a_ended (tf.Tensor) – 1d (batch,), int32 (cast from bool, because int32 is easier to handle)
  • b (tf.Tensor) – symbols. 2d (batch,b_time), int32
  • b_len (tf.Tensor) – 1d (batch,), int32
Returns:

2d (batch,b_time + 1), int32, next (unnormalized) edit distance row

Return type:

tf.Tensor
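
A single step of this row update can be sketched in plain Python for one sequence. The batch dimension and the a_n, a_ended and b_len arguments are left out here, and the function name is only for illustration, so this shows the dynamic-programming recurrence rather than the op's exact behavior.

```python
def next_edit_distance_row_single(last_row, a_symbol, b):
    """Compute the next edit distance row after consuming one symbol of a.

    last_row[j] is the distance between the symbols of a seen so far and b[:j].
    """
    new_row = [last_row[0] + 1]  # versus the empty prefix of b: one more deletion
    for j in range(1, len(b) + 1):
        substitution = last_row[j - 1] + (a_symbol != b[j - 1])
        deletion = last_row[j] + 1
        insertion = new_row[j - 1] + 1
        new_row.append(min(substitution, deletion, insertion))
    return new_row


b = [1, 2, 3]
row = list(range(len(b) + 1))  # initial row: distances from the empty sequence
for symbol in [1, 3]:
    row = next_edit_distance_row_single(row, symbol, b)
print(row)  # [2, 1, 1, 1]
```

The last entry of the final row is the edit distance between the full sequences, here 1 for [1, 3] versus [1, 2, 3]. Feeding symbols one at a time like this is what makes the row form convenient inside a decoding loop.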

TFNativeOp.edit_distance_via_next_edit_distance_row(a, a_len, b, b_len, optimal_completion=False, full_row_output=False)[source]

This is mostly for demonstration and debugging. Should be equivalent to edit_distance() or optimal_completion_edit_distance() (which should be much faster).

Parameters:
  • a (tf.Tensor) – (batch,time1), int32
  • a_len (tf.Tensor) – (batch,), int32
  • b (tf.Tensor) – (batch,time2), int32
  • b_len (tf.Tensor) – (batch,), int32
  • optimal_completion (bool) – calc optimal completion edit distance instead
  • full_row_output (bool) – outputs the full final row
Returns:

(batch,) or (batch,time2+1) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor

TFNativeOp.next_edit_distance_reduce(last_row, a, a_n, a_ended, b, b_len, optimal_completion=False)[source]

Wraps NativeOp.NextEditDistanceReduceOp.

Parameters:
  • last_row (tf.Tensor) – 2d (batch,b_time + 1), int32. last edit distances
  • a (tf.Tensor) – symbols. 2d (batch|1,n_labels), int32. current.
  • a_n (tf.Tensor) – scalar, int32. current position
  • a_ended (tf.Tensor) – 1d (batch,), int32 (cast from bool, because int32 is easier to handle)
  • b (tf.Tensor) – symbols. 2d (batch,b_time), int32
  • b_len (tf.Tensor) – 1d (batch,), int32
  • optimal_completion (bool|tf.Tensor) –
Returns:

2d (batch,n_labels), int32, next (unnormalized) (optimal completion) edit distance

Return type:

tf.Tensor

TFNativeOp.optimal_completion_edit_distance_per_successor_via_next_edit_distance(a, a_len, b, b_len, successors)[source]

Uses next_edit_distance_reduce() and edit_distance_via_next_edit_distance_row(). Mostly for demonstration/testing. In practice, you would do something similar, but in your own loop. Similar to optimal_completion_edit_distance_per_successor(), but the handling of ended sequences (from a) is different.

Parameters:
  • a (tf.Tensor) – (batch,time1), int32. prefix
  • a_len (tf.Tensor) – (batch,), int32
  • b (tf.Tensor) – (batch,time2), int32
  • b_len (tf.Tensor) – (batch,), int32
  • successors (tf.Tensor|int) – (n_labels,), int32. scalar means tf.range(successors)
Returns:

(batch,n_labels) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor

TFNativeOp.have_blocksparse_requirements()[source]
Returns:whether we can use the OpenAI blocksparse module
Return type:bool
TFNativeOp.init_blocksparse(with_native_module=True)[source]
Parameters:with_native_module (bool) –
TFNativeOp.demo()[source]

Simple demo for testing the compilation.