`returnn.tf.native_op`¶

TF implementation of NativeOp. Wrappers for most relevant NativeOp ops.

class returnn.tf.native_op.OpDescription(in_info: Sequence[Dict[str, Any]], out_info: Sequence[Dict[str, Any]], c_fw_code: str, c_bw_code: str | None = None, c_extra_support_code: None | str | Dict[str, str] = None, code_version: Tuple[int, ...] | None = None, cpu_support: bool = True, grad_input_map: None | Tuple[int, ...] | Callable = None, name: str | None = None)[source]¶

Meta-info about an op, used by OpMaker.

Parameters:

in_info –
each dict describes one input var. attribs in the dict:

int ndim: the ndim. tuple shape: tuple and can contain None for specific dimensions.

optional attribs:
str dtype: “float32” by default. bool need_contiguous: false by default. int want_inplace: -1 by default. try to optimize to destroy input, on output-index.

”dummy_out” is a special value which will add another output.

bool is_inplace: false by default. whether the optimization was applied. str gradient: can be “disconnected”. see grad(). bool bw_input: True by default. add this param to the bw input.

other attribs are just ignored.
out_info –
like in_info. slightly different behavior for:

shape: we also allow refs to the in_info in the form (in-idx,dim). see infer_shape(). need_contiguous/want_inplace: used for bw, in case for bw_input == True.
c_fw_code – C code for forward pass
c_extra_support_code – C support code (for c_support_code)
c_bw_code – C code for backward pass (for gradient)
code_version – will be returned by c_code_cache_version.
cpu_support
grad_input_map – selection of grad inputs. by default, we get all inputs + all outputs + all grad outputs.
name – name

classmethod from_gen_base(gen_base)[source]¶

Parameters:: gen_base (returnn.native_op.NativeOpGenBase|type[returnn.native_op.NativeOpGenBase])
Return type:: OpDescription

property is_grad_defined[source]¶

Return type:: bool

grad()[source]¶

Return type:: OpDescription|None

class returnn.tf.native_op.OpMaker(description, compiler_opts=None, search_for_runtime_blas=True, search_for_numpy_blas=True, search_for_system_blas=True, blas_lib=None)[source]¶

https://www.tensorflow.org/guide/extend/op

Parameters:

description (OpDescription)
compiler_opts (dict[str]|None) – passed on to OpCodeCompiler as kwargs

with_cuda: bool | None = None[source]¶

tf_blas_gemm_workaround = False[source]¶

global_lock = <unlocked _thread.RLock object owner=0 count=0>[source]¶

mod_cache = {}[source]¶

op_cache = {}[source]¶

log_stream: TextIO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>[source]¶

classmethod cuda_blas_gemm_so_filename()[source]¶

Return type:: str

property op_name[source]¶

Return type:: str

property cache_key[source]¶

Return type:: str

property support_native_op_cpp_filename[source]¶

Return type:: str

make_op(grad_func=None)[source]¶

Parameters:: grad_func (None|(tf.Operation,*tf.Tensor)->tf.Tensor)
Returns:: op

returnn.tf.native_op.load_dump_file(filename)[source]¶

See dump_to_file() in NativeOp.cpp.

Parameters:: filename (str)
Return type:: numpy.ndarray

returnn.tf.native_op.make_op(cls, **kwargs)[source]¶

Parameters:

cls (type[returnn.native_op.NativeOpGenBase])
kwargs – passed to OpMaker

Returns:

Return type:

(tf.Tensor) -> tuple[tf.Tensor]

returnn.tf.native_op.make_lstm_op(**kwargs)[source]¶

See NativeLstmCell for usage.

Returns:: op
Return type:: (tf.Tensor) -> tuple[tf.Tensor]

class returnn.tf.native_op.RecSeqCellOp(n_hidden, n_input_dim=None, n_input_dim_parts=None, input_is_sparse=False, step=None)[source]¶

In TF terminology, this is a “fused” cell, i.e. the op loops over the time. Similar is e.g. tf.contrib.rnnLSTMBlockFusedCell.

Parameters:

n_hidden (int)
n_input_dim (int)
n_input_dim_parts (int|list[int])
input_is_sparse (bool)
step (int) – what direction and step to use

does_input_projection = False[source]¶

does_direction_handling = False[source]¶

property state_size[source]¶

Return type:: int|tuple[int]

class returnn.tf.native_op.NativeLstmCell(forget_bias=0.0, **kwargs)[source]¶

Native LSTM.

Parameters:: forget_bias (float)

classmethod map_layer_inputs_to_op(z, rec_weights, i, initial_state=None)[source]¶

Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().

Parameters:

z (tf.Tensor) – Z: inputs: shape (time,batch,n_hidden*4)
rec_weights (tf.Tensor) – V_h / W_re: shape (n_hidden,n_hidden*4)
i (tf.Tensor) – index: shape (time,batch)
initial_state (tf.Tensor|None) – shape (batch,n_hidden)

Return type:

(tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)

class returnn.tf.native_op.NativeLstmLowMemCell(**kwargs)[source]¶

Native LSTM, low mem variant.

Parameters:

n_hidden (int)
n_input_dim (int)
n_input_dim_parts (int|list[int])
input_is_sparse (bool)
step (int) – what direction and step to use

does_input_projection = True[source]¶

does_direction_handling = True[source]¶

map_layer_inputs_to_op(x, weights, b, i, initial_state=None)[source]¶: Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op(). :param tf.Tensor x: inputs: shape (time,batch,n_input_dim) :param tf.Tensor weights: shape (n_input_dim+n_hidden,n_hidden*4) :param tf.Tensor b: shape (n_hidden*4,) :param tf.Tensor i: index: shape (time,batch) :param tf.Tensor|None initial_state: shape (batch,n_hidden) :rtype: tuple[tf.Tensor]

class returnn.tf.native_op.NativeLstm2(rec_weight_dropout=0.0, rec_weight_dropout_shape=None, forget_bias=0.0, **kwargs)[source]¶

Native LSTM 2. See NativeOp.NativeLstm2.

Parameters:

rec_weight_dropout (float) – weight dropout in the recurrent matrix, https://openreview.net/pdf?id=SyyGPP0TZ
rec_weight_dropout_shape (tuple[int|None]|None) – e.g. (None,1) to use dropout on all rec inputs (save memory)
forget_bias (float)

does_input_projection = False[source]¶

does_direction_handling = True[source]¶

property state_size[source]¶

Return type:: rnn_cell.LSTMStateTuple

class returnn.tf.native_op.TwoDNativeLstmCell(pooling, **kwargs)[source]¶

Native 2D LSTM.

Parameters:

n_hidden (int)
n_input_dim (int)
n_input_dim_parts (int|list[int])
input_is_sparse (bool)
step (int) – what direction and step to use

does_input_projection = True[source]¶

classmethod map_layer_inputs_to_op(X, V_h, V_v, W, i, previous_state=None, previous_output=None, iteration=None)[source]¶: Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op(). :param tf.Tensor X: inputs: shape (timeT,timeS,batch,n_hidden*5) :param tf.Tensor V_h: W_re: shape (n_hidden,n_hidden*5) :param tf.Tensor V_v: W_re: shape (n_hidden,n_hidden*5) :param tf.Tensor W: :param tf.Tensor i: index: shape (time,batch) :param tf.Tensor previous_state: :param tf.Tensor previous_output: :param tf.Tensor iteration: :rtype: (tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)

returnn.tf.native_op.chunk(x, index, chunk_size, chunk_step)[source]¶

Parameters:

x (tf.Tensor) – (time,batch,dim)
index (tf.Tensor) – (time,batch)
chunk_size (int|tf.Tensor)
chunk_step (int|tf.Tensor)

Returns:

out, oindex. out is of shape (chunk_size, n_batch * n_chunks, n_dim), oindex of shape (chunk_size, n_batch * n_chunks).

Return type:

(tf.Tensor,tf.Tensor)

returnn.tf.native_op.unchunk(x, index, chunk_size, chunk_step, n_time, n_batch)[source]¶

Parameters:

x (tf.Tensor) – output e.g. from chunk()
index (tf.Tensor)
chunk_size (int|tf.Tensor)
chunk_step (int|tf.Tensor)
n_time (tf.Tensor)
n_batch (tf.Tensor)

Returns:

out, oindex, ofactors

Return type:

(tf.Tensor,tf.Tensor,tf.Tensor)

returnn.tf.native_op.make_fast_baum_welch_op(**kwargs)[source]¶

Returns:: op
Return type:: (tf.Tensor) -> tuple[tf.Tensor]

returnn.tf.native_op.fast_baum_welch(am_scores, edges, weights, start_end_states, float_idx, state_buffer=None)[source]¶

Parameters:

am_scores (tf.Tensor) – (time, batch, dim), in -log space
edges (tf.Tensor) – (4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
weights (tf.Tensor) – (num_edges,), weights of the edges
start_end_states (tf.Tensor) – (2, batch), (start,end) state idx in automaton. there is only one single automaton.
float_idx (tf.Tensor) – (time, batch) -> 0 or 1 (index mask, via seq lens)
state_buffer (tf.Tensor) – (2, num_states)

Returns:

(fwdbwd, obs_scores), fwdbwd is (time, batch, dim), obs_scores is (time, batch), in -log space

Return type:

(tf.Tensor, tf.Tensor)

returnn.tf.native_op.fast_baum_welch_by_sprint_automata(am_scores, float_idx, tags, sprint_opts, tdp_scale=1.0)[source]¶

Parameters:

am_scores (tf.Tensor) – (time, batch, dim), in -log space
float_idx (tf.Tensor) – (time, batch) -> 0 or 1 (index mask, via seq lens)
tags (tf.Tensor) – (batch,) -> seq name (str)
tdp_scale (float) – weights are multiplied by this
sprint_opts (dict[str])

Returns:

(fwdbwd, obs_scores), fwdbwd is (time, batch, dim), obs_scores is (time, batch), in -log space

Return type:

(tf.Tensor, tf.Tensor)

returnn.tf.native_op.tf_fast_bw_fsa_staircase(seq_lens, **opts)[source]¶

Parameters:

seq_lens (tf.Tensor) – shape (batch,)
opts – passed to Fsa.fast_bw_fsa_staircase()

Returns:

edges, weights, start_end_states

Return type:

(tf.Tensor, tf.Tensor, tf.Tensor)

returnn.tf.native_op.get_ctc_fsa_fast_bw(targets, seq_lens, blank_idx, label_loop=True)[source]¶

See NativeOp.GetCtcFsaFastBwOp. Generates a FSA with CTC topology. The output format is compatible to fast_baum_welch().

Parameters:

targets (tf.Tensor) – shape (batch,time), int32
seq_lens (tf.Tensor) – shape (batch), int32
blank_idx (int) – vocab index of the blank symbol
label_loop (bool) – True -> normal CTC; False -> RNA-like

Returns:

edges, weights, start_end_states; edges is (4,num_edges), int32, edges of the graph (from,to,emission_idx,sequence_idx). weights is (num_edges,), float32. all zero. start_end_states is (2,batch), int32, (start,end) state idx in FSA.

Return type:

(tf.Tensor,tf.Tensor,tf.Tensor)

returnn.tf.native_op.fast_baum_welch_staircase(am_scores, seq_lens, **opts)[source]¶

Parameters:

am_scores (tf.Tensor) – (time, batch, dim), in -log space
seq_lens (tf.Tensor) – (batch,) -> values in [1, …, dim-1]
opts – passed to Fsa.fast_bw_fsa_staircase()

Returns:

(fwdbwd, obs_scores), fwdbwd is (time, batch, dim), obs_scores is (time, batch), in -log space

Return type:

(tf.Tensor, tf.Tensor)

returnn.tf.native_op.ctc_loss(*, logits, logits_seq_lens, logits_time_major, targets, targets_seq_lens, label_loop: bool | None = None, ctc_merge_repeated: bool | None = None, logits_normalize=True, grad_wrt_softmax_in=True, blank_index=-1)[source]¶

Similar to tf.nn.ctc_loss(). We use our fast_baum_welch(). Also see FastBaumWelchLoss.

Parameters:

logits (tf.Tensor) – (time,batch,dim) or (batch,time,dim). unnormalized (before softmax)
logits_seq_lens (tf.Tensor) – shape (batch,) of int32|int64
logits_time_major (bool)
targets (tf.Tensor) – batch-major, [batch,time]
targets_seq_lens (tf.Tensor) – (batch,)
label_loop
ctc_merge_repeated – alias for label_loop
logits_normalize (bool) – apply log_softmax on logits (default). if False, you might also set grad_wrt_softmax_in=False
grad_wrt_softmax_in (bool) – assume p(s|x) = softmax(logits), and define the gradient w.r.t. logits. This is p(s|x) - bw, where bw is the Baum-Welch soft alignment. If logits are already normalized (e.g. we just use log p(s|x) = logits), the error signal to logits should be -bw.
blank_index (int) – vocab index of the blank symbol

Returns:

loss, shape (batch,)

Return type:

tf.Tensor

returnn.tf.native_op.fast_viterbi(*, am_scores, am_seq_len, edges, weights, start_end_states, mask_idx: int = 0)[source]¶

Parameters:

am_scores (tf.Tensor) – (time, batch, dim), in +log space (unlike fast_baum_welch)
am_seq_len (tf.Tensor) – (batch,), int32
edges (tf.Tensor) – (4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
weights (tf.Tensor) – (num_edges,), weights of the edges
start_end_states (tf.Tensor) – (2, batch), (start,end) state idx in automaton. there is only one single automaton.
mask_idx – vocab index used for masking (e.g. padding, or if not path was found)

Returns:

(alignment, scores), alignment is (time, batch), scores is (batch,), in +log space

Return type:

(tf.Tensor, tf.Tensor)

returnn.tf.native_op.ctc_loss_viterbi(*, logits, logits_seq_lens, logits_time_major, targets, targets_seq_lens, blank_index=-1, label_loop: bool = True)[source]¶

Similar to ctc_loss(). However, instead of using the full sum, we use the best path (i.e. Viterbi instead of Baum-Welch). We use our fast_viterbi().

Parameters:

logits (tf.Tensor) – (time,batch,dim) or (batch,time,dim). unnormalized (before softmax)
logits_seq_lens (tf.Tensor) – shape (batch,) of int32|int64
logits_time_major (bool)
targets (tf.Tensor) – batch-major, [batch,time]
targets_seq_lens (tf.Tensor) – (batch,)
blank_index (int) – vocab index of the blank symbol
label_loop

Returns:

loss, shape (batch,)

Return type:

tf.Tensor

returnn.tf.native_op.edit_distance(a, a_len, b, b_len)[source]¶

Wraps NativeOp.EditDistanceOp.

Parameters:

a (tf.Tensor) – (batch,time1), int32
a_len (tf.Tensor) – (batch,), int32
b (tf.Tensor) – (batch,time2), int32
b_len (tf.Tensor) – (batch,), int32

Returns:

(batch,) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor

returnn.tf.native_op.optimal_completion_edit_distance(a, a_len, b, b_len)[source]¶

Wraps NativeOp.OptimalCompletionEditDistanceOp.

Parameters:

a (tf.Tensor) – (batch,time1), int32. prefix
a_len (tf.Tensor) – (batch,), int32
b (tf.Tensor) – (batch,time2), int32
b_len (tf.Tensor) – (batch,), int32

Returns:

(batch,) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor

returnn.tf.native_op.optimal_completion_edit_distance_per_successor(a, a_len, b, b_len, successors)[source]¶

Wraps NativeOp.OptimalCompletionEditDistancePerSuccessorOp.

Parameters:

a (tf.Tensor) – (batch,time1), int32. prefix
a_len (tf.Tensor) – (batch,), int32
b (tf.Tensor) – (batch,time2), int32
b_len (tf.Tensor) – (batch,), int32
successors (tf.Tensor|int) – (n_labels,), int32. scalar means tf.range(successors)

Returns:

(batch,n_labels) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor

returnn.tf.native_op.next_edit_distance_row(last_row, a, a_n, a_ended, b, b_len)[source]¶

Wraps NativeOp.NextEditDistanceRowOp.

Parameters:

last_row (tf.Tensor) – 2d (batch,b_time + 1), int32. last edit distances
a (tf.Tensor) – symbols. 1d (batch,), int32. current.
a_n (tf.Tensor) – scalar or 1d (batch,), int32. current position
a_ended (tf.Tensor) – 1d (batch,), int32 (casted from bool, because int32 easier to handle)
b (tf.Tensor) – symbols. 2d (batch,b_time), int32
b_len (tf.Tensor) – 1d (batch,), int32

Returns:

2d (batch,b_time + 1), int32, next (unnormalized) edit distance row

Return type:

tf.Tensor

returnn.tf.native_op.edit_distance_via_next_edit_distance_row(a, a_len, b, b_len, optimal_completion=False, full_row_output=False)[source]¶

This is mostly for demonstration and debugging. Should be equivalent to edit_distance() or optimal_completion_edit_distance() (which should be much faster).

Parameters:

a (tf.Tensor) – (batch,time1), int32
a_len (tf.Tensor) – (batch,), int32
b (tf.Tensor) – (batch,time2), int32
b_len (tf.Tensor) – (batch,), int32
optimal_completion (bool) – calc optimal completion edit distance instead
full_row_output (bool) – outputs the full final row

Returns:

(batch,) or (batch,time2+1) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor

returnn.tf.native_op.next_edit_distance_reduce(last_row, a, a_n, a_ended, b, b_len, optimal_completion=False, a_blank_idx=None)[source]¶

Wraps NativeOp.NextEditDistanceReduceOp.

Parameters:

last_row (tf.Tensor) – 2d (batch,b_time + 1), int32. last edit distances
a (tf.Tensor) – symbols. 2d (batch|1,n_labels), int32. current.
a_n (tf.Tensor) – scalar or 1d (batch,), int32. current position
a_ended (tf.Tensor) – 1d (batch,), int32 (casted from bool, because int32 easier to handle)
b (tf.Tensor) – symbols. 2d (batch,b_time), int32
b_len (tf.Tensor) – 1d (batch,), int32
a_blank_idx (tf.Tensor|int|None) – scalar, int32
optimal_completion (bool|tf.Tensor)

Returns:

2d (batch,n_labels), int32, next (unnormalized) (optimal completion) edit distance

Return type:

tf.Tensor

returnn.tf.native_op.optimal_completion_edit_distance_per_successor_via_next_edit_distance(a, a_len, b, b_len, successors)[source]¶

Uses next_edit_distance_reduce() and edit_distance_via_next_edit_distance_row(). Mostly for demonstration/testing. In practice, you would do something similar, but in your own loop. Similar to optimal_completion_edit_distance_per_successor(), but the handling of ended sequences (from a) is different.

Parameters:

a (tf.Tensor) – (batch,time1), int32. prefix
a_len (tf.Tensor) – (batch,), int32
b (tf.Tensor) – (batch,time2), int32
b_len (tf.Tensor) – (batch,), int32
successors (tf.Tensor|int) – (n_labels,), int32. scalar means tf.range(successors)

Returns:

(batch,n_labels) tensor, int32, un-normalized edit distance

Return type:

tf.Tensor

returnn.tf.native_op.have_blocksparse_requirements()[source]¶

Returns:: whether we can use the OpenAI blocksparse module
Return type:: bool

returnn.tf.native_op.init_blocksparse(with_native_module=True)[source]¶

Parameters:: with_native_module (bool)

returnn.tf.native_op.demo()[source]¶: Simple demo for testing the compilation.

returnn.tf.native_op¶

`returnn.tf.native_op`¶