returnn.tf.util.basic#

Lots of random utility functions for TensorFlow.

class returnn.tf.util.basic.CollectionKeys[source]#

Extension of tf.compat.v1.GraphKeys

RETURNN_LAYERS = '_RETURNN_layers'[source]#
RETURNN_NET_STACK = '_RETURNN_network_stack'[source]#
STATE_VARS = '_RETURNN_state_vars'[source]#
returnn.tf.util.basic.tf_version_tuple()[source]#
Returns:

version tuple, e.g. (1, 1, 0), parsed from tf.__version__

Return type:

tuple[int]

returnn.tf.util.basic.assert_min_tf_version(version, reason)[source]#
Parameters:
  • version (tuple[int]) – e.g. (1,2,0) or (1,2)

  • reason (str) –

returnn.tf.util.basic.have_min_tf_version(version)[source]#
Parameters:

version (tuple[int]) – e.g. (1,2,0) or (1,2)

Returns:

True if we have at least that version, or newer

Return type:

bool

class returnn.tf.util.basic.CustomUpdate[source]#

Custom updates will be handled by TFUpdater.

set_on_var(var)[source]#
Parameters:

var (tf.Variable) – variable to update. this will be recognized by TFUpdater.Updater

update_var(var)[source]#
Parameters:

var (tf.Variable) – variable to update

Returns:

operation which updates the variable, e.g. tf.compat.v1.assign_add(var, something)

Return type:

tf.Operation

class returnn.tf.util.basic.CustomUpdateExpAverage(average, alpha)[source]#

exponential moving average

Parameters:
  • average (tf.Tensor) –

  • alpha (float) –

update_var(var)[source]#
Parameters:

var (tf.Variable) –

Return type:

tf.Tensor
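Example — a minimal sketch (variable names are illustrative; the update itself is applied by TFUpdater.Updater, as described above):

    import tensorflow as tf
    from returnn.tf.util.basic import CustomUpdateExpAverage

    loss = tf.constant(0.5)
    ema_var = tf.compat.v1.get_variable("loss_ema", shape=(), trainable=False)
    CustomUpdateExpAverage(average=loss, alpha=0.1).set_on_var(ema_var)
    # TFUpdater.Updater will recognize this and apply update_var(ema_var) during training.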

returnn.tf.util.basic.set_param_axes_split_info(param, axes_split_info)[source]#
Parameters:
  • param (tf.Variable|tf.Tensor) –

  • axes_split_info (list[list[int]|None]) – e.g. [[n],[n]*4] for LSTM matrices

returnn.tf.util.basic.check_param_axes_split_info(param_shape, axes_split_info)[source]#
Parameters:
  • param_shape (list[int|None]|tuple[int|None]) –

  • axes_split_info (list[list[int]|None]) – e.g. [[n],[n]*4] for LSTM matrices

returnn.tf.util.basic.get_param_axes_split_info(param)[source]#

See set_param_axes_split_info().

Parameters:

param (tf.Variable|tf.Tensor) –

Return type:

list[list[int]|None]|None

returnn.tf.util.basic.transform_param_axes_split_info_to_new_shape(axes_split_info, new_shape, debug_name='<unknown>')[source]#

new_shape can be bigger or smaller than the old shape. In some simple cases, it is obvious how that should be done, e.g. [[a],[b]*4], [a*2,b*8] -> [[a*2],[b*2]*4]. In other cases, it is not so obvious, e.g. [[a+b],[b]*4], [a+b*2,b*8] -> [[a+b*2],[b*2]*4].

We should try to always return something, though. If some case is not covered yet, extend this.

See also the test case test_transform_param_axes_split_info_to_new_shape(). No TF is involved here; however, this fits better with the functions above.

Parameters:
  • axes_split_info (list[list[int]]) –

  • new_shape (list[int]|tuple[int]) –

  • debug_name (str) –

Returns:

new axes-split-info for the new shape

Return type:

list[list[int]]
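Example — a sketch of the simple case from above (n is an illustrative LSTM unit count):

    from returnn.tf.util.basic import transform_param_axes_split_info_to_new_shape

    n = 128
    old_splits = [[n], [n] * 4]          # e.g. axes-split-info of an LSTM kernel
    new_shape = [n * 2, (n * 2) * 4]     # both input and output dims doubled
    new_splits = transform_param_axes_split_info_to_new_shape(old_splits, new_shape)
    # expected, per the example above: [[n * 2], [n * 2] * 4]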

returnn.tf.util.basic.copy_with_new_split_axes(old_axis_splits, new_axis_splits, old_values: ndarray, new_values: ndarray | None = None)[source]#

Operates on NumPy arrays only; however, this fits better with the functions above.

Parameters:
  • old_axis_splits (list[list[int]]) –

  • new_axis_splits (list[list[int]]) –

  • old_values (numpy.ndarray) –

  • new_values (numpy.ndarray) –

Returns:

new values

Return type:

numpy.ndarray

returnn.tf.util.basic.get_padding_info_dict_ref(x)[source]#
Parameters:

x (tf.Tensor) –

Return type:

dict[Dim,float|int]

returnn.tf.util.basic.set_padding_info(x, dim, pad_value)[source]#

Stores the information what kind of padding value to expect after masking in the given dynamic dim.

Parameters:
  • x (tf.Tensor) –

  • dim (returnn.tensor.Dim) – dynamic seq len axis

  • pad_value (float|int) –

returnn.tf.util.basic.copy_compatible_reduce(source, target, reduce_type)[source]#

Extension of Data.copy_compatible_to which also reduces additional dims.

Parameters:
  • source (Data) –

  • target (Data) –

  • reduce_type (str) – eg “max”

Returns:

source with broadcast-compatible shape to target

Return type:

Data

class returnn.tf.util.basic.OutputWithActivation(x, act_func=None, act_func_opts=None)[source]#

Stores some tensor before and after some activation function, and also the activation function itself. (Maybe obsolete when you directly access the TF computation graph; but simpler.)

Parameters:
  • x (tf.Tensor) –

  • act_func (None|(tf.Tensor)->tf.Tensor) –

  • act_func_opts (None|dict[str]) –

is_softmax_act_func()[source]#
Return type:

bool

get_logits()[source]#
Return type:

tf.Tensor

Returns:

logits. logits are (not necessarily normalized) log probabilities, i.e. the input of softmax.

This call assumes that self.y is in probability space.

get_log_output()[source]#
Return type:

tf.Tensor

Returns:

tf.math.log(output)

returnn.tf.util.basic.variable_scalar_summaries_dict(x, name=None)[source]#

Collects all interesting information about x, such as min/max/mean, etc. (all scalars). This is used by variable_summaries().

Parameters:
  • x (tf.Tensor|tf.Variable) –

  • name (str) –

Returns:

dict with key -> scalar info, e.g. with “%s_mean” % name -> tf.reduce_mean(x)

Return type:

dict[str,tf.Tensor]

returnn.tf.util.basic.variable_summaries(var, name=None, with_histogram=False)[source]#

Attach a lot of summaries to a Tensor (for TensorBoard visualization). Also see variable_scalar_summaries_dict().

Parameters:
  • var (tf.Tensor|tf.Variable) –

  • name (str) –

  • with_histogram (bool) – adds histogram. note that this can add noticeable overhead

Returns:

nothing, use tf.compat.v1.summary.merge_all() to collect the summaries
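Example — a minimal usage sketch (assumes TF1-style graph mode, as used by RETURNN):

    import tensorflow as tf
    from returnn.tf.util.basic import variable_summaries

    w = tf.compat.v1.get_variable("w", shape=(128, 128))
    variable_summaries(w, name="w", with_histogram=True)
    merged = tf.compat.v1.summary.merge_all()  # collect the added summaries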

returnn.tf.util.basic.get_valid_scope_name_from_str(s)[source]#
Parameters:

s (str) – some name

Returns:

valid scope name, might be just s. see tf._VALID_SCOPE_NAME_REGEX and tf._VALID_OP_NAME_REGEX

Return type:

str

returnn.tf.util.basic.get_current_var_scope_name()[source]#
Returns:

current absolute variable scope name, via tf.compat.v1.variable_scope

Return type:

str

returnn.tf.util.basic.get_current_name_scope()[source]#
Returns:

current absolute name scope, via tf.name_scope

Return type:

str

https://stackoverflow.com/questions/40907769/how-to-get-current-tensorflow-name-scope

Note that this is a private member and might break at some point. Note also that this does not need to be the same as get_current_var_scope_name().

returnn.tf.util.basic.reuse_name_scope(name, absolute=None, **kwargs)[source]#

Context manager to reuse an already created scope. We try to both set the variable scope and the name scope.

Parameters:
  • name (str|tf.compat.v1.VariableScope) – relative or absolute name scope (absolute if absolute=True or if tf.compat.v1.VariableScope). Must not end with “/”.

  • absolute (bool|None) – if True it will be absolute

  • kwargs – passed on to tf.compat.v1.variable_scope

Returns:

yields the variable_scope
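Example — a minimal sketch of reusing an absolute scope (names are illustrative; assumes TF1-style graph mode):

    import tensorflow as tf
    from returnn.tf.util.basic import reuse_name_scope

    with reuse_name_scope("layer1/rec", absolute=True):
        w = tf.compat.v1.get_variable("W", shape=(10, 10))
        # w.name is expected to be "layer1/rec/W:0" (assuming the scope is fresh)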

returnn.tf.util.basic.opt_reuse_name_scope(name)[source]#
Parameters:

name (str|tf.compat.v1.VariableScope) –

Returns:

yields the variable_scope

returnn.tf.util.basic.get_name_scope_of_tensor(x)[source]#
Parameters:

x (tf.Tensor) – has name e.g. “layer0/rec/W:0”

Returns:

the name scope of x, e.g. “layer0/rec”

Return type:

str

returnn.tf.util.basic.get_base_name(x)[source]#
Parameters:

x (tf.Tensor|tf.Variable) – has name e.g. “layer0/rec/W:0”

Returns:

return the base name, e.g. “W”, without the output index

returnn.tf.util.basic.reuse_name_scope_of_tensor(x, prefix='', postfix='', add_tensor_name=False)[source]#
Parameters:
  • x (tf.Tensor|tf.Variable) – has name e.g. “layer0/rec/W:0”

  • prefix (str) –

  • postfix (str) –

  • add_tensor_name (bool) –

Returns:

reuse the name scope of x, e.g. “layer0/rec”, yields scope

returnn.tf.util.basic.default_control_flow_ctx()[source]#

This was earlier called var_creation_scope.

If you create a variable inside of a while-loop, you might get the following error:

InvalidArgumentError: The node ‘while/w/Assign’ has inputs from different frames. The input ‘while/j’ is in frame ‘while/while/’. The input ‘while/w’ is in frame ‘’.

This happens when you directly call tf.Variable, because the initial_value might be a tensor which depends on the current control flow context. See tests/test_TFUtil.py:test_loop_var_creation() for an example.

Related TF bugs:

One solution is to reset the current control flow context. See also same_control_flow_ctx().

However, with respect to variables, you should instead use tf.get_variable, which does not have this problem.

returnn.tf.util.basic.get_root_graph(graph=None)[source]#
Parameters:

graph (tf.Graph|None) –

Returns:

root graph. with control flow v2, the current graph might not be the root graph

Return type:

tf.Graph

returnn.tf.util.basic.flip_gradient(x, scale=1.0)[source]#
Parameters:
  • x (tf.Tensor) –

  • scale (float) –

Returns:

identity(x) but with flipped gradient (optionally scaled)

Return type:

tf.Tensor

returnn.tf.util.basic.lookup_grad_func_by_name(op_type)[source]#
Parameters:

op_type (str) –

Returns:

function grad_func(op, grad), or raises LookupError

returnn.tf.util.basic.opt_register_grad_func(op_type, grad_func, assert_is_same=True)[source]#
Parameters:
  • op_type (str) –

  • grad_func – function grad_func(op, grad)

  • assert_is_same (bool) –

returnn.tf.util.basic.identity_with_check_numerics(x, with_grad=True, name='identity_with_check_numerics')[source]#

Returns identity(x), but with additional check_numerics control dependency, and optionally the same for its gradient. See also TFUpdater.add_check_numerics_ops(), which will add checks for the whole graph.

Parameters:
  • x (tf.Tensor) –

  • with_grad (bool) – whether the check will also be added for the gradient

  • name (str) –

Return type:

tf.Tensor

returnn.tf.util.basic.check_input_ndim(x, ndim)[source]#
Parameters:
  • x (tf.Tensor) –

  • ndim (int) –

Returns:

x with check added

Return type:

tf.Tensor

returnn.tf.util.basic.check_input_ndim_equal_offset(x, y, y_ndim_offset=0)[source]#
Parameters:
  • x (tf.Tensor) –

  • y (tf.Tensor) –

  • y_ndim_offset (int) –

Returns:

x with check added such that ndim(x) == ndim(y) + y_ndim_offset

Return type:

tf.Tensor

returnn.tf.util.basic.check_input_dim(x, axis, dim)[source]#
Parameters:
  • x (tf.Tensor) –

  • axis (int) – which axis to check

  • dim (int|tf.Tensor) –

Returns:

x with check added

Return type:

tf.Tensor

returnn.tf.util.basic.check_dim_equal(x, x_axis, y, y_axis, extra_msg=())[source]#
Parameters:
  • x (tf.Tensor) –

  • x_axis (int) – which axis to check

  • y (tf.Tensor) –

  • y_axis (int) – which axis to check

  • extra_msg (Sequence[str|tf.Tensor]) – will be printed additionally if it fails

Returns:

x with check added that shape(x)[x_axis] == shape(y)[y_axis]

Return type:

tf.Tensor

returnn.tf.util.basic.check_shape_equal(x, y)[source]#
Parameters:
  • x (tf.Tensor) –

  • y (tf.Tensor) –

Returns:

x with check added that shape(x) == shape(y)

Return type:

tf.Tensor

returnn.tf.util.basic.get_shape_dim(x, axis, name='shape_dim')[source]#
Parameters:
  • x (tf.Tensor) –

  • axis (int) – which axis

  • name (str) –

Returns:

x.shape[axis] either as a static int or otherwise as an expression

Return type:

int|tf.Tensor

returnn.tf.util.basic.get_shape(x)[source]#
Parameters:

x (tf.Tensor|tf.Variable) –

Returns:

list of scalars, which are either int if known statically, or otherwise expressions

Return type:

list[int|tf.Tensor]

returnn.tf.util.basic.get_ndim(x)[source]#
Parameters:

x (tf.Tensor) –

Returns:

x.ndim either as a static int or otherwise as an expression

Return type:

int|tf.Tensor

returnn.tf.util.basic.get_range(start, stop=<class 'returnn.util.basic.NotSpecified'>)[source]#
Parameters:
  • start (int|tf.Tensor|None) –

  • stop (int|tf.Tensor|None) –

Returns:

either tuple(range(start, stop)) or the same as a symbolic expression

Return type:

tuple[int]|tf.Tensor

returnn.tf.util.basic.identity_with_ops(x, ops)[source]#
Parameters:
  • x (tf.Tensor) –

  • ops (() -> list[tf.Operation|tf.Tensor]) –

Returns:

x with all ops executed

Return type:

tf.Tensor

returnn.tf.util.basic.setup_tf_thread_pools(num_threads=None, log_file=None, tf_session_opts=None)[source]#

See here for documentation of intra_op_parallelism_threads and inter_op_parallelism_threads: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto

intra_op_parallelism_threads is used for the LocalDevice::EigenThreadPoolInfo, which is always global. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/local_device.cc

inter_op_parallelism_threads is used for the (global if not use_per_session_threads) session thread pool. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/direct_session.cc

TF will set up the thread pools on first usage. That can happen quite early, especially for intra_op_parallelism_threads. E.g. list_local_devices() will trigger this, i.e. any call to is_gpu_available() or print_available_devices(). For debugging, you can set the env-var TF_CPP_MIN_VLOG_LEVEL=1 and then check for these messages:

Local device intra op parallelism threads: 4
Direct session inter op parallelism threads: 4

Thus, call this function as early as possible with your preferred number of threads, used for both thread pools. It will create a dummy session and directly close it again, but if you use the global thread pools, those settings will remain for further sessions. This function will only execute on the first call.

Parameters:
  • num_threads (int) – used for both intra and inter parallelism thread pools

  • log_file (stream|None) –

  • tf_session_opts (dict[str]) –
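Example — call this as early as possible, before anything triggers device listing or session creation (a minimal sketch):

    import sys
    from returnn.tf.util.basic import setup_tf_thread_pools

    setup_tf_thread_pools(num_threads=4, log_file=sys.stdout)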

returnn.tf.util.basic.check_initial_tf_thread_pool_init(tf_session_opts=None)[source]#

Makes sure that the TF thread pools are initialized with the requested settings. You probably want to call this very early.

Parameters:

tf_session_opts (dict[str]|None) –

returnn.tf.util.basic.get_tf_list_local_devices(tf_session_opts=None, file=None)[source]#

This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first. Note that this will list all available devices. Any TF session might only use a subset of these. You can get the list available in a given TF session by tf.compat.v1.Session.list_devices().

Parameters:
  • tf_session_opts (dict[str]|None) – if given, will init a temp tf.compat.v1.Session with these opts

  • file (TextIO|None) – stream for print statements, defaults to sys.stdout

Return type:

list[tensorflow.core.framework.device_attributes_pb2.DeviceAttributes|_DeviceAttributes]

returnn.tf.util.basic.print_available_devices(tf_session_opts=None, file=None)[source]#

Prints the available TF devices on file (stdout by default). This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first.

Parameters:
  • tf_session_opts (dict[str]|None) – if given, will init a temp Session with these opts

  • file (TextIO|None) – file stream for print statements, defaults to sys.stdout

returnn.tf.util.basic.is_gpu_available()[source]#

Returns whether TensorFlow can access a GPU. This uses tensorflow.device_lib.list_local_devices(), i.e. this is independent from the current TF session. If you want to know whether the current TF session has a GPU available, use is_gpu_available_in_session().

Note that this does not tell whether the GPU or TF supports CUDA. See is_tf_cuda_build() for that.

Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first.

Return type:

bool

returnn.tf.util.basic.is_gpu_available_in_session(session=None)[source]#
Parameters:

session (tf.compat.v1.Session|None) – If None, will use current active/default session. If that is also not available (no current active session), we check a RETURNN global config, and return whether the RETURNN global config will use GPU or not. If the RETURNN global config is not available, we will use if a GPU is in general available for TF.

Returns:

whether the TensorFlow session has a GPU device. Also see is_gpu_available().

Return type:

bool

returnn.tf.util.basic.get_available_gpu_devices()[source]#

Returns a list of available GPU devices. This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first.

Return type:

list[tensorflow.core.framework.device_attributes_pb2.DeviceAttributes|_DeviceAttributes]

returnn.tf.util.basic.get_available_gpu_cuda_min_compute_capability()[source]#

Uses get_available_gpu_devices().

Returns:

e.g. 3.0, or 5.0, etc, or None

Return type:

float|None

returnn.tf.util.basic.is_tf_cuda_build()[source]#
Returns:

whether TF was build with CUDA support. also see is_gpu_available()

Return type:

bool

returnn.tf.util.basic.dot(a, b, transpose_b=False)[source]#
Parameters:
  • a (tf.Tensor) – shape […da…,d]

  • b (tf.Tensor) – shape [d,…db…] (or […db…,d] if transpose_b)

  • transpose_b (bool) –

Returns:

tensor of shape […da…,…db…]

Return type:

tf.Tensor
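Example — a minimal sketch with […da…] = (2, 7) and d = 16:

    import tensorflow as tf
    from returnn.tf.util.basic import dot

    a = tf.random.normal((2, 7, 16))   # [...da..., d]
    b = tf.random.normal((16, 32))     # [d, ...db...]
    c = dot(a, b)                      # shape (2, 7, 32)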

returnn.tf.util.basic.identity(x)[source]#
Parameters:

x (tf.Tensor) –

Return type:

tf.Tensor

returnn.tf.util.basic.get_activation_function(s)[source]#
Parameters:

s (str|None) –

Return type:

(tf.Tensor) -> tf.Tensor

returnn.tf.util.basic.gelu(x)[source]#

Gaussian Error Linear Units (GELUs) (https://arxiv.org/abs/1606.08415). Alternative to relu.

Parameters:

x (tf.Tensor) –

Return type:

tf.Tensor

returnn.tf.util.basic.gelu2(x)[source]#

Another approximation of the GELU (https://github.com/hendrycks/GELUs). Faster but less accurate than gelu (https://github.com/hendrycks/GELUs).

Parameters:

x (tf.Tensor) –

Return type:

tf.Tensor

returnn.tf.util.basic.gelu3(x)[source]#

Another version of the GELU, as it is used in PyTorch https://github.com/pytorch/pytorch/blob/b7f4b6a6de30116f1b44a08fab9499dd5eb2de7d/test/cpp/api/functional.cpp#L839-L845

Parameters:

x (tf.Tensor) –

Return type:

tf.Tensor

returnn.tf.util.basic.random_uniform_abs_initializer(limit, **kwargs)[source]#
Parameters:
  • limit (float|int|tf.Tensor) –

  • kwargs – passed to tf.random_uniform_initializer

Return type:

tensorflow.python.ops.init_ops.Initializer

returnn.tf.util.basic.xavier_initializer(uniform=True, seed=None, dtype=tf.float32)[source]#

Alias for tf.glorot_uniform_initializer or tf.glorot_normal_initializer.

Parameters:
  • uniform (bool) – uniform or normal distribution

  • seed (int) –

  • dtype (tf.DType) –

Returns:

((tuple[int]) -> tf.Tensor) | tensorflow.python.ops.init_ops.Initializer

returnn.tf.util.basic.wrap_distribution_non_zero(x, zero_limit, limit)[source]#
Parameters:
  • x (tf.Tensor) – values in [-limit,limit]

  • zero_limit (float) –

  • limit (float) –

Returns:

same shape as x; rescaled and shifted such that values from [-zero_limit,zero_limit] are excluded, while values remain in [-limit,limit].

Return type:

tf.Tensor

class returnn.tf.util.basic.VarianceScalingNonZero(non_zero_fraction=0.5, **kwargs)[source]#

Same as tf.VarianceScaling, i.e. truncated normal or uniform from [-limit,limit] for some limit, except that we exclude the range [-limit*non_zero_fraction,limit*non_zero_fraction]. non_zero_fraction=0 would yield no difference.

For reference, to get the behavior of glorot_uniform, use these args:

mode=”fan_avg”, distribution=”uniform”

DEPRECATED FUNCTION ARGUMENT VALUES (deprecated arguments)

Deprecated: SOME ARGUMENTS ARE DEPRECATED: (dtype). They will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor

Deprecated: SOME ARGUMENT VALUES ARE DEPRECATED: (distribution=’normal’). They will be removed in a future version. Instructions for updating: normal is a deprecated alias for truncated_normal

returnn.tf.util.basic.variance_scaling_non_zero_initializer[source]#

alias of VarianceScalingNonZero

returnn.tf.util.basic.load_txt_file_initializer(filename, dtype=tf.float32)[source]#
Parameters:
  • filename (str) –

  • dtype (tf.DType) –

Returns:

function, when called, will return the content

Return type:

()->tf.Tensor

class returnn.tf.util.basic.GammatoneFilterbankInitializer(**kwargs)[source]#

Initializer for a gammatone filterbank, e.g., to initialize weights of a convolutional layer.

Parameters:

kwargs – kwargs for GammatoneFilterbank

returnn.tf.util.basic.get_initializer(s, seed=None, eval_local_ns=None, dtype=tf.float32)[source]#
Parameters:
  • s (str|dict[str]|float|numpy.ndarray) – e.g. “glorot_uniform” or “truncated_normal” or “orthogonal”, or config dict with “class”, or string to be `eval`ed if it contains “(”. constant if a float is given.

  • seed (int|tf.Tensor) – used in case the initializer has no explicit seed specified.

  • eval_local_ns (dict[str]|None) –

  • dtype (tf.DType|str) –

Returns:

(function (shape) -> tf.Tensor) | tf.Initializer

Return type:

((tuple[int]) -> tf.Tensor) | tf.Initializer
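Example — a minimal sketch (assumes TF1-style graph mode; names are illustrative):

    import tensorflow as tf
    from returnn.tf.util.basic import get_initializer

    init = get_initializer("glorot_uniform", seed=42)
    w = tf.compat.v1.get_variable("w", shape=(512, 512), initializer=init)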

returnn.tf.util.basic.dropout(x, keep_prob, noise_shape=None, seed=None, name=None, cond_on_train=False, apply_correction_factor=True, grad_checkpointing=False)[source]#

Computes dropout. Like tf.nn.dropout(), but avoids tf.div() if possible. If noise_shape is statically known, and x is inside a recurrent loop, we will reuse the same mask for all frames.

Parameters:
  • x (tf.Tensor) –

  • keep_prob (float|tf.Tensor) –

  • noise_shape (tf.Tensor|tuple[int|None]) – 1 will broadcast in that dimension, None will not broadcast

  • seed (int) –

  • name (str) –

  • cond_on_train (bool) – automatically wrap through cond_on_train_flag()

  • apply_correction_factor (bool) –

  • grad_checkpointing (bool) – use gradient checkpointing for the result
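Example — a minimal sketch where the dropout mask is shared over the time axis (1 in noise_shape broadcasts, None keeps the actual dim):

    import tensorflow as tf
    from returnn.tf.util.basic import dropout

    x = tf.random.normal((8, 20, 512))  # (batch, time, dim)
    y = dropout(x, keep_prob=0.9, noise_shape=(None, 1, None))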

returnn.tf.util.basic.layer_norm(x, gain, bias, axis, epsilon=1e-06)[source]#

Layer normalization. Also see openai_layer_norm(). Also see tensorflow.contrib.layers.layer_norm().

Parameters:
  • x (tf.Tensor) –

  • gain (tf.Tensor) –

  • bias (tf.Tensor) –

  • axis (int) –

  • epsilon (float) – OpenAI uses 1e-6, TF contrib uses 1e-12, pbhatia243 uses 1e-5.

Return type:

tf.Tensor

returnn.tf.util.basic.openai_layer_norm(x, gain, bias, axis, epsilon=1e-06)[source]#

Layer normalization, like layer_norm(), but fast kernel by OpenAI (implemented as part of their blocksparse). To use it, init the git submodule in extern/blocksparse.

Parameters:
  • x (tf.Tensor) –

  • gain (tf.Tensor) –

  • bias (tf.Tensor) –

  • axis (int) –

  • epsilon (float) –

Return type:

tf.Tensor

returnn.tf.util.basic.swapaxes(x, axis1, axis2)[source]#

Also see move_axis() or dimshuffle().

Parameters:
  • x (tf.Tensor) –

  • axis1 (tf.Tensor|int) –

  • axis2 (tf.Tensor|int) –

Returns:

tensor with swapped axes, like numpy.swapaxes

Return type:

tf.Tensor

returnn.tf.util.basic.move_axis(x, old_axis, new_axis, name='move_axis')[source]#

Also see swapaxes() or dimshuffle().

Parameters:
  • x (tf.Tensor) –

  • old_axis (int) – can also be negative

  • new_axis (int) – can also be negative

  • name (str) – name of the scope

class returnn.tf.util.basic.TensorCachedComputation(x, key)[source]#

Helper to cache some computation inside a tf.Tensor object. Or also inside any other object.

Parameters:
  • x (tf.Tensor|object) –

  • key (str|tuple[str|int|tf.Tensor]) –

has_cache()[source]#
Returns:

whether we have stored the value already. if True, you can use get_cache()

Return type:

bool

get_cache()[source]#
Return type:

tf.Tensor

set_cache(value)[source]#
Parameters:

value (tf.Tensor) –

returnn.tf.util.basic.sequence_mask(lengths, name=None, **kwargs)[source]#

Wraps around tf.sequence_mask(). It will cache the value inside the passed object so that we don’t recompute it multiple times.

Parameters:
  • lengths (tf.Tensor) – shape (batch,)

  • name (str|None) –

  • kwargs – passed on to tf.sequence_mask

Returns:

tensor mask of shape (batch,maxlen/time). default dtype is bool unless you specify something else

Return type:

tf.Tensor

returnn.tf.util.basic.sequence_mask_time_major(lengths, **kwargs)[source]#

Wraps around tf.transpose(tf.sequence_mask(), (1,0)). It will cache the value inside the passed object so that we don’t recompute it multiple times.

Parameters:
  • lengths (tf.Tensor) – shape (batch,)

  • kwargs – passed on to tf.sequence_mask

Returns:

mask of shape (maxlen/time,batch)

Return type:

tf.Tensor

returnn.tf.util.basic.directed(x, direction)[source]#

If direction == 1 or direction is None, returns just x. If direction == -1, returns reversed(x).

Parameters:
  • x (tf.Tensor) –

  • direction (int|None) – -1 or 1 (or None)

Return type:

tf.Tensor

returnn.tf.util.basic.reversed(x)[source]#

Just returns x[::-1]. It will cache the value inside the passed object so that we don’t recompute it multiple times.

Parameters:

x (tf.Tensor) –

Return type:

tf.Tensor

returnn.tf.util.basic.get_flatten_with_seq_len_mask_cache_for_data(x)[source]#
Parameters:

x (Data) –

Return type:

TensorCachedComputation

returnn.tf.util.basic.get_flatten_with_seq_len_mask_cache(x, seq_lens, batch_dim_axis, time_dim_axis)[source]#
Parameters:
  • x (tf.Tensor) – shape (batch,…s…, time, …s’…) or shape (time,…s…., batch, …s’…)

  • seq_lens (tf.Tensor) – shape (batch,) of int32

  • batch_dim_axis (int) – index of batch_dim in x

  • time_dim_axis (int) – index of time_dim in x

Return type:

TensorCachedComputation

returnn.tf.util.basic.flatten_with_seq_len_mask(x, seq_lens, batch_dim_axis=None, time_dim_axis=None, time_major=None)[source]#
Parameters:
  • x (tf.Tensor) – shape (batch,…s…, time, …s’…) or shape (time,…s…., batch, …s’…)

  • seq_lens (tf.Tensor) – shape (batch,) of int32

  • batch_dim_axis (int) – index of batch_dim in x

  • time_dim_axis (int) – index of time_dim in x

  • time_major (bool) – whether time axis is 0 (redundant, kept for compatibility)

Returns:

tensor of shape (time’, …s…s’…) where time’ = sum(seq_len) <= batch*time

Return type:

tf.Tensor
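Example — a minimal sketch for a batch-major (batch, time, dim) tensor:

    import tensorflow as tf
    from returnn.tf.util.basic import flatten_with_seq_len_mask

    x = tf.random.normal((3, 7, 5))                   # (batch, time, dim)
    seq_lens = tf.constant([7, 4, 2], dtype=tf.int32)
    flat = flatten_with_seq_len_mask(x, seq_lens, batch_dim_axis=0, time_dim_axis=1)
    # flat has shape (7 + 4 + 2, 5) = (13, 5)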

returnn.tf.util.basic.flatten_with_seq_len_mask_time_major(x, seq_lens, batch_dim_axis, time_dim_axis)[source]#
Parameters:
  • x (tf.Tensor) – shape (batch,…s…, time, …s’…) or shape (time,…s…., batch, …s’…)

  • seq_lens (tf.Tensor) – shape (batch,) of int32

  • batch_dim_axis (int) – index of batch_dim in x

  • time_dim_axis (int) – index of time_dim in x

Returns:

tensor of shape (time’, …s…s’…) where time’ = sum(seq_len) <= batch*time

Return type:

tf.Tensor

returnn.tf.util.basic.unflatten_with_seq_len_mask(x, seq_lens, batch_major=True)[source]#

Basically the inverse of flatten_with_seq_len_mask() and flatten_with_seq_len_mask_time_major().

Parameters:
  • x (tf.Tensor) – shape (time’, …s…s’…) where time’ = sum(seq_len) <= batch*time

  • seq_lens (tf.Tensor) – shape (batch,) of int32

  • batch_major (bool) – if True, the output will be batch major

Returns:

tensor of shape (batch,…s…, time, …s’…) or shape (time,…s…., batch, …s’…)

Return type:

tf.Tensor

returnn.tf.util.basic.expand_dims_unbroadcast(x, axis, dim, name='expand_dims_unbroadcast')[source]#
Parameters:
  • x (tf.Tensor|float|int) –

  • axis (int|tf.Tensor) – new axis

  • dim (int|tf.Tensor) – dimension for axis

  • name (str) – scope name

Returns:

if x is of shape (a,b,c) and axis=0, then we return (dim,a,b,c)

Return type:

tf.Tensor
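Example — the case from the docstring, with concrete shapes:

    import tensorflow as tf
    from returnn.tf.util.basic import expand_dims_unbroadcast

    x = tf.zeros((2, 3, 4))                        # shape (a, b, c)
    y = expand_dims_unbroadcast(x, axis=0, dim=5)  # shape (5, 2, 3, 4)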

returnn.tf.util.basic.expand_multiple_dims(x, axes, name='expand_multiple_dims')[source]#
Parameters:
  • x (tf.Tensor) –

  • axes (list[int]|tuple[int]) – after completion, tf.shape(y)[axis] == 1 for axis in axes

  • name (str) – scope name

Returns:

y where we have a new broadcast axis for each axis in axes

Return type:

tf.Tensor

returnn.tf.util.basic.tile_transposed(x, axis, multiples)[source]#

Example: x with shape (D,), tf.tile(x, [N]) can be reshaped into (N,D), while tile_transposed(x, axis=0, multiples=N) can be reshaped into (D,N).

Parameters:
  • x (tf.Tensor) –

  • axis (int) –

  • multiples (int|tf.Tensor) –

Returns:

tensor with shape[axis] == x.shape[axis] * multiples

Return type:

tf.Tensor
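Example — the case from the docstring, with D=4 and N=3:

    import tensorflow as tf
    from returnn.tf.util.basic import tile_transposed

    x = tf.range(4)                              # shape (D,)
    y = tile_transposed(x, axis=0, multiples=3)  # shape (12,), reshapes into (D, N) = (4, 3)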

returnn.tf.util.basic.constant_with_shape(x, shape, dtype=None, name='constant_with_shape')[source]#
Parameters:
  • x (tf.Tensor|float|int|bool) – scalar

  • shape (list[tf.Tensor|int]|tuple[tf.Tensor|int]|tf.Tensor) –

  • dtype (tf.DType) –

  • name (str) –

Returns:

x of the specified shape

Return type:

tf.Tensor

returnn.tf.util.basic.dimshuffle(x, axes, name='dimshuffle')[source]#

Like Theano's dimshuffle. Combines tf.transpose, tf.expand_dims and tf.squeeze.

Parameters:
  • x (tf.Tensor) –

  • axes (list[int|str]|tuple[int|str]) –

  • name (str) – scope name

Return type:

tf.Tensor
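Example — a minimal sketch using Theano-style axes (assuming 'x' inserts a new broadcast axis and integers pick existing axes, as in Theano's dimshuffle):

    import tensorflow as tf
    from returnn.tf.util.basic import dimshuffle

    x = tf.zeros((2, 3, 5))
    y = dimshuffle(x, (1, 'x', 0, 2))  # shape (3, 1, 2, 5)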

returnn.tf.util.basic.sparse_labels_with_seq_lens(x, seq_lens, dtype=tf.int32, collapse_repeated=False, post_filter_idx=None)[source]#
Parameters:
  • x (tf.Tensor) – shape (batch,time) -> index, some int type

  • seq_lens (tf.Tensor|None) – shape (batch,) of int32|int64

  • dtype (tf.DType|None) – if given, will cast the x values to this type. ctc_loss() wants int32

  • collapse_repeated (bool) – like uniq() behavior

  • post_filter_idx (int|list[int]|set[int]|None) – if given, after an optional collapse_repeated, will remove all those idx

Returns:

SparseTensor, e.g. input for tf.nn.ctc_loss(), and seq_lens of shape (batch,)

Return type:

(tf.SparseTensor, tf.Tensor)

returnn.tf.util.basic.sparse_labels(x, seq_lens, dtype=tf.int32, collapse_repeated=False)[source]#
Parameters:
  • x (tf.Tensor) – shape (batch,time) -> index, some int type

  • seq_lens (tf.Tensor|None) – shape (batch,) of int32|int64

  • dtype (tf.DType|None) – if given, will cast the x values to this type. ctc_loss() wants int32

  • collapse_repeated (bool) – like uniq() behavior

Returns:

SparseTensor, e.g. input for tf.nn.ctc_loss()

Return type:

tf.SparseTensor

returnn.tf.util.basic.uniq(x)[source]#
Parameters:

x (tf.Tensor) – 1D shape (time,) -> index, some int type

Returns:

consecutive duplicates collapsed (similar to Unix uniq); unlike tf.unique, which will never repeat entries.

Example: uniq([0, 0, 1, 1, 0, 0]) == [0, 1, 0], tf.unique([0, 0, 1, 1, 0, 0]) == [0, 1]. For a batched variant, see batched_uniq, or sparse_labels() with option collapse_repeated.

returnn.tf.util.basic.batched_uniq(x, seq_lens)[source]#
Parameters:
  • x (tf.Tensor) – shape (batch,time) -> index, some int type

  • seq_lens (tf.Tensor|None) – shape (batch,) of int32|int64

Returns:

tuple (z, new_seq_lens), where z is of shape (batch, max_new_time), max_new_time = max(new_seq_lens), seq_lens is of shape (batch,).

Return type:

(tf.Tensor, tf.Tensor)

returnn.tf.util.basic.ctc_greedy_decode(logits, seq_lens, time_major)[source]#

Similar to tf.nn.ctc_greedy_decoder(), but simpler implementation, and should run on GPU.

Parameters:
  • logits (tf.Tensor) – (time,batch,dim) or (batch,time,dim)

  • seq_lens (tf.Tensor) – shape (batch,) of int32|int64

  • time_major (bool) –

Return type:

tf.SparseTensor

Returns:

in batch-major, [batch,max_time] (like tf.nn.ctc_greedy_decoder())

returnn.tf.util.basic.get_common_shape(values, ignore_axes=(), allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>)[source]#

Related: tf.broadcast_dynamic_shape(). Also see unbroadcast_to_common_shape().

Parameters:
  • values (list[tf.Tensor|tf.Variable|float|int]) – all must have the same ndim

  • ignore_axes (list[int]|tuple[int]) – these axes will be ignored (returned dim will be None)

  • allow_broadcast_all_sources (bool|NotSpecified) –

Returns:

common shape of all values. broadcasts dims with 1. will use static dims when possible. Dim of axes which are in ignore_axes will be None.

Return type:

list[tf.Tensor|int|None]

returnn.tf.util.basic.unbroadcast_to_common_shape(value, common_shape, ignore_axes=(), allow_only_noop=False)[source]#
Parameters:
  • value (tf.Tensor|T) –

  • common_shape (list[tf.Tensor|int|None]) – see get_common_shape()

  • ignore_axes (list[int]|tuple[int]) –

  • allow_only_noop (bool) – if False, and the unbroadcast is not a no-op, will raise an exception

Returns:

(maybe) unbroadcasted value

Return type:

tf.Tensor|T

returnn.tf.util.basic.concat_with_opt_broadcast(values, allow_broadcast, axis, name='concat_with_opt_broadcast')[source]#
Parameters:
  • values (list[tf.Tensor]) – all with same ndim

  • allow_broadcast (list[bool]) – same len as values

  • axis (int) –

  • name (str) –

Returns:

basically tf.concat(values, axis), but we can allow broadcasting for some values

Return type:

tf.Tensor

returnn.tf.util.basic.matrix_triangular(shape, dtype=tf.float32, lower=False, upper=False)[source]#
Parameters:
  • shape (tuple[int|tf.Tensor]|tf.Tensor) –

  • dtype (tf.DType) –

  • lower (bool) –

  • upper (bool) –

Return type:

tf.Tensor

class returnn.tf.util.basic.VariableAssigner(var)[source]#

Object helper to assign some var. (This is mostly obsolete now.)

Parameters:

var (tf.Variable) –

assign(value, session)[source]#
Parameters:
  • value (numpy.ndarray|int|float|list[str]) –

  • session (tf.compat.v1.Session) –

returnn.tf.util.basic.get_tf_gcc_version()[source]#
Returns:

gcc version, e.g. “4.8.5”

Return type:

str|None

returnn.tf.util.basic.get_tf_gcc_path()[source]#
Returns:

path to a GCC version which is most suitable for TF (to have correct C++ ABI)

Return type:

str

returnn.tf.util.basic.get_tf_gpp_path()[source]#
Returns:

path to a G++ version which is most suitable for TF (to have correct C++ ABI)

Return type:

str

class returnn.tf.util.basic.CudaEnv[source]#

Information about the Nvidia CUDA environment, and library. Also path to nvcc, the CUDA compiler.

verbose_find_cuda = False[source]#
is_available()[source]#
Return type:

bool

get_max_compute_capability()[source]#
Returns:

the highest compute capability supported by nvcc, or float(“inf”) if not known

Return type:

float

get_compiler_opts()[source]#
Return type:

list[str]

get_compiler_bin()[source]#
Returns:

path

Return type:

str

classmethod get_instance()[source]#
Return type:

CudaEnv

class returnn.tf.util.basic.OpCodeCompiler(use_cuda_if_available=True, cuda_auto_min_compute_capability=True, include_paths=(), ld_flags=(), c_macro_defines=None, **kwargs)[source]#

Helper class to compile TF ops on-the-fly, similar to Theano. https://www.tensorflow.org/guide/extend/op https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/extend/adding_an_op.md

Parameters:
  • base_name (str) – base name for the module, e.g. “zero_out”

  • code_version (int|tuple[int]) – check for the cache whether to reuse

  • code (str) – the source code itself

  • is_cpp (bool) – if False, C is assumed

  • c_macro_defines (dict[str,str|int|None]|None) – e.g. {“TENSORFLOW”: 1}

  • ld_flags (list[str]|None) – e.g. [“-lblas”]

  • include_paths (list[str]|tuple[str]) –

  • include_deps (list[str]|None) – if provided and an existing lib file, we will check if any dependency is newer and we need to recompile. we could also do it automatically via -MD but that seems overkill and too slow.

  • static_version_name (str|None) – normally, we use …/base_name/hash as the dir but this would use …/base_name/static_version_name.

  • should_cleanup_old_all (bool) – whether we should look in the cache dir and check all ops if we can delete some old ones which are older than some limit (self._cleanup_time_limit_days)

  • should_cleanup_old_mydir (bool) – whether we should delete our op dir before we compile there.

  • log_stream (TextIO|None) – file stream for print statements

  • verbose (bool) – be slightly more verbose

CacheDirName = 'returnn_tf_cache/ops'[source]#
classmethod cuda_available()[source]#
Returns:

whether CUDA is available. if True, and you initiate with use_cuda_if_available=True, then _with_cuda() should also be True.

Return type:

bool

load_tf_module()[source]#
Returns:

module

class returnn.tf.util.basic.TFNativeUtilCompiler(include_paths=(), ld_flags=(), c_macro_defines=None, **kwargs)[source]#

Helper class to compile TF utility functions on-the-fly.

Parameters:
  • base_name (str) – base name for the module, e.g. “zero_out”

  • code_version (int|tuple[int]) – check for the cache whether to reuse

  • code (str) – the source code itself

  • is_cpp (bool) – if False, C is assumed

  • c_macro_defines (dict[str,str|int|None]|None) – e.g. {“TENSORFLOW”: 1}

  • ld_flags (list[str]|None) – e.g. [“-lblas”]

  • include_paths (list[str]|tuple[str]) –

  • include_deps (list[str]|None) – if provided and an existing lib file, we will check if any dependency is newer and we need to recompile. we could also do it automatically via -MD but that seems overkill and too slow.

  • static_version_name (str|None) – normally, we use …/base_name/hash as the dir but this would use …/base_name/static_version_name.

  • should_cleanup_old_all (bool) – whether we should look in the cache dir and check all ops if we can delete some old ones which are older than some limit (self._cleanup_time_limit_days)

  • should_cleanup_old_mydir (bool) – whether we should delete our op dir before we compile there.

  • log_stream (TextIO|None) – file stream for print statements

  • verbose (bool) – be slightly more verbose

CacheDirName = 'returnn_tf_cache/tf_utils'[source]#
returnn.tf.util.basic.make_var_tuple(v)[source]#
Parameters:

v (tf.Tensor|list[tf.Tensor]|tuple[tf.Tensor]) –

Returns:

tuple of tensors

Return type:

tuple[tf.Tensor]

returnn.tf.util.basic.add_scaled_noise_to_gradients(grads_and_vars, gradient_noise_scale, sparse_grads=False)[source]#

Adds scaled noise from a 0-mean normal distribution to gradients. Adapted from tf.contrib.layers.optimizers.

Parameters:
  • grads_and_vars (list[(tf.Tensor|tf.IndexedSlices, tf.Variable)]) –

  • gradient_noise_scale (float) – used as stddev for tf.truncated_normal().

  • sparse_grads (bool) – for sparse gradients (tf.IndexedSlices), it will only add the noise to the indexed values. Seems broken in some cases? Needs debugging.

Returns:

adapted grads_and_vars

Return type:

list[(tf.Tensor|tf.IndexedSlices, tf.Variable)]

class returnn.tf.util.basic.CustomGradient[source]#

Utility functions to specify a custom gradient for a given function, which will be wrapped around via TF Defun().

Also see FlipGradientBuilder.

register(input_types, op, grad_op, name=None)[source]#
Parameters:
  • input_types (list[tf.DType]|tuple[tf.DType]) –

  • op (((tf.Tensor) -> tf.Tensor)|T) –

  • grad_op ((tf.Operation, tf.Tensor) -> tuple[tf.Tensor]|tf.Tensor) – args are (op, out_grad) and it must return in_grad

  • name (str) – optional func_name

Returns:

op

Return type:

((tf.Tensor) -> tf.Tensor)|T

register_generic_loss_and_error_signal()[source]#

If you want to use generic_loss_and_error_signal() at some point, call this as early as possible, because of https://github.com/tensorflow/tensorflow/issues/6804.

generic_loss_and_error_signal(loss, x, grad_x)[source]#

Wrapper around self.register(). Expects that loss = loss(x), and grad_x = partial loss / partial x.

Parameters:
  • loss (tf.Tensor) –

  • x (tf.Tensor) –

  • grad_x (tf.Tensor) –

Returns:

loss but with the gradient for x

Return type:

tf.Tensor

class returnn.tf.util.basic.MetaLosses[source]#

This provides a way to use an alternative gradient, or to use the original gradient (error signal) and do something with it. You can then define an additional (meta) loss using this.

This implements synthetic gradients, see synthetic_gradient().

class LossInfo(value, scale, norm_factor, name, source)[source]#

Covers loss and other info.

Parameters:
  • value (tf.Tensor) –

  • scale (float) –

  • norm_factor (tf.Tensor) –

  • name (str) –

  • source (object) – e.g. layer

class Scope[source]#

Defines the scope for a synthetic gradient. Create this object via MetaLosses.enter_gradient_scope(). Any meta-losses will be collected here via register_loss().

register_loss(loss)[source]#
Parameters:

loss (MetaLosses.LossInfo) –

exit()[source]#

Exit the scope.

losses_as_fetch_dict()[source]#
Return type:

dict[str,tf.Tensor]

summed_loss_for_optimization()[source]#
Return type:

tf.Tensor

class ScopeCtxThreadLocal[source]#

Thread local.

scope: Scope | None = None[source]#
scope_ctx = <returnn.tf.util.basic.MetaLosses.ScopeCtxThreadLocal object>[source]#
classmethod enter_gradient_scope()[source]#
Return type:

MetaLosses.Scope

classmethod exit_gradient_scope()[source]#

Exit gradient scope.

classmethod synthetic_gradient(x, synthetic_grad_x, loss_scale=1.0, loss_name=None, loss_source=None)[source]#

Decoupled Neural Interfaces using Synthetic Gradients, https://arxiv.org/abs/1608.05343

Parameters:
  • x (tf.Tensor) –

  • synthetic_grad_x (tf.Tensor) –

  • loss_scale (float) –

  • loss_name (str|None) –

  • loss_source (object|None) –

Returns:

x, where the gradient is overwritten by synthetic_grad_x, and when calculated, the gradient prediction loss will be added to cls.scope.

Return type:

tf.Tensor

classmethod tikhonov_regularized(x, dummy, loss_scale=1.0, loss_name=None, loss_source=None)[source]#
Parameters:
  • x (tf.Tensor) –

  • dummy (tf.Tensor|tf.Variable) – scalar. can be used to enforce getting a gradient

  • loss_scale (float) –

  • loss_name (str|None) –

  • loss_source (object|None) –

Returns:

identity(x), where we add a Tikhonov regularization

Return type:

tf.Tensor

returnn.tf.util.basic.filter_grad(x, threshold, axis)[source]#
Parameters:
  • x (tf.Tensor) –

  • threshold (float) – all grads going through x which max(grad**2) is over the threshold are removed

  • axis (int|list[int]) – max(grad**2) will be reduced over this axis

Returns:

identity(x) with custom gradient

Return type:

tf.Tensor

returnn.tf.util.basic.debug_register_better_repr()[source]#

Some types don’t have good __repr__ implementations by default (for the current TF version). For debugging, it can be helpful to give some more info. This monkey-patches clazz.__repr__ of some TF classes.

returnn.tf.util.basic.cond(pred, true_fn, false_fn, name=None)[source]#

This is a wrapper around tf.cond(). This will be a branched execution, i.e. either true_fn() or false_fn() will be executed, or at least the resulting graph will be evaluated. If pred is constant at the call, only the corresponding fn will be called. This is similar to the TF internal _smart_cond(). And similar to tf.contrib.framework.smart_cond.

Parameters:
  • pred (tf.Tensor|bool) –

  • true_fn (()->(tf.Tensor|list[tf.Tensor]|T)) –

  • false_fn (()->(tf.Tensor|list[tf.Tensor]|T)) –

  • name (str) –

Returns:

true_fn() if pred else false_fn()

Return type:

tf.Tensor|list[tf.Tensor]|T

returnn.tf.util.basic.single_strided_slice(x, axis, begin=None, end=None, step=None)[source]#
Parameters:
  • x (tf.Tensor) –

  • axis (int|tf.Tensor) –

  • begin (int|tf.Tensor|None) –

  • end (int|tf.Tensor|None) –

  • step (int|tf.Tensor|None) –

Returns:

e.g. if axis == 0, returns x[begin:end:step], if axis == 1, returns x[:, begin:end:step], etc.

Return type:

tf.Tensor

returnn.tf.util.basic.pad_replicate(x, axes, padding)[source]#
Parameters:
  • x (tf.Tensor) –

  • axes (list[int]) –

  • padding (list[(int,int)]) –

Return type:

tf.Tensor

returnn.tf.util.basic.circular_pad(x, paddings, axes=None)[source]#
Parameters:
  • x (tf.Tensor) – shape (…, height, width)

  • paddings (int|((int,int), (int,int))|tf.Tensor) – how much to add ((top,bottom),(left,right))

  • axes (None|tf.Tensor|(tf.Tensor|int,tf.Tensor|int)) –

Returns:

tensor with shape (…, top + height + bottom, left + width + right)

Return type:

tf.Tensor

returnn.tf.util.basic.spatial_smoothing_energy(x, dim, use_circular_conv=True)[source]#
Parameters:
  • x (tf.Tensor) – shape (…, dim)

  • dim (int) – last dimension of x

  • use_circular_conv (bool) – whether to use circular convolution, via circular_pad

Return type:

tf.Tensor

Returns:

energy of shape (…)

Via: Achieving Human Parity in Conversational Speech Recognition, Microsoft, 2017 (https://arxiv.org/abs/1610.05256). Interpret the last dimension as 2D (w, h) and apply some high-pass filter on it.

returnn.tf.util.basic.nan_to_num(x, nan_num=0, inf_num=1e+30)[source]#

Like numpy.nan_to_num().

Parameters:
  • x (tf.Tensor|tf.IndexedSlices) –

  • nan_num (float|tf.Tensor) –

  • inf_num (float|tf.Tensor) –

Returns:

x with replaced nan and inf

returnn.tf.util.basic.where_bc(condition, x, y, allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>, name='where_bc')[source]#

This is basically tf.where() but with additional broadcasting support. We explicitly require that the ndims match (or x, y can also be scalars). See also get_common_shape() and unbroadcast_to_common_shape().

https://github.com/tensorflow/tensorflow/issues/3945 https://github.com/tensorflow/tensorflow/issues/9284

Parameters:
  • condition (tf.Tensor) –

  • x (tf.Tensor|float|int) –

  • y (tf.Tensor|float|int) –

  • allow_broadcast_all_sources (bool|NotSpecified) –

  • name (str) –

Returns:

basically tf.where(condition, x, y)

Return type:

tf.Tensor
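Example — a minimal sketch; note that the ndims of condition, x and y match, and the condition broadcasts:

    import tensorflow as tf
    from returnn.tf.util.basic import where_bc

    cond = tf.constant([[True], [False]])  # shape (2, 1)
    x = tf.zeros((2, 3))
    y = tf.ones((2, 3))
    z = where_bc(cond, x, y)               # shape (2, 3)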

returnn.tf.util.basic.identity_op_nested(x, name='identity')[source]#
Parameters:
  • x (tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]) –

  • name (str) –

Return type:

tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]

returnn.tf.util.basic.nd_indices(indices, batch_axis=0, indices_batch_major=None)[source]#
Parameters:
  • indices (tf.Tensor) – e.g. (batch, …) -> index (or (…, batch, …) -> index)

  • batch_axis (int) – of the indices tensor itself

  • indices_batch_major (bool|None) – of the resulting 2-tuple, whether it represents (batch_idx, index) or (index, batch_idx). default is like batch_axis

Returns:

extended indices with batch-idx which can be used for tf.gather_nd, i.e. in the example of shape (batch, …, 2) where the 2-tuple represents (batch_idx, index) or (index, batch_idx). the shape[:-1] is exactly the same as the indices shape.

Return type:

tf.Tensor
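Example — a minimal sketch assuming a simple (batch,)-shaped index tensor, used together with tf.gather_nd:

    import tensorflow as tf
    from returnn.tf.util.basic import nd_indices

    scores = tf.random.normal((4, 10))   # (batch, classes)
    targets = tf.constant([1, 0, 3, 7])  # (batch,) -> index
    idx = nd_indices(targets)            # shape (4, 2), pairs of (batch_idx, index)
    picked = tf.gather_nd(scores, idx)   # shape (4,)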

returnn.tf.util.basic.stop_all_event_writer_threads()[source]#

Iterates through all running threads, and stops those which are TF event logger threads. See stop_event_writer_thread().

returnn.tf.util.basic.stop_event_writer_thread(event_writer)[source]#

There is a bug in TensorFlow (at least 1.1.0) (https://github.com/tensorflow/tensorflow/issues/4820) that the event writer thread is never stopped. This will try to stop it. Only do it if you don’t use the event writer anymore.

Parameters:

event_writer (tf.compat.v1.summary.FileWriter|tensorflow.python.summary.writer.event_file_writer.EventFileWriter|tensorflow.python.summary.writer.event_file_writer._EventLoggerThread) – # nopep8

returnn.tf.util.basic.optional_add(*args)[source]#
Parameters:

args (list[tf.Tensor|None]|int|float|tf.Tensor) –

Return type:

tf.Tensor|int|float|None

Returns:

sums all non-None values, or returns None if there are none

returnn.tf.util.basic.optional_mul(*args)[source]#
Parameters:

args (tf.Tensor|None|int|float) –

Return type:

tf.Tensor|int|float|None

Returns:

multiplies all non-None values, or returns None if there are none

returnn.tf.util.basic.opt_logical_and(*args)[source]#
Parameters:

args (tf.Tensor|bool) –

Returns:

basically logical_and(*args), but leaves out all constants

Return type:

tf.Tensor|bool

returnn.tf.util.basic.opt_logical_or(*args)[source]#
Parameters:

args (tf.Tensor|bool) –

Returns:

basically logical_or(*args), but leaves out all constants

Return type:

tf.Tensor|bool

returnn.tf.util.basic.windowed_nd(source, window_size, window_left=None, window_right=None, padding='same', time_axis=1, new_window_axis=2, stride=1)[source]#

Constructs a new “window” axis which is a moving input over the time-axis. If you want to take out a single window, i.e. a slice, see slice_nd().

The windowing logic behaves just as in convolution or pooling.

There are multiple implementations:

  • By tiling + padding and then reshaping, we can get what we want. This is the “clever” implementation which is efficient but difficult to understand. To really understand it, it’s best to visualize it. This is the default implementation. It is only efficient with no striding (stride=1), so we only use it for that case.

  • We can do with tf.gather() by calculating the exact indices in the input tensor for all windows. This is quite straight-forward and still reasonably efficient. We use this for striding.

  • tf.image.extract_patches() is quite similar in behavior.

  • We also have native implementations for chunk and unchunk, which are also similar in behavior.

  • PyTorch unfold is also similar in behavior.

Parameters:
  • source (tf.Tensor) – N-D tensor of shape (…, n_time, …)

  • window_size (int|tf.Tensor) – window size

  • window_left (int|tf.Tensor|None) –

  • window_right (int|tf.Tensor|None) –

  • padding (str) – “same” or “valid”

  • time_axis (int) –

  • new_window_axis (int) –

  • stride (int) – return only each Nth windwow

Returns:

tensor of shape (…, n_time, …, window, …)

Return type:

tf.Tensor
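Example — a minimal sketch: a moving window of size 3 over the time axis of a (batch, time, dim) tensor:

    import tensorflow as tf
    from returnn.tf.util.basic import windowed_nd

    x = tf.random.normal((2, 10, 5))  # (batch, time, dim)
    y = windowed_nd(x, window_size=3, padding="same", time_axis=1, new_window_axis=2)
    # y has shape (2, 10, 3, 5): (batch, time, window, dim)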

returnn.tf.util.basic.slice_nd(x, start, size)[source]#
Parameters:
  • x (tf.Tensor) – shape (B, T, …)

  • start (tf.Tensor) – shape (B,), int32

  • size (int|tf.Tensor) – scalar

Returns:

[x[start_1:size], x[start_2:size], …, x[start_B:size]], shape (B, size, …) Like slice_pad_zeros(), the size in the first axis will always be size, and we will pad with zeros.

Return type:

tf.Tensor

returnn.tf.util.basic.global_tensor(f, name)[source]#

This creates a globally accessible tensor in the graph to be reused later, i.e. on the second call with the same unique name, it will not create a new tensor but return the previously created one. This is per graph, i.e. for a new graph, the tensor will be recreated.

Parameters:
  • f (() -> tf.Tensor) – callable which creates the tensor

  • name (str) – global reference name for the tensor. should be a valid scope name

Returns:

the tensor

Return type:

tf.Tensor

returnn.tf.util.basic.get_global_train_flag_placeholder()[source]#

Also consider TFNetwork.get_current_network().train_flag(), or get_global_train_flag().

Returns:

bool scalar tensor

Return type:

tf.Tensor

returnn.tf.util.basic.get_global_train_flag()[source]#
Return type:

tf.Tensor|bool

Returns:

global train flag

returnn.tf.util.basic.cond_on_train_flag(fn_train, fn_eval)[source]#

Uses fn_train() or fn_eval() based on the train flag. It will be a branched evaluation. The train flag is determined via get_global_train_flag().

Parameters:
  • fn_train (()->tf.Tensor) –

  • fn_eval (()->tf.Tensor) –

Returns:

fn_train() if self.train_flag else fn_eval()

Return type:

tf.Tensor

returnn.tf.util.basic.get_random_seed()[source]#
Return type:

int|None

returnn.tf.util.basic.get_global_random_generator(*, create: bool = True) Generator | None[source]#
Parameters:

create – if True and no generator exists yet, it will create one

Returns:

random generator

class returnn.tf.util.basic.StatelessRandomSeed(_shape: Tensor | Sequence[int | Tensor], _key: Tensor, _counter: Tensor, _algorithm: int | Tensor)[source]#

State to create some random numbers.

The random numbers can be created multiple times, and it will always return the same value for the same instance. This is useful to save memory.

classmethod create(*, shape: Tensor | Sequence[int | Tensor], generator: Generator | None = None) StatelessRandomSeed[source]#
Parameters:
  • shape

  • generator

Returns:

new instance

uniform(*, minval: float | Tensor = 0, maxval: float | Tensor | None = None, dtype: DType = tf.float32) Tensor[source]#

Basically copy of tf.random.Generator.uniform.

Parameters:
  • minval

  • maxval

  • dtype

Returns:

random tensor with given shape. Note that this op is deterministic, i.e. it will always return the same value for multiple calls on the same instance, as the instance encapsulates all random state.

normal(mean: float | Tensor = 0.0, stddev: float | Tensor = 1.0, dtype: DType = tf.float32) Tensor[source]#

Basically copy of tf.random.Generator.normal.

Parameters:
  • mean

  • stddev

  • dtype

Returns:

random tensor with given shape. Note that this op is deterministic, i.e. it will always return the same value for multiple calls on the same instance, as the instance encapsulates all random state.

returnn.tf.util.basic.encode_raw(x, axis=-1, seq_lens=None)[source]#

The inverse function of tf.compat.v1.decode_raw(). Also see: https://stackoverflow.com/questions/43403147/how-to-create-a-encode-raw-tensorflow-function

Parameters:
  • x (tf.Tensor) – of integer types [0,255], will get casted to uint8

  • axis (int) – the axis to reduce-join the string. decode_raw has added it at the end

  • seq_lens (tf.Tensor|None) – must have same shape as x after reduce-joining. Note that using seq_lens will make our output not compatible with tf.compat.v1.decode_raw() anymore because tf.compat.v1.decode_raw() requires all strings to be of the same length.

Returns:

string tensor

Return type:

tf.Tensor

returnn.tf.util.basic.get_shared_vocab(vocab_strings)[source]#

The vocab is shared across the current instance of the computation graph. The tensor name might be different in different runs.

Parameters:

vocab_strings (list[str]) –

Returns:

shape (len(vocab_strings),), tf.string

Return type:

tf.Tensor

returnn.tf.util.basic.map_labels(x, label_map, name='map_labels')[source]#
Parameters:
  • x (tf.Tensor|tf.SparseTensor) – values of integer types

  • label_map (dict[int,int|None]) – should be dense on input

  • name (str) –

Returns:

mapped values

Return type:

tf.Tensor|tf.SparseTensor

returnn.tf.util.basic.remove_labels(x, labels)[source]#
Parameters:
  • x (tf.SparseTensor) – sequences, i.e. the indices are interpret as (batch,time)

  • labels (set[int]|list[int]) –

Returns:

x where all provided labels are removed, and the indices are changed accordingly

Return type:

tf.SparseTensor

returnn.tf.util.basic.pad_zeros_in_axis(x, before=0, after=0, axis=0)[source]#
Parameters:
  • x (tf.Tensor) –

  • before (int|tf.Tensor) –

  • after (int|tf.Tensor) –

  • axis (int) –

Returns:

returnn.tf.util.basic.slice_pad_zeros(x, begin, end, axis=0)[source]#
Parameters:
  • x (tf.Tensor) – of shape (…, time, …)

  • begin (int|tf.Tensor) –

  • end (int|tf.Tensor) –

  • axis (int) –

Returns:

basically x[begin:end] (with axis==0) but if begin < 0 or end > x.shape[0], it will not discard these frames but pad zeros, such that the resulting shape[0] == end - begin.

Return type:

tf.Tensor

returnn.tf.util.basic.post_control_dependencies(x, updates)[source]#
Parameters:
  • x (tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]) –

  • updates (list[tf.Operation]) –

Returns:

identity(x) with control_dependencies(updates)

Return type:

tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]

returnn.tf.util.basic.sequential_control_dependencies(ls)[source]#

tf.control_dependencies but each operation will be created such that it is executed after the ones coming before in the list, i.e. l[0] is executed first, l[-1] is executed last.

Parameters:

ls (list[()->(tf.Operation|tf.Tensor)]) –

returnn.tf.util.basic.global_queue(name, queue_type, capacity, dtypes, shapes=None, names=None)[source]#
Parameters:
  • name (str) – global name

  • queue_type ((...)->tf.QueueBase) – some function which creates a queue

  • capacity

  • dtypes (list[tf.DType|str]) –

  • shapes (list[tf.TensorShape|tuple[int|None]]|None) –

  • names (list[str]|None) –

Return type:

tf.QueueBase

returnn.tf.util.basic.init_variable_if_needed(v)[source]#
Parameters:

v (tf.Variable) –

Return type:

tf.Operation

returnn.tf.util.basic.auto_init_var(v)[source]#
Parameters:

v (tf.Variable) –

Returns:

a reference to the var via tf.identity

Return type:

tf.Tensor

returnn.tf.util.basic.true_once()[source]#
Returns:

tensor which will be True once and then always False. Internally, this creates a non-trainable variable as a helper.

Return type:

tf.Tensor

returnn.tf.util.basic.raise_OutOfRangeError()[source]#
Returns:

an op which raises an OutOfRangeError

Return type:

tf.Operation

returnn.tf.util.basic.enforce_copy(x)[source]#
Parameters:

x (tf.Tensor|tf.Variable) –

Returns:

copy of input, i.e. enforces that this is not a ref

Return type:

tf.Tensor

returnn.tf.util.basic.zeros_dyn_shape(shape, dtype=tf.float32, name='zeros_dyn_shape')[source]#
Parameters:
  • shape (list[int|None]|tuple[int|None]) –

  • dtype (str|tf.DType) –

  • name (str) –

Returns:

zeros = tf.zeros() which has 1 at the None dims, however, this is a dynamic size, so zeros.shape.as_list() returns exactly shape, including the None’s.

Return type:

tf.Tensor

returnn.tf.util.basic.view_as(x, dtype)[source]#

Does the numpy.view equivalent. Note that the current implementation is inefficient (uses tf.compat.v1.py_func) and CPU-only. Also see tf.bitcast().

Parameters:
  • x (tf.Tensor) –

  • dtype (tf.DType) –

Returns:

x.view(dtype) equivalent (see numpy.view)

returnn.tf.util.basic.broadcast_gradient_args(shape_x, shape_y)[source]#
Parameters:
  • shape_x (tf.Tensor) –

  • shape_y (tf.Tensor) –

Returns:

(axis reduce arg for grad x, axis reduce arg for grad y)

Return type:

(tf.Tensor, tf.Tensor)

returnn.tf.util.basic.maximum_with_identity_grad(x, y)[source]#
Parameters:
  • x (tf.Tensor) –

  • y (tf.Tensor|float) –

Returns:

tf.maximum(x, y) where each will receive the gradient

Return type:

tf.Tensor

returnn.tf.util.basic.minimum_with_identity_grad(x, y)[source]#
Parameters:
  • x (tf.Tensor) –

  • y (tf.Tensor|float) –

Returns:

tf.minimum(x, y) where each will receive the gradient

Return type:

tf.Tensor

returnn.tf.util.basic.clip_by_value_with_identity_grad(x, clip_value_min, clip_value_max)[source]#
Parameters:
  • x (tf.Tensor) –

  • clip_value_min (tf.Tensor|float) –

  • clip_value_max (tf.Tensor|float) –

Returns:

tf.clip_by_value(x, clip_value_min, clip_value_max) where each will receive the gradient

Return type:

tf.Tensor

returnn.tf.util.basic.safe_log(x, eps=1e-20, use_fake_grad=True)[source]#

Safe wrapper around tf.log() which avoids infs or nans in the gradient.

Parameters:
  • x (tf.Tensor) –

  • eps (float|tf.Tensor) –

  • use_fake_grad (bool) – True -> use maximum_with_identity_grad, False -> use tf.maximum

Returns:

log(max(x, eps))

Return type:

tf.Tensor
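
A minimal sketch (input values are illustrative; RETURNN's graph-mode TF setup is assumed):

    import tensorflow as tf
    from returnn.tf.util.basic import safe_log

    x = tf.constant([0.0, 1e-30, 1.0])
    y = safe_log(x)  # behaves like log(max(x, 1e-20)): no -inf values, and a usable gradient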

returnn.tf.util.basic.safe_exp(x, eps=1e-20)[source]#
Parameters:
  • x (tf.Tensor) –

  • eps (float) –

Returns:

exp(x), but does clipping before, such that it never returns inf nor exactly 0.0. Also, we make sure that we use the gradient in all cases.

Return type:

tf.Tensor

returnn.tf.util.basic.l1_normalized(x, axis=-1, eps=1e-20, use_logsumexp=False, is_not_negative=False)[source]#
Parameters:
  • x (tf.Tensor) – assumes != 0

  • axis (int|tf.Tensor) – in range [-rank(x),rank(x)]

  • eps (float|tf.Tensor|None) – for safety, to ensure that tf.reduce_sum(tf.abs(x)) >= eps

  • use_logsumexp (bool) – eps must not be None

  • is_not_negative (bool) –

Returns:

y such that tf.reduce_sum(tf.abs(y)) == 1. i.e. y = x / tf.reduce_sum(tf.abs(x)).

Return type:

tf.Tensor
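
A small illustrative sketch (values made up; graph-mode setup assumed). With sum(|x|) == 8, the result is simply x / 8:

    import tensorflow as tf
    from returnn.tf.util.basic import l1_normalized

    x = tf.constant([1.0, -3.0, 4.0])
    y = l1_normalized(x)  # == x / 8.0 here, so that reduce_sum(abs(y)) == 1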

returnn.tf.util.basic.lin_exp(x, use_safe_exp=True)[source]#
Parameters:
  • x (tf.Tensor) –

  • use_safe_exp (bool) –

Returns:

x + 1 if x >= 0 else exp(x). This is smooth and differentiable everywhere.

Return type:

tf.Tensor
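
A small sketch of the elementwise behavior described above (example inputs, graph-mode setup assumed):

    import tensorflow as tf
    from returnn.tf.util.basic import lin_exp

    x = tf.constant([-2.0, 0.0, 3.0])
    y = lin_exp(x)  # elementwise: [exp(-2), 0 + 1, 3 + 1], i.e. approx [0.135, 1.0, 4.0]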

returnn.tf.util.basic.lin_exp_normed(x, axis=-1, eps=1e-10)[source]#

This can be used as an alternative to softmax. It uses lin_exp() instead of exp.

Parameters:
  • x (tf.Tensor) –

  • axis (int|tf.Tensor) – in range [-rank(x),rank(x)]

  • eps (float|tf.Tensor|None) – for safety, to ensure that tf.reduce_sum(tf.abs(x)) >= eps

Returns:

y = l1_normalized(lin_exp(x)), i.e. tf.reduce_sum(y) == 1, and y >= 0.

Return type:

tf.Tensor

returnn.tf.util.basic.check_base_op_type_and_replace(x, op_type, new_op_type)[source]#

Suppose you have x = tf.nn.softmax(z) and you want to get y = tf.nn.log_softmax(z). This function will test to see if x is of that kind and then return y.

Parameters:
  • x (tf.Tensor) –

  • op_type (str) – e.g. “Softmax”

  • new_op_type (str) – e.g. “LogSoftmax”

Returns:

x with new_op_type instead of op_type, or None if not matched

Return type:

tf.Tensor|None
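
A minimal sketch of the softmax example from the description (input values are illustrative):

    import tensorflow as tf
    from returnn.tf.util.basic import check_base_op_type_and_replace

    z = tf.constant([[1.0, 2.0, 3.0]])
    x = tf.nn.softmax(z)
    y = check_base_op_type_and_replace(x, "Softmax", "LogSoftmax")
    # y is equivalent to tf.nn.log_softmax(z); it would be None if x were not produced by a Softmax op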

returnn.tf.util.basic.copy_op(op: Operation, *, graph: Graph | None = None, op_type: str | None = None, inputs: Sequence[Tensor] | None = None, name: str | None = None) Operation[source]#

Copies a tf.Operation.

Parameters:
  • op

  • graph – if given, overwrites op.graph, otherwise uses the same op.graph

  • op_type – if given, overwrites op.type, otherwise uses the same op.type

  • inputs – if given, overwrites op.inputs, otherwise uses the same op.inputs

  • name

Returns:

copy of op but optionally change op.type == op_type or op.inputs == inputs

returnn.tf.util.basic.simplify_neg(a)[source]#
Parameters:

a (T|tf.Tensor|int|float|numpy.ndarray|numpy.number) –

Returns:

-a, but the operation is potentially simplified

Return type:

T|tf.Tensor|int|float|numpy.ndarray|numpy.number

returnn.tf.util.basic.simplify_add(a, b)[source]#
Parameters:
  • a (T|tf.Tensor|int|float|numpy.ndarray|numpy.number) –

  • b (T|tf.Tensor|int|float|numpy.ndarray|numpy.number) –

Returns:

a + b, but the operation is potentially simplified

Return type:

T|tf.Tensor|int|float|numpy.ndarray|numpy.number

Obviously, it is not possible to perform simplification in all cases, so this never can be complete. This just covers some very simple cases, e.g.:

(a + b) + (-b) == a

returnn.tf.util.basic.simplify_sub(a, b)[source]#
Parameters:
  • a (T|tf.Tensor|int|float|numpy.ndarray) –

  • b (T|tf.Tensor|int|float|numpy.ndarray) –

Returns:

a - b. but the operation is potentially simplified

Return type:

T|tf.Tensor|int|float|numpy.ndarray

Wraps simplify_add().

returnn.tf.util.basic.simplify_non_negative_seq_length(x)[source]#
Parameters:

x (tf.Tensor|int|float|numpy.ndarray) –

Returns:

max(x, 0), or simplified if possible

Return type:

tf.Tensor|int|float|numpy.ndarray

returnn.tf.util.basic.copy_tensor(x)[source]#

Similar to tf.identity, but we ensure here that the return value has its own memory. This can be relevant when you want to keep a copy of the original variable value. See get_variable_value_copy_before_update_ops() for usage.

Parameters:

x (tf.Tensor) –

Returns:

a copy of x (points to new memory)

Return type:

tf.Tensor

returnn.tf.util.basic.smoothing_cross_entropy(logits, labels, label_smoothing, gaussian=False, vocab_size=None, logits_are_normalized=False)[source]#

Cross entropy with label smoothing to limit over-confidence. Code adapted from here: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/common_layers.py

Parameters:
  • logits (tf.Tensor) – Tensor of size shape(labels) + [vocab_size]

  • labels (tf.Tensor) – Tensor of size […]

  • vocab_size (int|tf.Tensor) – Tensor representing the size of the vocabulary.

  • label_smoothing (float) –

    confidence = 1.0 - label_smoothing. Used to determine on and off values for label smoothing. If gaussian is true, confidence is the variance of the Gaussian distribution. A common default value is 0.1.

  • gaussian (bool) – Uses a gaussian distribution for label smoothing

  • logits_are_normalized (bool) –

Returns:

Tensor of the same shape as labels and of the same dtype as logits.

Return type:

tf.Tensor
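
A minimal usage sketch (shapes and the vocabulary size are made-up example values; graph-mode setup assumed):

    import tensorflow as tf
    from returnn.tf.util.basic import smoothing_cross_entropy

    vocab_size = 1000
    logits = tf.zeros((8, 10, vocab_size))      # e.g. (batch, time, vocab)
    labels = tf.zeros((8, 10), dtype=tf.int32)  # dummy target indices
    ce = smoothing_cross_entropy(
        logits=logits, labels=labels, vocab_size=vocab_size, label_smoothing=0.1)
    # ce has shape (8, 10), i.e. the same shape as labels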

returnn.tf.util.basic.softmax_cross_entropy_over_size(logits, labels, stable_gradient=True)[source]#

The last spatial axis with dynamic size info will be used and interpreted as the class probabilities over the size. We will mask logits outside of the size. We expect that the labels have the corresponding invalid frames already set to 0.0. This can be used to measure the cross entropy between two soft alignments / attention weights.

Parameters:
  • logits (Data) – in log space, unscaled. Shape (…,T,…), e.g. (B,dec-T,enc-T,H…) or (dec-T,enc-T,B,H…), etc. If it has multiple axes with dynamic size, we use the last one (enc-T in the example).

  • labels (Data) – in prob space. Shape compatible to logits (but axes can be ordered differently), e.g. (B,dec-T,enc-T,H…) etc. If it has multiple spatial axes, we expect them to be in the same order as in logits.

  • stable_gradient (bool) – whether to use an explicit gradient

Returns:

shape as logits, but the T axis removed.

Return type:

tf.Tensor

returnn.tf.util.basic.interpolate_bilinear(grid, query_points, name='interpolate_bilinear', indexing='ij')[source]#

Similar to Matlab’s interp2 function. Finds values for query points on a grid using bilinear interpolation. Adapted from tensorflow.contrib.image.dense_image_warp, from newer TF version which supports variable-sized images.

Parameters:
  • grid (tf.Tensor) – a 4-D float Tensor of shape [batch, height, width, channels].

  • query_points (tf.Tensor) – a 3-D float Tensor of N points with shape [batch, N, 2]. Note that this function is not differentiable w.r.t. the query points.

  • name (str) – a name for the operation (optional).

  • indexing (str) – whether the query points are specified as row and column (ij), or Cartesian coordinates (xy).

Returns:

a 3-D Tensor with shape [batch, N, channels]

Return type:

tf.Tensor

returnn.tf.util.basic.dense_image_warp(image, flow, name='dense_image_warp')[source]#

Image warping using per-pixel flow vectors. Adapted from tensorflow.contrib.image.dense_image_warp, from newer TF version which supports variable-sized images.

Parameters:
  • image (tf.Tensor) – 4-D float Tensor with shape [batch, height, width, channels].

  • flow (tf.Tensor) – A 4-D float Tensor with shape [batch, height, width, 2]. E.g. via create_random_warp_flow_2d(). Note that this function is not differentiable w.r.t. the flow.

  • name (str) – A name for the operation (optional).

Returns:

A 4-D float Tensor with shape [batch, height, width, channels] and the same type as the input image.

Return type:

tf.Tensor

returnn.tf.util.basic.create_random_warp_flow_2d(shape, std=None, scale=10.0, blur_std=2.0)[source]#

Can be used with dense_image_warp().

Parameters:
  • shape (tf.Tensor|(int,int,int)) – 1D, contains (batch,height,width). e.g. tf.shape(image)[:-1]

  • std (float|(float,float)) –

  • scale (float|(float,float)) –

  • blur_std (float|(float,float)) –

Returns:

[batch, height, width, 2]

Return type:

tf.Tensor
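
A minimal sketch combining this with dense_image_warp() above (shapes are illustrative; graph-mode setup assumed):

    import tensorflow as tf
    from returnn.tf.util.basic import create_random_warp_flow_2d, dense_image_warp

    image = tf.zeros((2, 64, 80, 1))                         # (batch, height, width, channels)
    flow = create_random_warp_flow_2d(tf.shape(image)[:-1])  # (batch, height, width, 2)
    warped = dense_image_warp(image, flow)                   # same shape and type as image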

returnn.tf.util.basic.gaussian_kernel_2d(size, std)[source]#
Parameters:
  • size (int|(int,int)) –

  • std (float|(float,float)) –

Returns:

(size_x*2+1,size_y*2+1), float32

Return type:

tf.Tensor

returnn.tf.util.basic.gaussian_blur_2d(image, kernel_size=None, kernel_std=None)[source]#
Parameters:
  • image (tf.Tensor) – (batch,width,height,channel)

  • kernel_size (int|(int,int)|None) –

  • kernel_std (float|(float,float)|None) –

Returns:

image

Return type:

tf.Tensor

returnn.tf.util.basic.bleu_score(hypothesis, truth, hyp_seq_lens, truth_seq_lens)[source]#

Calculates the BLEU score. See Util.compute_bleu(). This currently wraps a Python function and thus is not efficient.

Parameters:
  • hypothesis (tf.Tensor) – (batch, max(hyp_seq_lens))

  • truth (tf.Tensor) – (batch, max(truth_seq_lens))

  • hyp_seq_lens (tf.Tensor) – (batch,)

  • truth_seq_lens (tf.Tensor) – (batch,)

Return type:

tf.Tensor

Returns:

(batch,), float32

returnn.tf.util.basic.prod(ls)[source]#
Parameters:

ls (list[T]|tuple[T]|numpy.ndarray|tf.Tensor) –

Return type:

T|int|float|tf.Tensor

returnn.tf.util.basic.mem_usage_for_dev(dev_name)[source]#
Parameters:

dev_name (str) – e.g. “/device:GPU:0” or “/job:localhost/replica:0/task:0/device:GPU:0”

Returns:

int scalar, which is the peak memory usage in bytes of the given device

Return type:

tf.Tensor

This function will not create multiple nodes in the graph for multiple calls. Currently only works for GPU devices.

returnn.tf.util.basic.identity_with_debug_log(x, args, out, name='DebugLogOp')[source]#
Parameters:
  • x (tf.Tensor) –

  • args (dict[str,tf.Tensor|None]) –

  • out (list[dict[str,numpy.ndarray]]) –

  • name (str) –

Returns:

x

Return type:

tf.Tensor

returnn.tf.util.basic.add_check_numerics_ops(fetches=None, ignore_ops=None, use_check_numerics=True, debug_print_added_checks=True, name='add_check_numerics_ops')[source]#

This is similar to tf.add_check_numerics_ops() and based on similar code. It adds some more logic and options.

Parameters:
  • fetches (list[tf.Operation|tf.Tensor]|None) – in case this is given, will only look at these and dependent ops

  • ignore_ops (list[str]) – e.g. “”

  • use_check_numerics (bool) – if False, instead of tf.check_numerics(), it does the check manually (via tf.is_finite()) and in case there is inf/nan, it will also print the tensor (while tf.check_numerics does not print the tensor). Note that this can be about 50 times slower.

  • debug_print_added_checks (bool) – prints info about each added check

  • name (str) – op-name for the final tf.group

Returns:

operation which performs all the checks

Return type:

tf.Operation
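
A minimal usage sketch (example tensor; explicitly switching to graph mode, as RETURNN's TF backend uses):

    import tensorflow as tf
    from returnn.tf.util.basic import add_check_numerics_ops

    tf.compat.v1.disable_eager_execution()
    x = tf.math.log(tf.constant([1.0, 0.5]))
    checks = add_check_numerics_ops(fetches=[x])
    with tf.compat.v1.Session() as session:
        session.run([x, checks])  # raises if any op feeding x produced inf/nan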

returnn.tf.util.basic.nested_get_shapes(x)[source]#
Parameters:

x (tf.Tensor|dict[str,tf.Tensor]|list[tf.Tensor]|object) – anything that nest supports

Returns:

same structure as x, but tf.TensorShape for each tensor

returnn.tf.util.basic.has_current_control_flow_context()[source]#
Return type:

bool

returnn.tf.util.basic.has_control_flow_context(x)[source]#
Parameters:

x (tf.Tensor|tf.Operation|int|float|None|list[tf.Tensor|tf.Operation|int|float]) –

Returns:

whether x has a control flow, i.e. is e.g. inside a while loop

Return type:

bool

returnn.tf.util.basic.same_control_flow_ctx(x)[source]#

Will use the same (flow) context as x. E.g. if x is a constant, it can be outside the loop, so we will yield a context which is not inside the loop. (This function was earlier called same_context.)

See also default_control_flow_ctx().

Parameters:

x (Tensor|tf.Tensor|tf.Operation|int|float|None|Sequence[Tensor|tf.Tensor|tf.Operation|int|float]) –

Returns:

yields context (via tf.control_dependencies)

returnn.tf.util.basic.op_in_right_control_flow_context(op: Operation) Operation | None[source]#
Parameters:

op – op with some control flow.

Returns:

some op in a control flow context which can be accessed from the current control flow context, or None if there is no such op.

returnn.tf.util.basic.get_protobuf_fields(obj)[source]#
Parameters:

obj – protobuf object

Return type:

dict[str]

returnn.tf.util.basic.get_op_attrib_keys(op)[source]#
Parameters:

op (tf.Operation|tf.Tensor|tf.TensorArray) –

Return type:

list[str]

Returns:

list of attribs. op.get_attr(key) should work

returnn.tf.util.basic.get_op_input_names(op)[source]#

Also see: https://stackoverflow.com/questions/50723310/get-tensorflow-tf-operation-inputs-by-name

Parameters:

op (tf.Operation) –

Returns:

list of names with same len as op.inputs

Return type:

list[str]

returnn.tf.util.basic.get_op_inputs_by_name(op)[source]#
Parameters:

op (tf.Operation) –

Returns:

dict input_name -> input

Return type:

dict[str,tf.Tensor]

returnn.tf.util.basic.tensor_array_is_dynamic_size(ta)[source]#
Parameters:

ta (tf.TensorArray) –

Return type:

bool

returnn.tf.util.basic.tensor_array_is_clear_after_read(ta)[source]#
Parameters:

ta (tf.TensorArray) –

Return type:

bool

returnn.tf.util.basic.tensor_array_element_shape(ta)[source]#
Parameters:

ta (tf.TensorArray) –

Return type:

tf.TensorShape

returnn.tf.util.basic.tensor_array_like(ta, **kwargs)[source]#
Parameters:
  • ta (tf.TensorArray) –

  • kwargs – passed to tf.TensorArray constructor

Returns:

another tensor array, just like ta

Return type:

tf.TensorArray

returnn.tf.util.basic.tensor_array_stack(ta, start=0, stop=None, name='TensorArrayStack')[source]#

Extends tf.TensorArray.stack by start/stop options.

Parameters:
  • ta (tf.TensorArray) –

  • start (int|tf.Tensor) –

  • stop (int|tf.Tensor|None) –

  • name (str) –

Return type:

tf.Tensor

returnn.tf.util.basic.beam_search(scores, beam_size, keep_beams=False, cheating_gold_targets=None, cheating_src_beam_idx=None, cheating_exclusive=False)[source]#

This is mostly a higher-level wrapper around tf.nn.top_k().

Parameters:
  • scores (tf.Tensor) – (batch,beam_in,dim). combined scores (i.e. base beam scores + new scores), dense over the dims, such that we have labels in [0,…,dim-1]. These are supposed to be in +log space, although it just matters here that we take the maximum (or top-k).

  • beam_size (int|tf.Tensor) –

  • keep_beams (bool) – specifies that we keep the beam_in entries, i.e. we just expand, i.e. we just search on the dim. beam_size must be a multiple of beam_in.

  • cheating_gold_targets (tf.Tensor|None) – (batch,), int32

  • cheating_src_beam_idx (tf.Tensor|None) – (batch,), int32. If not given, assumes beam_in - 1. See code below.

  • cheating_exclusive (bool) – make sure that the cheating target does not occur twice, i.e. no duplicates in search tree. This could have happened in our earlier implementation, or if this is disabled.

Return type:

(tf.Tensor,tf.Tensor,tf.Tensor)

Returns:

src_beams, labels, beam_scores. src_beams: (batch, beam) -> beam_in idx (int32), labels: (batch, beam) -> dim idx (int32), beam_scores: (batch, beam) -> beam score (float32).

returnn.tf.util.basic.select_src_beams(x, src_beams, name='select_src_beams')[source]#
Parameters:
  • x (tf.Tensor|tf.TensorArray|T) – (batch * src-beam, …)

  • src_beams (tf.Tensor) – (batch, beam) -> src-beam-idx

  • name (str) –

Returns:

(batch * beam, …)

Return type:

tf.Tensor|T

returnn.tf.util.basic.filter_ended_scores(x, end_flags, batch_dim=None, dim=None, score_zero=0.0, score_rem=-1e+30)[source]#

This can e.g. be used before tf.nn.top_k to let only one beam through for an ended hypothesis. Then, batch would also include the beam size, which does not matter here.

Parameters:
  • x (tf.Tensor) – (batch, dim)

  • end_flags (tf.Tensor) – (batch,)

  • batch_dim (tf.Tensor|int|None) –

  • dim (tf.Tensor|int|None) –

  • score_zero (float) – x[…, 0] will have this score where end_flag is True

  • score_rem (float) – x[…, 1:] will have this score where end_flag is True

Returns:

filtered x, (batch, dim)

Return type:

tf.Tensor

returnn.tf.util.basic.to_int32_64(x)[source]#
Parameters:

x (tf.Tensor) – dtype uint8, int8, int16, int32, int64

Return type:

tf.Tensor

Returns:

dtype int32 or int64

returnn.tf.util.basic.to_float32(x)[source]#
Parameters:

x (tf.Tensor) –

Returns:

x as float32

Return type:

tf.Tensor

returnn.tf.util.basic.batch_gather(x, indices, keepdims=False)[source]#
Parameters:
  • x (tf.Tensor) – (batch,dim,…)

  • indices (tf.Tensor) – (batch,) -> [0..dim-1]

  • keepdims (bool) –

Returns:

x[batches, indices[batches]], shape (batch,…), or (batch,1,…) with keepdims=True

Return type:

tf.Tensor
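
A minimal sketch (shapes and index values are illustrative; graph-mode setup assumed):

    import tensorflow as tf
    from returnn.tf.util.basic import batch_gather

    x = tf.zeros((4, 7, 3))              # (batch, dim, ...)
    indices = tf.constant([0, 6, 2, 5])  # (batch,), values in [0..dim-1]
    y = batch_gather(x, indices)         # shape (4, 3); with keepdims=True it would be (4, 1, 3)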

returnn.tf.util.basic.unflatten_nd(x, nd_sizes, num_axes=None)[source]#

E.g. assume that for each x[b], we have an image flattened, i.e. of size width*height. Then nd_sizes[b] == (width, height) would provide the individual sizes. We return y such that y[b][i][j] == x[b][i * nd_sizes[b][0] + j]. This is implemented for any number of axes. Kind of like the reverse of an ND version of flatten_with_seq_len_mask.

Parameters:
  • x (tf.Tensor) – (B, T, <Ds>)

  • nd_sizes (tf.Tensor) – (B, N = num_axes)

  • num_axes (int) –

Returns:

(B, T_1, …, T_N, <Ds>), T_i == max(nd_sizes[:, i])

Return type:

tf.Tensor
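
A minimal shape-level sketch (batch size, sizes and values are made up; graph-mode setup assumed):

    import tensorflow as tf
    from returnn.tf.util.basic import unflatten_nd

    x = tf.constant([[1, 2, 3, 4, 5, 6]])      # (B=1, T=6): one flattened 2x3 "image" per batch entry
    nd_sizes = tf.constant([[2, 3]])           # (B=1, N=2)
    y = unflatten_nd(x, nd_sizes, num_axes=2)  # shape (1, 2, 3), per T_i == max(nd_sizes[:, i])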

returnn.tf.util.basic.kernels_registered_for_op(op_name)[source]#

This just wraps the TF C++ function tensorflow::KernelsRegisteredForOp().

Parameters:

op_name (str) – e.g. “Gather”

Returns:

e.g. [“device=’CPU’; …”, “device=’GPU’; …”, …]

Return type:

list[str]

returnn.tf.util.basic.supported_devices_for_op(op_name)[source]#
Parameters:

op_name (str) –

Returns:

list of devices, e.g. [“CPU”, “GPU”]

Return type:

list[str]

returnn.tf.util.basic.find_unsupported_devices_in_graph(graph, dev_name, ignore=None)[source]#
Parameters:
  • graph (tf.Graph) –

  • dev_name (str) – e.g. “GPU”

  • ignore (list[str]|None) – list of op-names to ignore, e.g. [“ScalarSummary”] etc. If None, will use defaults.

Return type:

list[tf.Operation]

returnn.tf.util.basic.get_device_attr(dev)[source]#
Parameters:

dev (str) – e.g. “/device:GPU:0”, or any argument for tf.device()

Returns:

scalar string, e.g. b’device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1’

Return type:

tf.Tensor

returnn.tf.util.basic.print_graph_output(fetches, file=None, max_depth=None)[source]#
Parameters:
  • fetches (tf.Operation|tf.Tensor|list[tf.Tensor|tf.Operation]) –

  • file (IO[str]|io.TextIOBase|io.StringIO|None) – sys.stdout by default

  • max_depth (int|None) –

returnn.tf.util.basic.format_graph_output(fetches, max_depth=None)[source]#
Parameters:
  • fetches (tf.Operation|tf.Tensor|list[tf.Tensor|tf.Operation]) –

  • max_depth (int|None) –

Return type:

str

returnn.tf.util.basic.var_handle_or_ref(var)[source]#
Parameters:

var (tf.Variable|tensorflow.python.ops.resource_variable_ops.ResourceVariable) –

Return type:

tf.Tensor

returnn.tf.util.basic.find_ops_with_tensor_input(tensors, fetches=None, graph=None)[source]#
Parameters:
  • tensors (tf.Tensor|tf.Variable|list[tf.Tensor]) –

  • fetches (tf.Operation|tf.Tensor|list[tf.Operation|tf.Tensor]|None) –

  • graph (tf.Graph|None) –

Returns:

list of ops

Return type:

list[tf.Operation]

returnn.tf.util.basic.find_ops_path_output_to_input(tensors, fetches)[source]#

Searches backwards like in extern.graph_editor.get_backward_walk_ops() and then returns a found traceback, if there is one.

Parameters:
  • tensors (tf.Tensor|tf.Variable|list[tf.Tensor]) – input

  • fetches (tf.Operation|tf.Tensor|list[tf.Operation|tf.Tensor]) – output

Returns:

list of ops, input to output

Return type:

list[tf.Operation]|None

returnn.tf.util.basic.get_var_update_ops(var, fetches=None)[source]#
Parameters:
  • var (tf.Variable) –

  • fetches (tf.Operation|tf.Tensor|list[tf.Operation|tf.Tensor]|None) – e.g. the Optimizer.minimize() op

Returns:

list of ops that update var; currently expected to be of length 1

Return type:

list[tf.Operation]

returnn.tf.util.basic.get_variable_value_copy_before_update_ops(var, update_ops)[source]#
Parameters:
  • var (tf.Variable) –

  • update_ops (list[tf.Operation]) –

Returns:

var value before any of the update_ops are executed

Return type:

tf.Tensor

returnn.tf.util.basic.get_variable_grad_from_update_ops(var, update_ops)[source]#
Parameters:
  • var (tf.Variable) –

  • update_ops (list[tf.Operation]) –

Returns:

grad of loss w.r.t. var, as it is used in the update_ops, e.g. via ApplyAdam or ApplyGradientDescent (not all kind of updates are supported currently). If the gradient is sparse, it will return a tf.IndexedSlices.

Return type:

tf.Tensor|tf.IndexedSlices

returnn.tf.util.basic.get_variable_from_tensor(var)[source]#
Parameters:

var (tf.Variable|tf.Tensor) –

Returns:

resolve tf.identity or read ops

Return type:

tf.Variable|tf.Tensor

returnn.tf.util.basic.add_control_input(op, control_input)[source]#
Parameters:
  • op (tf.Operation) –

  • control_input (tf.Operation|tf.Tensor) –

returnn.tf.util.basic.vocab_idx_to_vocab_string(labels, vocab)[source]#

Just does a lookup on vocab.

Parameters:
  • labels (tf.Tensor) – (batch,max_len), or any, int32, indices in vocab

  • vocab (tf.Tensor) – (vocab_size,), string

Returns:

(batch,max_len), or any, like labels, string

Return type:

tf.Tensor
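
A minimal sketch (vocabulary entries and label indices are illustrative; graph-mode setup assumed):

    import tensorflow as tf
    from returnn.tf.util.basic import vocab_idx_to_vocab_string

    vocab = tf.constant(["<blank>", "hello", "world"])  # (vocab_size,), string
    labels = tf.constant([[1, 2], [2, 1]])              # (batch, max_len), indices in vocab
    strings = vocab_idx_to_vocab_string(labels, vocab)  # (batch, max_len), dtype string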

returnn.tf.util.basic.vocab_idx_repr(labels, data)[source]#
Parameters:
  • labels (tf.Tensor) – int32, indices in vocab

  • data (Data) – might have vocab

Returns:

string or int32, shape as labels, or maybe without last axis

Return type:

tf.Tensor

returnn.tf.util.basic.string_merge(strings, seq_lens, separator=' ')[source]#

Also see TFEngine.Engine.search().

Parameters:
  • strings (tf.Tensor) – (batch,max_len)

  • seq_lens (tf.Tensor) – (batch,)

  • separator (str|tf.Tensor) – string

Returns:

(batch,), string

Return type:

tf.Tensor

returnn.tf.util.basic.string_replace(strings, old, new, count=-1)[source]#

Like str.replace.

Parameters:
  • strings (tf.Tensor) – (batch,), string

  • old (tf.Tensor|str) –

  • new (tf.Tensor|str) –

  • count (tf.Tensor|int) –

Returns:

(batch,), string

Return type:

tf.Tensor

returnn.tf.util.basic.bpe_merge(strings)[source]#
Parameters:

strings (tf.Tensor) – (batch,), string

Returns:

(batch,), string. strings after BPE merging

Return type:

tf.Tensor

returnn.tf.util.basic.words_split(strings)[source]#

Basically just tf.string_split with delimiter=” “.

Parameters:

strings (tf.Tensor) – (batch,), string

Returns:

sparse tensor of shape (batch,max_len), string

Return type:

tf.SparseTensor

returnn.tf.util.basic.get_sparse_tensor_length(x)[source]#
Parameters:

x (tf.SparseTensor) – of shape prefix + (max_len,), where prefix can be anything, e.g. prefix=(batch,)

Returns:

shape prefix, int64

Return type:

tf.Tensor

returnn.tf.util.basic.string_words_calc_wer(hyps, refs)[source]#

Uses words_split() on hyps and refs, and then tf.edit_distance with normalize=False.

Parameters:
  • hyps (tf.Tensor) – (batch,), dtype string

  • refs (tf.Tensor) – (batch,), dtype string

Returns:

(WER (batch,) unnormalized, num ref words (batch,))

Return type:

(tf.Tensor, tf.Tensor)
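
A minimal sketch (the hypothesis/reference strings are made up; graph-mode setup assumed):

    import tensorflow as tf
    from returnn.tf.util.basic import string_words_calc_wer

    hyps = tf.constant(["the cat sat", "hello world"])
    refs = tf.constant(["the cat sat down", "hello there world"])
    wer, num_ref_words = string_words_calc_wer(hyps, refs)
    # expected after evaluation: wer == [1., 1.] (unnormalized edit distance), num_ref_words == [4, 3]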

returnn.tf.util.basic.py_print(pass_through_value, print_args, message=None, summarize=None, first_n=None, name='py_print', file=None)[source]#

Like tf.Print(), but prints to Python stdout or to file. Also see tf.print(), which however also does not print to Python stdout.

Parameters:
  • pass_through_value (tf.Tensor|int|float) – will return tf.identity of this, but with side effect of printing

  • print_args (list[str|tf.Tensor]) –

  • message (str|None) – A string, prefix of the error message.

  • summarize (int|None) – Only print this many entries of each tensor. If None, then a maximum of 3 elements are printed per input tensor.

  • first_n (int|None) – Only log first_n number of times. Negative numbers log always; this is the default.

  • name (str) –

  • file (SupportsWrite[str]|None) – a file-like object (stream); defaults to the current sys.stdout.

Returns:

tf.identity(pass_through_value) with side effect of printing

Return type:

tf.Tensor
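
A minimal sketch (the tensor and message are illustrative; graph-mode setup assumed):

    import tensorflow as tf
    from returnn.tf.util.basic import py_print

    loss = tf.constant(0.5)
    loss = py_print(loss, ["step loss:", loss], message="debug: ")
    # evaluating `loss` now also prints the given args to Python stdout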

returnn.tf.util.basic.get_positional_encoding(num_channels, length=None, position=None, min_timescale=1.0, max_timescale=10000.0)[source]#

Gets a bunch of sinusoids of different frequencies.

Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase.

This allows attention to learn to use absolute and relative positions. Timing signals should be added to some precursors of both the query and the memory inputs to attention.

The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x) and cos(x).

In particular, we use a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to channels / 2. For each timescale, we generate the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All of these sinusoids are concatenated in the channels dimension.

The code is adapted from Tensor2Tensor get_timing_signal_1d (https://github.com/tensorflow/tensor2tensor).

Parameters:
  • num_channels (int) – scalar, size of timing embeddings to create. The number of different timescales is equal to channels / 2.

  • length (tf.Tensor|int|None) – scalar, length of timing signal sequence.

  • position (tf.Tensor|None) – could be provided directly. int32. Can have any shape, e.g. [length] or [B,len]. If not given, will be tf.range(length), i.e. of shape [length].

  • min_timescale (float) – a float.

  • max_timescale (float) – a float.

Returns:

a Tensor of timing signals of shape position.shape + [num_channels], e.g. [length,num_channels]

Return type:

tf.Tensor
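
A minimal sketch of the two calling variants described above (num_channels, length and the position shape are example values):

    import tensorflow as tf
    from returnn.tf.util.basic import get_positional_encoding

    signal = get_positional_encoding(num_channels=64, length=100)  # shape (100, 64)
    positions = tf.zeros((8, 20), dtype=tf.int32)                  # explicit positions, e.g. per batch entry
    signal_b = get_positional_encoding(num_channels=64, position=positions)  # shape (8, 20, 64)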

returnn.tf.util.basic.get_linear_alignment_out_to_in_indices(input_lens, output_lens, pad_value=0)[source]#
Parameters:
  • input_lens (tf.Tensor|list[int]) – [B]

  • output_lens (tf.Tensor|list[int]) – [B]

  • pad_value (int) –

Returns:

[B,outT], mapping to input positions [0..input_len-1]. Examples:

  • input_len=7, output_len=3, resulting indices [1,3,5].

  • input_len=3, output_len=3, resulting indices [0,1,2].

  • input_len=2, output_len=4, resulting indices [0,0,1,1].

Return type:

tf.Tensor
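
A minimal sketch reproducing the first example above:

    from returnn.tf.util.basic import get_linear_alignment_out_to_in_indices

    indices = get_linear_alignment_out_to_in_indices(input_lens=[7], output_lens=[3])
    # expected after evaluation: [[1, 3, 5]]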

returnn.tf.util.basic.get_rnnt_linear_aligned_output(input_lens, targets, target_lens, blank_label_idx, pad_value=0, targets_consume_time=False)[source]#

RNN-T (https://arxiv.org/abs/1211.3711) has an output length of input_lens + target_lens. Here we create a linear alignment. Examples: (B is blank.)

  • input_len=4, targets=[a,b,c] (len 3), output=[B,a,B,b,B,c,B] (len 7).

  • input_len=0, targets=[a,b,c] (len 3), output=[a,b,c] (len 3).

  • input_len=4, targets=[a] (len 1), output=[B,B,a,B,B] (len 5).

  • input_len=3, targets=[a,b] (len 2), output=[B,a,B,b,B] (len 5)

Parameters:
  • input_lens (tf.Tensor|list[int]) – [B], int32. the input (or encoder) lengths

  • targets (tf.Tensor|list[list[int]]) – [B,targetT], int32

  • target_lens (tf.Tensor|list[int]) – [B], int32. the targets length

  • blank_label_idx (int) –

  • pad_value (int) –

  • targets_consume_time (bool) – In the standard RNN-T, the target labels do not consume a time frame. That is why the RNN-T label output length is input_lens + target_lens. In RNA (https://www.isca-speech.org/archive/Interspeech_2017/abstracts/1705.html), each target label consumes a time frame, thus the label output length is just input_lens.

Returns:

output [B,outT], output_lens [B]. The output is basically the target filled with blank in between.

Return type:

(tf.Tensor,tf.Tensor)
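
A minimal sketch reproducing the first example above, with made-up target labels a=1, b=2, c=3 and blank index 10:

    from returnn.tf.util.basic import get_rnnt_linear_aligned_output

    output, output_lens = get_rnnt_linear_aligned_output(
        input_lens=[4], targets=[[1, 2, 3]], target_lens=[3], blank_label_idx=10)
    # expected after evaluation: output == [[10, 1, 10, 2, 10, 3, 10]], output_lens == [7]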

returnn.tf.util.basic.get_non_deterministic_ops_from_graph()[source]#

Lists all non-deterministic ops used in the default graph. If a non-deterministic op is used multiple times, each instance will be listed.

Currently this does not check whether the user specified a specific computation device, and the list of known non-deterministic ops is not yet complete.

Returns:

list of all non-deterministic op names (depending on device and TF version) used in the current graph

Return type:

list[tf.Operation]

returnn.tf.util.basic.compute_sampled_logits(weights, biases, labels, inputs, num_sampled, num_classes, num_true=1, sampled_values=None, subtract_log_q=True, remove_accidental_hits=False, partition_strategy='mod', name='compute_sampled_logits', seed=None)[source]#

Helper function for nce_loss and sampled_softmax_loss functions. Computes sampled output training logits and labels suitable for implementing e.g. noise-contrastive estimation (see nce_loss) or sampled softmax (see sampled_softmax_loss). Note: In the case where num_true > 1, we assign to each target class the target probability 1 / num_true so that the target probabilities sum to 1 per-example.

This is a copy of

https://github.com/tensorflow/tensorflow/blob/e19c354920c3b246dda6598229210a582caaa1a9/tensorflow/python/ops/nn_impl.py#L1440

Parameters:
  • weights (tf.Tensor|list[tf.Tensor]|tuple[tf.Tensor]) – A Tensor of shape [num_classes, dim], or a list of Tensor objects whose concatenation along dimension 0 has shape [num_classes, dim]. The class embeddings.

  • biases (tf.Tensor) – A Tensor of shape [num_classes]. The class biases.

  • labels (tf.Tensor) – A Tensor of type int64 and shape [batch_size, num_true]. The target classes. Note that this format differs from the labels argument of tf.nn.softmax_cross_entropy_with_logits.

  • inputs (tf.Tensor) – A Tensor of shape [batch_size, dim]. The forward activations of the input network.

  • num_sampled (int) – The number of classes to randomly sample per batch.

  • num_classes (int) – The number of possible classes.

  • num_true (int) – The number of target classes per training example.

  • sampled_values ((tf.Tensor, tf.Tensor, tf.Tensor)|None) – a tuple of (sampled_candidates, true_expected_count, sampled_expected_count) returned by a *_candidate_sampler function. (if None, we default to log_uniform_candidate_sampler)

  • subtract_log_q (bool) – whether to subtract the log expected count of the labels in the sample to get the logits of the true labels. Default is True. Turn off for Negative Sampling.

  • remove_accidental_hits (bool) – Whether to remove “accidental hits” where a sampled class equals one of the target classes.

  • partition_strategy (str) – A string specifying the partitioning strategy, relevant if len(weights) > 1. Currently “div” and “mod” are supported. Default is “mod”. See tf.nn.embedding_lookup for more details.

  • name (str|None) – A name for the operation.

  • seed (int|None) – random seed for candidate sampling. Default to None, which doesn’t set the op-level random seed for candidate sampling.

Returns:

out_logits: Tensor object with shape

[batch_size, num_true + num_sampled], for passing to either nn.sigmoid_cross_entropy_with_logits (NCE) or nn.softmax_cross_entropy_with_logits (sampled softmax).

out_targets: A Tensor object with the same shape and dtype as out_logits.

These are the targets. If num_true > 1 the per-example labels are divided by num_true so they sum to 1.0.

Return type:

(tf.Tensor, tf.Tensor)

returnn.tf.util.basic.safe_deep_copy(obj)[source]#
Parameters:

obj (T) –

Returns:

deepcopy of obj, without copying TF types, Python modules, functions/lambdas

Return type:

T

class returnn.tf.util.basic.FetchHelper(tensor, verbose_stream=None)[source]#

session.run(tensor) does not work if tensor is inside a loop (tf.while_loop) (or tf.cond). You would get an error like this:

Operation '...' has been marked as not fetchable.

This class is a helper to work around that. It will add an op to the graph, which stores the most recent value. To get this executed automatically, you likely want to add it as a control dependency to another op. Use add_to_control_inputs() for that, or better copy_graph_replace_tensors(), or better copy_graph().

Parameters:
  • tensor (tf.Tensor) –

  • verbose_stream (IO[str]|None) –

classmethod copy_graph(fetches, target_op, fetch_helper_tensors, stop_at_ts=(), verbose_stream=None)[source]#
Parameters:
  • fetches (tf.Tensor|list[tf.Tensor]|T) –

  • target_op (tf.Operation) – will add the fetch helpers as control dependencies to this op

  • fetch_helper_tensors (list[tf.Tensor]) –

  • verbose_stream (IO[str]|None) –

  • stop_at_ts (Iterable[tf.Tensor]) – iterable of tensors at which the graph walk stops.

Returns:

copied fetches, fetch helpers, transformed target op

Return type:

(tf.Tensor|list[tf.Tensor]|T, list[FetchHelper], tf.Operation)

classmethod copy_graph_replace_tensors(fetches, fetch_helpers)[source]#
Parameters:
  • fetches (tf.Tensor|list[tf.Tensor]) –

  • fetch_helpers (list[FetchHelper]) –

Returns:

as fetches

Return type:

tf.Tensor|list[tf.Tensor]

add_to_control_inputs(other_op)[source]#

Note: This will not work if you already did a session.run. Use copy_graph_replace_tensors() instead, or better copy_graph().

Parameters:

other_op (tf.Operation) –

returnn.tf.util.basic.is_axis_from_description_recurrent(axis, network, data)[source]#
Parameters:
  • axis (str|Dim) – expected not to be transformed via transform_config_dict or so. So single_step_dim, when moved out of the recurrent loop, is still single_step_dim. We detect this here.

  • network (returnn.tf.network.TFNetwork) –

  • data (Data) –

Return type:

bool

returnn.tf.util.basic.onnx_compat_floor_div(a: Tensor, b: Tensor) Tensor[source]#
Parameters:
  • a

  • b

Returns:

floor_divide that is compatible with ONNX export