returnn.tf.util.basic
Lots of random utility functions for TensorFlow.
- returnn.tf.util.basic.tf_version_tuple()[source]¶
- Returns:
version tuple, e.g. (1, 1, 0), parsed from tf.__version__
- Return type:
tuple[int]
- returnn.tf.util.basic.assert_min_tf_version(version, reason)[source]¶
- Parameters:
version (tuple[int]) – e.g. (1,2,0) or (1,2)
reason (str)
- returnn.tf.util.basic.have_min_tf_version(version)[source]¶
- Parameters:
version (tuple[int]) – e.g. (1,2,0) or (1,2)
- Returns:
True if we have at least that version, or newer
- Return type:
bool
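A minimal usage sketch of these version helpers (assuming TensorFlow and RETURNN are installed; the printed tuple is just an example):

    from returnn.tf.util.basic import tf_version_tuple, have_min_tf_version, assert_min_tf_version

    print(tf_version_tuple())  # e.g. (1, 15, 0), parsed from tf.__version__

    if have_min_tf_version((1, 14)):
        pass  # safe to use features introduced in TF 1.14

    # Raises with the given reason if the installed TF is older than 1.8.
    assert_min_tf_version((1, 8), "need tf.custom_gradient")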
- class returnn.tf.util.basic.CustomUpdate[source]¶
Custom updates will be handled by TFUpdater.
- class returnn.tf.util.basic.CustomUpdateExpAverage(average, alpha)[source]¶
exponential moving average
- Parameters:
average (tf.Tensor)
alpha (float)
- returnn.tf.util.basic.set_param_axes_split_info(param, axes_split_info)[source]¶
- Parameters:
param (tf.Variable|tf.Tensor)
axes_split_info (list[list[int]|None]) – e.g. [[n],[n]*4] for LSTM matrices
- returnn.tf.util.basic.check_param_axes_split_info(param_shape, axes_split_info)[source]¶
- Parameters:
param_shape (list[int|None]|tuple[int|None])
axes_split_info (list[list[int]|None]) – e.g. [[n],[n]*4] for LSTM matrices
- returnn.tf.util.basic.get_param_axes_split_info(param)[source]¶
See set_param_axes_split_info().
- Parameters:
param (tf.Variable|tf.Tensor)
- Return type:
list[list[int]|None]|None
- returnn.tf.util.basic.transform_param_axes_split_info_to_new_shape(axes_split_info, new_shape, debug_name='<unknown>')[source]¶
new_shape can be bigger or smaller than the old shape. In some simple cases it is obvious how this should be done, e.g. [[a],[b]*4], [a*2,b*8] -> [[a*2],[b*2]*4]. In others it is not, e.g. [[a+b],[b]*4], [a+b*2,b*8] -> [[a+b*2],[b*2]*4].
We should try to always return something, though. If some case is not covered yet, extend this.
See the test cases as well, test_transform_param_axes_split_info_to_new_shape(). No TF is involved here; however, it fits better with the functions above.
- Parameters:
axes_split_info (list[list[int]])
new_shape (list[int]|tuple[int])
debug_name (str)
- Returns:
new axes-split-info for the new shape
- Return type:
list[list[int]]
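A sketch of the LSTM-matrix case from the docstring above, with a=5 and b=7 (the exact result for ambiguous splits depends on the implementation):

    from returnn.tf.util.basic import transform_param_axes_split_info_to_new_shape

    # Old param shape (5, 28) with split info [[a],[b]*4]; the new shape doubles both.
    old_splits = [[5], [7] * 4]
    new_splits = transform_param_axes_split_info_to_new_shape(old_splits, [10, 56])
    print(new_splits)  # expected [[10], [14, 14, 14, 14]], following [[a*2],[b*2]*4]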
- returnn.tf.util.basic.copy_with_new_split_axes(old_axis_splits, new_axis_splits, old_values: ndarray, new_values: ndarray | None = None)[source]¶
Operates on NumPy arrays only; however, it fits better with the functions above.
- Parameters:
old_axis_splits (list[list[int]])
new_axis_splits (list[list[int]])
old_values (numpy.ndarray)
new_values (numpy.ndarray)
- Returns:
new values
- Return type:
numpy.ndarray
- returnn.tf.util.basic.get_padding_info_dict_ref(x)[source]¶
- Parameters:
x (tf.Tensor)
- Return type:
dict[Dim,float|int]
- returnn.tf.util.basic.set_padding_info(x, dim, pad_value)[source]¶
Stores the information what kind of padding value to expect after masking in the given dynamic dim.
- Parameters:
x (tf.Tensor)
dim (returnn.tensor.Dim) – dynamic seq len axis
pad_value (float|int)
- returnn.tf.util.basic.copy_compatible_reduce(source, target, reduce_type)[source]¶
Extension of Data.copy_compatible_to which also reduces additional dims.
- Parameters:
source (Data)
target (Data)
reduce_type (str) – eg “max”
- Returns:
source with broadcast-compatible shape to target
- Return type:
Data
- class returnn.tf.util.basic.OutputWithActivation(x, act_func=None, act_func_opts=None)[source]¶
Stores some tensor before and after some activation function, and also the activation function itself. (Maybe obsolete when you directly access the TF computation graph; but simpler.)
- Parameters:
x (tf.Tensor)
act_func (None|(tf.Tensor)->tf.Tensor)
act_func_opts (None|dict[str])
- returnn.tf.util.basic.variable_scalar_summaries_dict(x: Tensor | Variable, name: str | None = None) Dict[str, Tensor] [source]¶
Collects all interesting information about x, such as min/max/mean, etc. (all scalars). This is used by variable_summaries().
- Parameters:
x
name
- Returns:
dict with key -> scalar info, e.g. with “%s_mean” % name -> tf.reduce_mean(x)
- returnn.tf.util.basic.variable_summaries(var, name=None, with_histogram=False)[source]¶
Attach a lot of summaries to a Tensor (for TensorBoard visualization). Also see variable_scalar_summaries_dict().
- Parameters:
var (tf.Tensor|tf.Variable)
name (str)
with_histogram (bool) – adds histogram. note that this can add noticeable overhead
- Returns:
nothing, use tf.compat.v1.summary.merge_all() to collect the summaries
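A graph-mode sketch (tf.compat.v1) of attaching summaries to a variable and writing them out; the log directory is an arbitrary choice:

    import tensorflow as tf
    from returnn.tf.util.basic import variable_summaries

    tf.compat.v1.disable_eager_execution()
    w = tf.compat.v1.get_variable("w", shape=(128, 256))
    variable_summaries(w, name="w", with_histogram=True)
    merged = tf.compat.v1.summary.merge_all()

    with tf.compat.v1.Session() as session:
        session.run(tf.compat.v1.global_variables_initializer())
        writer = tf.compat.v1.summary.FileWriter("/tmp/tf-logs", session.graph)
        writer.add_summary(session.run(merged), global_step=0)
        writer.close()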
- returnn.tf.util.basic.get_valid_scope_name_from_str(s)[source]¶
- Parameters:
s (str) – some name
- Returns:
valid scope name, might be just s. see tf._VALID_SCOPE_NAME_REGEX and tf._VALID_OP_NAME_REGEX
- Return type:
str
- returnn.tf.util.basic.get_current_var_scope_name()[source]¶
- Returns:
current absolute variable scope name, via tf.compat.v1.variable_scope
- Return type:
str
- returnn.tf.util.basic.get_current_name_scope()[source]¶
- Returns:
current absolute name scope, via tf.name_scope
- Return type:
str
https://stackoverflow.com/questions/40907769/how-to-get-current-tensorflow-name-scope
Note that this is a private member and might break at some point. Note also that this does not need to be the same as get_current_var_scope_name().
- returnn.tf.util.basic.reuse_name_scope(name, absolute=None, **kwargs)[source]¶
Context manager to reuse an already created scope. We try to both set the variable scope and the name scope.
- Parameters:
name (str|tf.compat.v1.VariableScope) – relative or absolute name scope (absolute if absolute=True or if tf.compat.v1.VariableScope). Must not end with “/”.
absolute (bool|None) – if True it will be absolute
kwargs – passed on to tf.compat.v1.variable_scope
- Returns:
yields the variable_scope
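A small graph-mode sketch of re-entering an absolute scope; the extra reuse kwarg is an assumption, passed through to tf.compat.v1.variable_scope as described above:

    import tensorflow as tf
    from returnn.tf.util.basic import reuse_name_scope

    tf.compat.v1.disable_eager_execution()

    with reuse_name_scope("encoder"):
        w1 = tf.compat.v1.get_variable("w", shape=(3, 3))

    # Re-enter the same absolute scope later (instead of getting "encoder_1").
    with reuse_name_scope("encoder", absolute=True, reuse=tf.compat.v1.AUTO_REUSE):
        w2 = tf.compat.v1.get_variable("w", shape=(3, 3))

    print(w1.name, w2.name)  # both should be "encoder/w:0"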
- returnn.tf.util.basic.opt_reuse_name_scope(name)[source]¶
- Parameters:
name (str|tf.compat.v1.VariableScope)
- Returns:
yields the variable_scope
- returnn.tf.util.basic.get_name_scope_of_tensor(x)[source]¶
- Parameters:
x (tf.Tensor) – has name e.g. “layer0/rec/W:0”
- Returns:
the name scope of x, e.g. “layer0/rec”
- Return type:
str
- returnn.tf.util.basic.get_base_name(x)[source]¶
- Parameters:
x (tf.Tensor|tf.Variable) – has name e.g. “layer0/rec/W:0”
- Returns:
return the base name, e.g. “W”, without the output index
- returnn.tf.util.basic.reuse_name_scope_of_tensor(x, prefix='', postfix='', add_tensor_name=False)[source]¶
- Parameters:
x (tf.Tensor|tf.Variable) – has name e.g. “layer0/rec/W:0”
prefix (str)
postfix (str)
add_tensor_name (bool)
- Returns:
reuse the name scope of x, e.g. “layer0/rec”, yields scope
- returnn.tf.util.basic.default_control_flow_ctx()[source]¶
This was earlier called var_creation_scope.
If you create a variable inside of a while-loop, you might get the following error:
InvalidArgumentError: The node ‘while/w/Assign’ has inputs from different frames. The input ‘while/j’ is in frame ‘while/while/’. The input ‘while/w’ is in frame ‘’.
This happens when you directly call tf.Variable, because the initial_value might be a tensor which depends on the current control flow context. See tests/test_TFUtil.py:test_loop_var_creation() for an example.
One solution is to reset the current control flow context. See also same_control_flow_ctx(). However, with respect to variables, you should instead use tf.get_variable, which does not have this problem.
- returnn.tf.util.basic.get_root_graph(graph=None)[source]¶
- Parameters:
graph (tf.Graph|None)
- Returns:
root graph. with control flow v2, the current graph might not be the root graph
- Return type:
tf.Graph
- returnn.tf.util.basic.flip_gradient(x, scale=1.0)[source]¶
- Parameters:
x (tf.Tensor)
scale (float)
- Returns:
identity(x) but with flipped gradient (optionally scaled)
- Return type:
tf.Tensor
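A sketch of flip_gradient() as a gradient-reversal layer (as used e.g. in domain-adversarial training): the forward pass is the identity, the backward pass multiplies the gradient by -scale.

    import tensorflow as tf
    from returnn.tf.util.basic import flip_gradient

    tf.compat.v1.disable_eager_execution()
    x = tf.constant([2.0, 3.0])
    y = tf.reduce_sum(flip_gradient(x, scale=0.5) ** 2)
    (grad,) = tf.gradients(y, [x])

    with tf.compat.v1.Session() as session:
        print(session.run(grad))  # -0.5 * 2 * x == [-2., -3.]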
- returnn.tf.util.basic.lookup_grad_func_by_name(op_type)[source]¶
- Parameters:
op_type (str)
- Returns:
function grad_func(op, grad), or raises LookupError
- returnn.tf.util.basic.opt_register_grad_func(op_type, grad_func, assert_is_same=True)[source]¶
- Parameters:
op_type (str)
grad_func – function grad_func(op, grad)
assert_is_same (bool)
- returnn.tf.util.basic.identity_with_check_numerics(x, with_grad=True, name='identity_with_check_numerics')[source]¶
Returns identity(x), but with an additional check_numerics control dependency, and optionally the same for its gradient. See also TFUpdater.add_check_numerics_ops(), which will add checks for the whole graph.
- Parameters:
x (tf.Tensor)
with_grad (bool) – whether the check will also be added for the gradient
name (str)
- Return type:
tf.Tensor
- returnn.tf.util.basic.check_input_ndim(x, ndim)[source]¶
- Parameters:
x (tf.Tensor)
ndim (int)
- Returns:
x with check added
- Return type:
tf.Tensor
- returnn.tf.util.basic.check_input_ndim_equal_offset(x, y, y_ndim_offset=0)[source]¶
- Parameters:
x (tf.Tensor)
y (tf.Tensor)
y_ndim_offset (int)
- Returns:
x with check added such that ndim(x) == ndim(y) + y_ndim_offset
- Return type:
tf.Tensor
- returnn.tf.util.basic.check_input_dim(x, axis, dim)[source]¶
- Parameters:
x (tf.Tensor)
axis (int) – which axis to check
dim (int|tf.Tensor)
- Returns:
x with check added
- Return type:
tf.Tensor
- returnn.tf.util.basic.check_dim_equal(x, x_axis, y, y_axis, extra_msg=())[source]¶
- Parameters:
x (tf.Tensor)
x_axis (int) – which axis to check
y (tf.Tensor)
y_axis (int) – which axis to check
extra_msg (Sequence[str|tf.Tensor]) – will be printed additionally if it fails
- Returns:
x with check added that shape(x)[x_axis] == shape(y)[y_axis]
- Return type:
tf.Tensor
- returnn.tf.util.basic.check_shape_equal(x, y)[source]¶
- Parameters:
x (tf.Tensor)
y (tf.Tensor)
- Returns:
x with check added that shape(x) == shape(y)
- Return type:
tf.Tensor
- returnn.tf.util.basic.get_shape_dim(x, axis, name='shape_dim')[source]¶
- Parameters:
x (tf.Tensor)
axis (int) – which axis
name (str)
- Returns:
x.shape[axis] either as a static int or otherwise as an expression
- Return type:
int|tf.Tensor
- returnn.tf.util.basic.get_shape(x)[source]¶
- Parameters:
x (tf.Tensor|tf.Variable)
- Returns:
list of scalars, which are either int if known statically, or otherwise expressions
- Return type:
list[int|tf.Tensor]
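A sketch of mixed static/dynamic shapes (graph mode assumed): static dims come back as plain ints, unknown dims as scalar tensors.

    import tensorflow as tf
    from returnn.tf.util.basic import get_shape, get_shape_dim

    tf.compat.v1.disable_eager_execution()
    x = tf.compat.v1.placeholder(tf.float32, shape=(None, None, 40))  # (batch, time, feat)
    shape = get_shape(x)            # [<tf.Tensor>, <tf.Tensor>, 40]
    n_batch = get_shape_dim(x, 0)   # symbolic scalar
    n_feat = get_shape_dim(x, 2)    # plain Python int 40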
- returnn.tf.util.basic.get_ndim(x)[source]¶
- Parameters:
x (tf.Tensor)
- Returns:
x.ndim either as a static int or otherwise as an expression
- Return type:
int|tf.Tensor
- returnn.tf.util.basic.get_range(start, stop=<class 'returnn.util.basic.NotSpecified'>)[source]¶
- Parameters:
start (int|tf.Tensor|None)
stop (int|tf.Tensor|None)
- Returns:
either tuple(range(start, stop)) or the same as a symbolic expression
- Return type:
tuple[int]|tf.Tensor
- returnn.tf.util.basic.identity_with_ops(x, ops)[source]¶
- Parameters:
x (tf.Tensor)
ops (() -> list[tf.Operation|tf.Tensor])
- Returns:
x with all ops executed
- Return type:
tf.Tensor
- returnn.tf.util.basic.setup_tf_thread_pools(num_threads=None, log_file=None, tf_session_opts=None)[source]¶
See here for documentation of intra_op_parallelism_threads and inter_op_parallelism_threads: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto
intra_op_parallelism_threads is used for the LocalDevice::EigenThreadPoolInfo, which is always global. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/local_device.cc
inter_op_parallelism_threads is used for the (global if not use_per_session_threads) session thread pool. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/direct_session.cc
TF will set up the thread pools on first usage. That can happen quite early, especially for intra_op_parallelism_threads. E.g. list_local_devices() will trigger this, i.e. any call to is_gpu_available() or print_available_devices(). For debugging, you can set the env var TF_CPP_MIN_VLOG_LEVEL=1 and then check for these messages:
Local device intra op parallelism threads: 4 Direct session inter op parallelism threads: 4
Thus, call this function as early as possible with your preferred number of threads, used for both thread pools. It will create a dummy session and directly close it again, but if you use the global thread pools, those settings will remain for further sessions. This function will only execute on the first call.
- Parameters:
num_threads (int) – used for both intra and inter parallelism thread pools
log_file (stream|None)
tf_session_opts (dict[str,Any])
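A sketch of calling this at the top of an entry script, before anything that might touch TF devices (the thread count and log stream are arbitrary choices):

    import sys
    from returnn.tf.util.basic import setup_tf_thread_pools, print_available_devices

    # Must come before list_local_devices(), is_gpu_available(), session creation, etc.
    setup_tf_thread_pools(num_threads=4, log_file=sys.stdout)
    print_available_devices()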
- returnn.tf.util.basic.check_initial_tf_thread_pool_init(tf_session_opts: Dict[str, Any] | None = None)[source]¶
Makes sure that the TF thread pools are initialized with the requested settings. You probably want to call this very early.
- Parameters:
tf_session_opts
- returnn.tf.util.basic.get_tf_list_local_devices(tf_session_opts=None, file=None)[source]¶
This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first. Note that this will list all available devices. Any TF session might only use a subset of these. You can get the list available in a given TF session by tf.compat.v1.Session.list_devices().
- Parameters:
tf_session_opts (dict[str]|None) – if given, will init a temp tf.compat.v1.Session with these opts
file (TextIO|None) – log_stream stream for print statements, defaults to sys.stdout
- Return type:
list[tensorflow.core.framework.device_attributes_pb2.DeviceAttributes|_DeviceAttributes]
- returnn.tf.util.basic.print_available_devices(tf_session_opts=None, file=None)[source]¶
Prints the available TF devices to file (stdout by default). This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first.
- Parameters:
tf_session_opts (dict[str,Any]|None) – if given, will init a temp Session with these opts
file (TextIO|None) – file stream for print statements, defaults to sys.stdout
- returnn.tf.util.basic.is_gpu_available()[source]¶
Returns whether TensorFlow can access a GPU. This uses tensorflow.device_lib.list_local_devices(), i.e. this is independent from the current TF session. If you want to know whether the current TF session has a GPU available, use is_gpu_available_in_session(). Note that this does not tell whether the GPU or TF supports CUDA; see is_tf_cuda_build() for that. Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first.
- Return type:
bool
- returnn.tf.util.basic.is_gpu_available_in_session(session=None)[source]¶
- Parameters:
session (tf.compat.v1.Session|None) – If None, will use current active/default session. If that is also not available (no current active session), we check a RETURNN global config, and return whether the RETURNN global config will use GPU or not. If the RETURNN global config is not available, we will use if a GPU is in general available for TF.
- Returns:
whether the TensorFlow session has a GPU device. Also see is_gpu_available().
- Return type:
bool
- returnn.tf.util.basic.get_available_gpu_devices()[source]¶
Returns a list of available GPU devices. This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first.
- Return type:
list[tensorflow.core.framework.device_attributes_pb2.DeviceAttributes|_DeviceAttributes]
- returnn.tf.util.basic.get_available_gpu_cuda_min_compute_capability()[source]¶
Uses get_available_gpu_devices().
- Returns:
e.g. 3.0, or 5.0, etc, or None
- Return type:
float|None
- returnn.tf.util.basic.is_tf_cuda_build()[source]¶
- Returns:
whether TF was built with CUDA support. Also see is_gpu_available()
- Return type:
bool
- returnn.tf.util.basic.dot(a, b, transpose_b=False)[source]¶
- Parameters:
a (tf.Tensor) – shape […da…,d]
b (tf.Tensor) – shape [d,…db…] (or […db…,d] if transpose_b)
transpose_b (bool)
- Returns:
tensor of shape […da…,…db…]
- Return type:
tf.Tensor
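A sketch of dot() as a matmul over arbitrary leading axes:

    import tensorflow as tf
    from returnn.tf.util.basic import dot

    tf.compat.v1.disable_eager_execution()
    a = tf.zeros((2, 3, 5))   # [...da...=(2, 3), d=5]
    b = tf.zeros((5, 7))      # [d=5, ...db...=(7,)]
    c = dot(a, b)             # shape (2, 3, 7)
    print(c.shape)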
- returnn.tf.util.basic.get_activation_function(s)[source]¶
- Parameters:
s (str|None)
- Return type:
(tf.Tensor) -> tf.Tensor
- returnn.tf.util.basic.gelu(x)[source]¶
Gaussian Error Linear Units (GELUs) (https://arxiv.org/abs/1606.08415). Alternative to relu.
- Parameters:
x (tf.Tensor)
- Return type:
tf.Tensor
- returnn.tf.util.basic.gelu2(x)[source]¶
Another approximation of the GELU; faster but less accurate than gelu() (https://github.com/hendrycks/GELUs).
- Parameters:
x (tf.Tensor)
- Return type:
tf.Tensor
- returnn.tf.util.basic.gelu3(x)[source]¶
Another version of the GELU, as it is used in PyTorch https://github.com/pytorch/pytorch/blob/b7f4b6a6de30116f1b44a08fab9499dd5eb2de7d/test/cpp/api/functional.cpp#L839-L845
- Parameters:
x (tf.Tensor)
- Return type:
tf.Tensor
- returnn.tf.util.basic.random_uniform_abs_initializer(limit, **kwargs)[source]¶
- Parameters:
limit (float|int|tf.Tensor)
kwargs – passed to tf.random_uniform_initializer
- Return type:
tensorflow.python.ops.init_ops.Initializer
- returnn.tf.util.basic.xavier_initializer(uniform=True, seed=None, dtype=tf.float32)[source]¶
Alias for tf.glorot_uniform_initializer or tf.glorot_normal_initializer.
- Parameters:
uniform (bool) – uniform or normal distribution
seed (int)
dtype (tf.DType)
- Returns:
((tuple[int]) -> tf.Tensor) | tensorflow.python.ops.init_ops.Initializer
- returnn.tf.util.basic.wrap_distribution_non_zero(x, zero_limit, limit)[source]¶
- Parameters:
x (tf.Tensor) – values in [-limit,limit]
zero_limit (float)
limit (float)
- Returns:
same shape as x; rescaled and shifted such that values from [-zero_limit,zero_limit] are excluded. Values are still in [-limit,limit].
- Return type:
tf.Tensor
- class returnn.tf.util.basic.VarianceScalingNonZero(non_zero_fraction=0.5, **kwargs)[source]¶
Same as tf.VarianceScaling, i.e. truncated normal or uniform from [-limit,limit] for some limit, except that we exclude the range [-limit*non_zero_fraction, limit*non_zero_fraction]. non_zero_fraction=0 would yield no difference.
- For reference, to get the behavior of glorot_uniform, use these args:
mode=”fan_avg”, distribution=”uniform”
DEPRECATED FUNCTION ARGUMENT VALUES (deprecated arguments)
Deprecated: SOME ARGUMENTS ARE DEPRECATED: (dtype). They will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor
Deprecated: SOME ARGUMENT VALUES ARE DEPRECATED: (distribution=’normal’). They will be removed in a future version. Instructions for updating: normal is a deprecated alias for truncated_normal
- returnn.tf.util.basic.variance_scaling_non_zero_initializer[source]¶
alias of VarianceScalingNonZero
- returnn.tf.util.basic.load_txt_file_initializer(filename, dtype=tf.float32)[source]¶
- Parameters:
filename (str)
dtype (tf.DType)
- Returns:
function, when called, will return the content
- Return type:
()->tf.Tensor
- class returnn.tf.util.basic.GammatoneFilterbankInitializer(**kwargs)[source]¶
Initializer for a gammatone filterbank, e.g., to initialize weights of a convolutional layer.
- Parameters:
kwargs – kwargs for GammatoneFilterbank
- returnn.tf.util.basic.get_initializer(s, seed: int | Tensor | None = None, eval_local_ns: Dict[str, Any] | None = None, dtype: DType | str = tf.float32)[source]¶
- Parameters:
s (str|dict[str]|float|numpy.ndarray) – e.g. “glorot_uniform” or “truncated_normal” or “orthogonal”, or config dict with “class”, or string to be `eval`ed if it contains “(”. constant if a float is given.
seed – used in case the initializer has no explicit seed specified.
eval_local_ns
dtype
- Returns:
(function (shape) -> tf.Tensor) | tf.Initializer
- Return type:
((tuple[int]) -> tf.Tensor) | tf.Initializer
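A graph-mode sketch of two accepted forms from the signature above (a name string, and a plain float for a constant initializer):

    import tensorflow as tf
    from returnn.tf.util.basic import get_initializer

    tf.compat.v1.disable_eager_execution()
    init = get_initializer("glorot_uniform", seed=42)
    w = tf.compat.v1.get_variable("w", shape=(512, 2048), initializer=init)

    bias_init = get_initializer(0.0)  # constant initializer
    b = tf.compat.v1.get_variable("b", shape=(2048,), initializer=bias_init)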
- returnn.tf.util.basic.dropout(x, keep_prob, noise_shape=None, seed=None, name=None, cond_on_train=False, apply_correction_factor=True, grad_checkpointing=False)[source]¶
Computes dropout. Like tf.nn.dropout(), but avoids tf.div() if possible. If noise_shape is statically known, and x is inside a recurrent loop, we will reuse the same mask for all frames.
- Parameters:
x (tf.Tensor)
keep_prob (float|tf.Tensor)
noise_shape (tf.Tensor|tuple[int|None]) – 1 will broadcast in that dimension, None will not broadcast
seed (int)
name (str)
cond_on_train (bool) – automatically wrap through cond_on_train_flag()
apply_correction_factor (bool)
grad_checkpointing (bool) – use gradient checkpointing for the result
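A sketch of dropout with a mask that is broadcast over the time axis (noise_shape entry 1) and conditioned on the global train flag; the shapes are assumptions:

    import tensorflow as tf
    from returnn.tf.util.basic import dropout

    tf.compat.v1.disable_eager_execution()
    x = tf.compat.v1.placeholder(tf.float32, shape=(None, None, 512))  # (batch, time, feat)
    # 1 -> broadcast the mask over time; None -> independent per batch entry / feature.
    y = dropout(x, keep_prob=0.9, noise_shape=(None, 1, None), cond_on_train=True)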
- returnn.tf.util.basic.layer_norm(x, gain, bias, axis, epsilon=1e-06)[source]¶
Layer normalization. Also see openai_layer_norm(). Also see tensorflow.contrib.layers.layer_norm().
- Parameters:
x (tf.Tensor)
gain (tf.Tensor)
bias (tf.Tensor)
axis (int)
epsilon (float) – OpenAI uses 1e-6, TF contrib uses 1e-12, pbhatia243 uses 1e-5.
- Return type:
tf.Tensor
- returnn.tf.util.basic.openai_layer_norm(x, gain, bias, axis, epsilon=1e-06)[source]¶
Layer normalization, like layer_norm(), but with a fast kernel by OpenAI (implemented as part of their blocksparse). To use it, init the git submodule in extern/blocksparse.
- Parameters:
x (tf.Tensor)
gain (tf.Tensor)
bias (tf.Tensor)
axis (int)
epsilon (float)
- Return type:
tf.Tensor
- returnn.tf.util.basic.swapaxes(x, axis1, axis2)[source]¶
Also see move_axis() or dimshuffle().
- Parameters:
x (tf.Tensor)
axis1 (tf.Tensor|int)
axis2 (tf.Tensor|int)
- Returns:
tensor with swapped axes, like numpy.swapaxes
- Return type:
tf.Tensor
- returnn.tf.util.basic.move_axis(x, old_axis, new_axis, name='move_axis')[source]¶
Also see swapaxes() or dimshuffle().
- Parameters:
x (tf.Tensor)
old_axis (int) – can also be negative
new_axis (int) – can also be negative
name (str) – name of the scope
- class returnn.tf.util.basic.TensorCachedComputation(x, key)[source]¶
Helper to cache some computation inside a tf.Tensor object, or also inside any other object.
- Parameters:
x (tf.Tensor|object)
key (str|tuple[str|int|tf.Tensor])
- has_cache()[source]¶
- Returns:
whether we have stored the value already; if True, you can use get_cache()
- Return type:
bool
- returnn.tf.util.basic.sequence_mask(lengths, name=None, **kwargs)[source]¶
Wraps around tf.sequence_mask(). It will cache the value inside the passed object so that we don’t recompute it multiple times.
- Parameters:
lengths (tf.Tensor) – shape (batch,)
name (str|None)
kwargs – passed on to tf.sequence_mask
- Returns:
tensor mask of shape (batch,maxlen/time). default dtype is bool unless you specify something else
- Return type:
tf.Tensor
- returnn.tf.util.basic.sequence_mask_time_major(lengths, **kwargs)[source]¶
Wraps around tf.transpose(tf.sequence_mask(), (1,0)). It will cache the value inside the passed object so that we don’t recompute it multiple times.
- Parameters:
lengths (tf.Tensor) – shape (batch,)
kwargs – passed on to tf.sequence_mask
- Returns:
mask of shape (maxlen/time,batch)
- Return type:
tf.Tensor
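A graph-mode sketch of both variants; for seq_lens [3, 1, 2] the batch-major mask has shape (3, 3):

    import tensorflow as tf
    from returnn.tf.util.basic import sequence_mask, sequence_mask_time_major

    tf.compat.v1.disable_eager_execution()
    seq_lens = tf.compat.v1.placeholder(tf.int32, shape=(None,))  # (batch,)
    mask = sequence_mask(seq_lens)                 # (batch, maxlen), bool
    mask_tm = sequence_mask_time_major(seq_lens)   # (maxlen, batch), bool

    with tf.compat.v1.Session() as session:
        print(session.run(mask, feed_dict={seq_lens: [3, 1, 2]}))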
- returnn.tf.util.basic.directed(x, direction)[source]¶
If direction == 1 or direction is None, returns just x. If direction == -1, returns reversed(x).
- Parameters:
x (tf.Tensor)
direction (int|None) – -1 or 1 (or None)
- Return type:
tf.Tensor
- returnn.tf.util.basic.reversed(x)[source]¶
Just returns x[::-1]. It will cache the value inside the passed object so that we don’t recompute it multiple times.
- Parameters:
x (tf.Tensor)
- Return type:
tf.Tensor
- returnn.tf.util.basic.get_flatten_with_seq_len_mask_cache_for_data(x)[source]¶
- Parameters:
x (Data)
- Return type:
- returnn.tf.util.basic.get_flatten_with_seq_len_mask_cache(x, seq_lens, batch_dim_axis, time_dim_axis)[source]¶
- Parameters:
x (tf.Tensor) – shape (batch,…s…, time, …s’…) or shape (time,…s…., batch, …s’…)
seq_lens (tf.Tensor) – shape (batch,) of int32
batch_dim_axis (int) – index of batch_dim in x
time_dim_axis (int) – index of time_dim in x
- Return type:
- returnn.tf.util.basic.flatten_with_seq_len_mask(x, seq_lens, batch_dim_axis=None, time_dim_axis=None, time_major=None)[source]¶
- Parameters:
x (tf.Tensor) – shape (batch,…s…, time, …s’…) or shape (time,…s…., batch, …s’…)
seq_lens (tf.Tensor) – shape (batch,) of int32
batch_dim_axis (int) – index of batch_dim in x
time_dim_axis (int) – index of time_dim in x
time_major (bool) – whether time axis is 0 (redundant, kept for compatibility)
- Returns:
tensor of shape (time’, …s…s’…) where time’ = sum(seq_len) <= batch*time
- Return type:
tf.Tensor
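A sketch of flattening a padded batch-major tensor, e.g. before a framewise loss, so that padded frames are dropped (the shapes are assumptions):

    import tensorflow as tf
    from returnn.tf.util.basic import flatten_with_seq_len_mask

    tf.compat.v1.disable_eager_execution()
    x = tf.compat.v1.placeholder(tf.float32, shape=(None, None, 128))  # (batch, time, feat)
    seq_lens = tf.compat.v1.placeholder(tf.int32, shape=(None,))       # (batch,)
    flat = flatten_with_seq_len_mask(x, seq_lens, batch_dim_axis=0, time_dim_axis=1)
    # flat has shape (sum(seq_lens), 128).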
- returnn.tf.util.basic.flatten_with_seq_len_mask_time_major(x, seq_lens, batch_dim_axis, time_dim_axis)[source]¶
- Parameters:
x (tf.Tensor) – shape (batch,…s…, time, …s’…) or shape (time,…s…., batch, …s’…)
seq_lens (tf.Tensor) – shape (batch,) of int32
batch_dim_axis (int) – index of batch_dim in x
time_dim_axis (int) – index of time_dim in x
- Returns:
tensor of shape (time’, …s…s’…) where time’ = sum(seq_len) <= batch*time
- Return type:
tf.Tensor
- returnn.tf.util.basic.unflatten_with_seq_len_mask(x, seq_lens, batch_major=True)[source]¶
Basically inverse of flatten_with_seq_len_mask() and flatten_with_seq_len_mask_time_major().
- Parameters:
x (tf.Tensor) – shape (time’, …s…s’…) where time’ = sum(seq_len) <= batch*time
seq_lens (tf.Tensor) – shape (batch,) of int32
batch_major (bool) – if True, the output will be batch major
- Returns:
tensor of shape (batch,…s…, time, …s’…) or shape (time,…s…., batch, …s’…)
- Return type:
tf.Tensor
- returnn.tf.util.basic.expand_dims_unbroadcast(x, axis, dim, name='expand_dims_unbroadcast')[source]¶
- Parameters:
x (tf.Tensor|float|int)
axis (int|tf.Tensor) – new axis
dim (int|tf.Tensor) – dimension for axis
name (str) – scope name
- Returns:
if x is of shape (a,b,c) and axis=0, then we return (dim,a,b,c)
- Return type:
tf.Tensor
- returnn.tf.util.basic.expand_multiple_dims(x, axes, name='expand_multiple_dims')[source]¶
- Parameters:
x (tf.Tensor)
axes (list[int]|tuple[int]) – after completion, tf.shape(y)[axis] == 1 for axis in axes
name (str) – scope name
- Returns:
y where we have a new broadcast axis for each axis in axes
- Return type:
tf.Tensor
- returnn.tf.util.basic.tile_transposed(x, axis, multiples)[source]¶
Example: x with shape (D,), tf.tile(x, [N]) can be reshaped into (N,D), while tile_transposed(x, axis=0, multiples=N) can be reshaped into (D,N).
- Parameters:
x (tf.Tensor)
axis (int)
multiples (int|tf.Tensor)
- Returns:
tensor with shape[axis] == x.shape[axis] * multiples
- Return type:
tf.Tensor
- returnn.tf.util.basic.constant_with_shape(x, shape, dtype=None, name='constant_with_shape')[source]¶
- Parameters:
x (tf.Tensor|float|int|bool) – scalar
shape (list[tf.Tensor|int]|tuple[tf.Tensor|int]|tf.Tensor)
dtype (tf.DType)
name (str)
- Returns:
x of the specified shape
- Return type:
tf.Tensor
- returnn.tf.util.basic.dimshuffle(x, axes, name='dimshuffle')[source]¶
Like Theanos dimshuffle. Combines tf.transpose, tf.expand_dims and tf.squeeze.
- Parameters:
x (tf.Tensor)
axes (list[int|str]|tuple[int|str])
name (str) – scope name
- Return type:
tf.Tensor
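A sketch of the Theano-style pattern: integers permute existing axes, 'x' inserts a new broadcast axis.

    import tensorflow as tf
    from returnn.tf.util.basic import dimshuffle

    tf.compat.v1.disable_eager_execution()
    x = tf.zeros((2, 3, 5))
    y = dimshuffle(x, (1, "x", 0, 2))  # shape (3, 1, 2, 5)
    print(y.shape)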
- returnn.tf.util.basic.sparse_labels_with_seq_lens(x, seq_lens, dtype=tf.int32, collapse_repeated=False, post_filter_idx=None)[source]¶
- Parameters:
x (tf.Tensor) – shape (batch,time) -> index, some int type
seq_lens (tf.Tensor|None) – shape (batch,) of int32|int64
dtype (tf.DType|None) – if given, will cast the x values to this type. ctc_loss() wants int32
collapse_repeated (bool) – like uniq() behavior
post_filter_idx (int|list[int]|set[int]|None) – if given, after an optional collapse_repeated, will remove all those idx
- Returns:
SparseTensor, e.g. input for tf.nn.ctc_loss(), and seq_lens of shape (batch,)
- Return type:
(tf.SparseTensor, tf.Tensor)
- returnn.tf.util.basic.sparse_labels(x, seq_lens, dtype=tf.int32, collapse_repeated=False)[source]¶
- Parameters:
x (tf.Tensor) – shape (batch,time) -> index, some int type
seq_lens (tf.Tensor|None) – shape (batch,) of int32|int64
dtype (tf.DType|None) – if given, will cast the x values to this type. ctc_loss() wants int32
collapse_repeated (bool) – like uniq() behavior
- Returns:
SparseTensor, e.g. input for tf.nn.ctc_loss()
- Return type:
tf.SparseTensor
- returnn.tf.util.basic.uniq(x)[source]¶
- Parameters:
x (tf.Tensor) – 1D shape (time,) -> index, some int type
- Returns:
like Unix uniq (collapses consecutive duplicates); unlike tf.unique, which will never repeat entries.
Example: uniq([0, 0, 1, 1, 0, 0]) == [0, 1, 0], tf.unique([0, 0, 1, 1, 0, 0]) == [0, 1]. For a batched variant, see batched_uniq, or sparse_labels() with option collapse_repeated.
- returnn.tf.util.basic.batched_uniq(x, seq_lens)[source]¶
- Parameters:
x (tf.Tensor) – shape (batch,time) -> index, some int type
seq_lens (tf.Tensor|None) – shape (batch,) of int32|int64
- Returns:
tuple (z, new_seq_lens), where z is of shape (batch, max_new_time), max_new_time = max(new_seq_lens), seq_lens is of shape (batch,).
- Return type:
(tf.Tensor, tf.Tensor)
- returnn.tf.util.basic.ctc_greedy_decode(logits, seq_lens, time_major)[source]¶
Similar to tf.nn.ctc_greedy_decoder(), but a simpler implementation, and it should run on GPU.
- Parameters:
logits (tf.Tensor) – (time,batch,dim) or (batch,time,dim)
seq_lens (tf.Tensor) – shape (batch,) of int32|int64
time_major (bool)
- Return type:
tf.SparseTensor
- Returns:
in batch-major, [batch,max_time] (like tf.nn.ctc_greedy_decoder())
- returnn.tf.util.basic.get_common_shape(values, ignore_axes=(), allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>)[source]¶
Related: tf.broadcast_dynamic_shape(). Also see unbroadcast_to_common_shape().
- Parameters:
values (list[tf.Tensor|tf.Variable|float|int]) – all must have the same ndim
ignore_axes (list[int]|tuple[int]) – these axes will be ignored (returned dim will be None)
allow_broadcast_all_sources (bool|NotSpecified)
- Returns:
common shape of all values. broadcasts dims with 1. will use static dims when possible. Dim of axes which are in ignore_axes will be None.
- Return type:
list[tf.Tensor|int|None]
- returnn.tf.util.basic.unbroadcast_to_common_shape(value, common_shape, ignore_axes=(), allow_only_noop=False)[source]¶
- Parameters:
value (tf.Tensor|T)
common_shape (list[tf.Tensor|int|None]) – see get_common_shape()
ignore_axes (list[int]|tuple[int])
allow_only_noop (bool) – if False, and the unbroadcast is not a no-op, will raise an exception
- Returns:
(maybe) unbroadcasted value
- Return type:
tf.Tensor|T
- returnn.tf.util.basic.concat_with_opt_broadcast(values, allow_broadcast, axis, name='concat_with_opt_broadcast')[source]¶
- Parameters:
values (list[tf.Tensor]) – all with same ndim
allow_broadcast (list[bool]) – same len as values
axis (int)
name (str)
- Returns:
basically tf.concat(values, axis), but we can allow broadcasting for some values
- Return type:
tf.Tensor
- returnn.tf.util.basic.matrix_triangular(shape, dtype=tf.float32, lower=False, upper=False)[source]¶
- Parameters:
shape (tuple[int|tf.Tensor]|tf.Tensor)
dtype (tf.DType)
lower (bool)
upper (bool)
- Return type:
tf.Tensor
- class returnn.tf.util.basic.VariableAssigner(var)[source]¶
Object helper to assign some var. (This is mostly obsolete now.)
- Parameters:
var (tf.Variable)
- returnn.tf.util.basic.get_tf_gcc_version()[source]¶
- Returns:
gcc version, e.g. “4.8.5”
- Return type:
str|None
- returnn.tf.util.basic.get_tf_gcc_path()[source]¶
- Returns:
path to a GCC version which is most suitable for TF (to have correct C++ ABI)
- Return type:
str
- returnn.tf.util.basic.get_tf_gpp_path()[source]¶
- Returns:
path to a G++ version which is most suitable for TF (to have correct C++ ABI)
- Return type:
str
- class returnn.tf.util.basic.CudaEnv[source]¶
Information about the Nvidia CUDA environment and library. Also the path to nvcc, the CUDA compiler.
- class returnn.tf.util.basic.OpCodeCompiler(use_cuda_if_available=True, cuda_auto_min_compute_capability=True, include_paths=(), ld_flags=(), c_macro_defines=None, **kwargs)[source]¶
Helper class to compile TF ops on-the-fly, similar to Theano. https://www.tensorflow.org/guide/extend/op https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/extend/adding_an_op.md
- Parameters:
base_name (str) – base name for the module, e.g. “zero_out”
code_version (int|tuple[int]) – check for the cache whether to reuse
code (str) – the source code itself
is_cpp (bool) – if False, C is assumed
c_macro_defines (dict[str,str|int|None]|None) – e.g. {“TENSORFLOW”: 1}
ld_flags (list[str]|None) – e.g. [“-lblas”]
include_paths (list[str]|tuple[str])
include_deps (list[str]|None) – if provided and an existing lib file, we will check if any dependency is newer and we need to recompile. we could also do it automatically via -MD but that seems overkill and too slow.
static_version_name (str|None) – normally, we use …/base_name/hash as the dir but this would use …/base_name/static_version_name.
should_cleanup_old_all (bool) – whether we should look in the cache dir and check all ops if we can delete some old ones which are older than some limit (self._cleanup_time_limit_days)
should_cleanup_old_mydir (bool) – whether we should delete our op dir before we compile there.
log_stream (TextIO|None) – file stream for print statements
verbose (bool) – be slightly more verbose
- class returnn.tf.util.basic.TFNativeUtilCompiler(include_paths=(), ld_flags=(), c_macro_defines=None, **kwargs)[source]¶
Helper class to compile TF utility functions on-the-fly.
- Parameters:
base_name (str) – base name for the module, e.g. “zero_out”
code_version (int|tuple[int]) – check for the cache whether to reuse
code (str) – the source code itself
is_cpp (bool) – if False, C is assumed
c_macro_defines (dict[str,str|int|None]|None) – e.g. {“TENSORFLOW”: 1}
ld_flags (list[str]|None) – e.g. [“-lblas”]
include_paths (list[str]|tuple[str])
include_deps (list[str]|None) – if provided and an existing lib file, we will check if any dependency is newer and we need to recompile. we could also do it automatically via -MD but that seems overkill and too slow.
static_version_name (str|None) – normally, we use …/base_name/hash as the dir but this would use …/base_name/static_version_name.
should_cleanup_old_all (bool) – whether we should look in the cache dir and check all ops if we can delete some old ones which are older than some limit (self._cleanup_time_limit_days)
should_cleanup_old_mydir (bool) – whether we should delete our op dir before we compile there.
log_stream (TextIO|None) – file stream for print statements
verbose (bool) – be slightly more verbose
- returnn.tf.util.basic.make_var_tuple(v)[source]¶
- Parameters:
v (tf.Tensor|list[tf.Tensor]|tuple[tf.Tensor])
- Returns:
tuple of tensors
- Return type:
tuple[tf.Tensor]
- returnn.tf.util.basic.add_scaled_noise_to_gradients(grads_and_vars, gradient_noise_scale, sparse_grads=False)[source]¶
Adds scaled noise from a 0-mean normal distribution to gradients. Adapted from tf.contrib.layers.optimizers.
- Parameters:
grads_and_vars (list[(tf.Tensor|tf.IndexedSlices, tf.Variable)])
gradient_noise_scale (float) – used as stddev for tf.truncated_normal().
sparse_grads (bool) – for sparse gradients (tf.IndexedSlices), it will only add the noise to the indexed values. Seems broken in some cases? Needs debugging.
- Returns:
adapted grads_and_vars
- Return type:
list[(tf.Tensor|tf.IndexedSlices, tf.Variable)]
- class returnn.tf.util.basic.CustomGradient[source]¶
Utility functions to specify a custom gradient for a given function, which will be wrapped around via TF Defun(). Also see FlipGradientBuilder.
- register(input_types, op, grad_op, name=None)[source]¶
- Parameters:
input_types (list[tf.DType]|tuple[tf.DType])
op (((tf.Tensor) -> tf.Tensor)|T)
grad_op ((tf.Operation, tf.Tensor) -> tuple[tf.Tensor]|tf.Tensor) – args are (op, out_grad) and it must return in_grad
name (str) – optional func_name
- Returns:
op
- Return type:
((tf.Tensor) -> tf.Tensor)|T
- register_generic_loss_and_error_signal()[source]¶
If you want to use generic_loss_and_error_signal() at some point, call this as early as possible, because of https://github.com/tensorflow/tensorflow/issues/6804.
- class returnn.tf.util.basic.MetaLosses[source]¶
This provides a way to use an alternative gradient, or to use the original gradient (error signal) and do something with it. You can then define an additional (meta) loss using this.
This implements synthetic gradients, see synthetic_gradient().
- class LossInfo(value, scale, norm_factor, name, source)[source]¶
Covers loss and other info.
- Parameters:
value (tf.Tensor)
scale (float)
norm_factor (tf.Tensor)
name (str)
source (object) – e.g. layer
- class Scope[source]¶
Defines the scope for a synthetic gradient. Create this object via MetaLosses.enter_gradient_scope(). Any meta-losses will be collected here via register_loss().
- register_loss(loss)[source]¶
- Parameters:
loss (MetaLosses.LossInfo)
- classmethod synthetic_gradient(x, synthetic_grad_x, loss_scale=1.0, loss_name=None, loss_source=None)[source]¶
Decoupled Neural Interfaces using Synthetic Gradients, https://arxiv.org/abs/1608.05343
- Parameters:
x (tf.Tensor)
synthetic_grad_x (tf.Tensor)
loss_scale (float)
loss_name (str|None)
loss_source (object|None)
- Returns:
x, where the gradient is overwritten by synthetic_grad_x, and when calculated, the gradient prediction loss will be added to cls.scope.
- Return type:
tf.Tensor
- classmethod tikhonov_regularized(x, dummy, loss_scale=1.0, loss_name=None, loss_source=None)[source]¶
- Parameters:
x (tf.Tensor)
dummy (tf.Tensor|tf.Variable) – scalar. can be used to enforce getting a gradient
loss_scale (float)
loss_name (str|None)
loss_source (object|None)
- Returns:
identity(x), where we add a Tikhonov regularization
- Return type:
tf.Tensor
- returnn.tf.util.basic.filter_grad(x, threshold, axis)[source]¶
- Parameters:
x (tf.Tensor)
threshold (float) – all grads going through x which max(grad**2) is over the threshold are removed
axis (int|list[int]) – max(grad**2) will be reduced over this axis
- Returns:
identity(x) with custom gradient
- Return type:
tf.Tensor
- returnn.tf.util.basic.debug_register_better_repr()[source]¶
Some types don’t have good __repr__ implementations by default (for the current TF version). For debugging, it can be helpful to give some more info. This monkey-patches clazz.__repr__ of some TF classes.
- returnn.tf.util.basic.cond(pred, true_fn, false_fn, name=None)[source]¶
This is a wrapper around tf.cond(). This will be a branched execution, i.e. either true_fn() or false_fn() will be executed, or at least the resulting graph will be evaluated. If pred is constant at the call, only the corresponding fn will be called. This is similar to the TF-internal _smart_cond(), and to tf.contrib.framework.smart_cond.
- Parameters:
pred (tf.Tensor|bool)
true_fn (()->(tf.Tensor|list[tf.Tensor]|T))
false_fn (()->(tf.Tensor|list[tf.Tensor]|T))
name (str)
- Returns:
true_fn() if pred else false_fn()
- Return type:
tf.Tensor|list[tf.Tensor]|T
- returnn.tf.util.basic.single_strided_slice(x, axis, begin=None, end=None, step=None)[source]¶
- Parameters:
x (tf.Tensor)
axis (int|tf.Tensor)
begin (int|tf.Tensor|None)
end (int|tf.Tensor|None)
step (int|tf.Tensor|None)
- Returns:
e.g. if axis == 0, returns x[begin:end:step], if axis == 1, returns x[:, begin:end:step], etc.
- Return type:
tf.Tensor
- returnn.tf.util.basic.circular_pad(x, paddings, axes=None)[source]¶
- Parameters:
x (tf.Tensor) – shape (…, height, width)
paddings (int|((int,int), (int,int))|tf.Tensor) – how much to add ((top,bottom),(left,right))
axes (None|tf.Tensor|(tf.Tensor|int,tf.Tensor|int))
- Returns:
tensor with shape (…, top + height + bottom, left + width + right)
- Return type:
tf.Tensor
- returnn.tf.util.basic.spatial_smoothing_energy(x, dim, use_circular_conv=True)[source]¶
- Parameters:
x (tf.Tensor) – shape (…, dim)
dim (int) – last dimension of x
use_circular_conv (bool) – whether to use circular convolution, via circular_pad
- Return type:
tf.Tensor
- Returns:
energy of shape (…)
Via: Achieving Human Parity in Conversational Speech Recognition, Microsoft, 2017 (https://arxiv.org/abs/1610.05256). Interpret the last dimension as 2D (w, h) and apply some high-pass filter on it.
- returnn.tf.util.basic.nan_to_num(x, nan_num=0, inf_num=1e+30)[source]¶
Like numpy.nan_to_num().
- Parameters:
x (tf.Tensor|tf.IndexedSlices)
nan_num (float|tf.Tensor)
inf_num (float|tf.Tensor)
- Returns:
x with replaced nan and inf
- returnn.tf.util.basic.where_bc(condition, x, y, allow_broadcast_all_sources=<class 'returnn.util.basic.NotSpecified'>, name='where_bc')[source]¶
This is basically tf.where(), but with additional broadcasting support. We explicitly require that the ndims match (or x, y can also be scalars). See also get_common_shape() and unbroadcast_to_common_shape().
https://github.com/tensorflow/tensorflow/issues/3945 https://github.com/tensorflow/tensorflow/issues/9284
- Parameters:
condition (tf.Tensor)
x (tf.Tensor|float|int)
y (tf.Tensor|float|int)
allow_broadcast_all_sources (bool|NotSpecified)
name (str)
- Returns:
basically tf.where(condition, x, y)
- Return type:
tf.Tensor
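A sketch of masking attention energies, where the condition broadcasts over the last axis; plain tf.where() would reject the shape mismatch (the shapes are assumptions):

    import tensorflow as tf
    from returnn.tf.util.basic import where_bc

    tf.compat.v1.disable_eager_execution()
    energy = tf.compat.v1.placeholder(tf.float32, shape=(None, None, 8))  # (batch, time, heads)
    mask = tf.compat.v1.placeholder(tf.bool, shape=(None, None, 1))       # broadcasts over heads
    masked = where_bc(mask, energy, float("-inf"))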
- returnn.tf.util.basic.identity_op_nested(x, name='identity')[source]¶
- Parameters:
x (tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor])
name (str)
- Return type:
tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]
- returnn.tf.util.basic.nd_indices(indices, batch_axis=0, indices_batch_major=None)[source]¶
- Parameters:
indices (tf.Tensor) – e.g. (batch, …) -> index (or (…, batch, …) -> index)
batch_axis (int) – of the indices tensor itself
indices_batch_major (bool|None) – of the resulting 2-tuple, whether it represents (batch_idx, index) or (index, batch_idx). default is like batch_axis
- Returns:
extended indices with batch-idx which can be used for tf.gather_nd, i.e. in the example of shape (batch, …, 2) where the 2-tuple represents (batch_idx, index) or (index, batch_idx). the shape[:-1] is exactly the same as the indices shape.
- Return type:
tf.Tensor
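A sketch of picking the target score per batch entry via tf.gather_nd; the vocabulary size is an arbitrary assumption:

    import tensorflow as tf
    from returnn.tf.util.basic import nd_indices

    tf.compat.v1.disable_eager_execution()
    scores = tf.compat.v1.placeholder(tf.float32, shape=(None, 1000))  # (batch, vocab)
    targets = tf.compat.v1.placeholder(tf.int32, shape=(None,))        # (batch,) -> vocab index
    idx = nd_indices(targets)                  # (batch, 2), rows are (batch_idx, target_idx)
    target_scores = tf.gather_nd(scores, idx)  # (batch,)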
- returnn.tf.util.basic.stop_all_event_writer_threads()[source]¶
Iterates through all running threads, and stops those which are TF event logger threads. See stop_event_writer_thread().
- returnn.tf.util.basic.stop_event_writer_thread(event_writer)[source]¶
There is a bug in TensorFlow (at least 1.1.0) (https://github.com/tensorflow/tensorflow/issues/4820) that the event writer thread is never stopped. This will try to stop it. Only do it if you don’t use the event writer anymore.
- Parameters:
event_writer (tf.compat.v1.summary.FileWriter|tensorflow.python.summary.writer.event_file_writer.EventFileWriter|tensorflow.python.summary.writer.event_file_writer._EventLoggerThread) – # nopep8
- returnn.tf.util.basic.optional_add(*args)[source]¶
- Parameters:
args (list[tf.Tensor|None]|int|float|tf.Tensor)
- Return type:
tf.Tensor|int|float|None
- Returns:
sums all non-None values, or returns None if there are none
- returnn.tf.util.basic.optional_mul(*args)[source]¶
- Parameters:
args (tf.Tensor|None|int|float)
- Return type:
tf.Tensor|int|float|None
- Returns:
multiplies all non-None values, or returns None if there are none
- returnn.tf.util.basic.opt_logical_and(*args)[source]¶
- Parameters:
args (tf.Tensor|bool)
- Returns:
basically logical_and(*args), but leaves out all constants
- Return type:
tf.Tensor|bool
- returnn.tf.util.basic.opt_logical_or(*args)[source]¶
- Parameters:
args (tf.Tensor|bool)
- Returns:
basically logical_or(*args), but leaves out all constants
- Return type:
tf.Tensor|bool
- returnn.tf.util.basic.windowed_nd(source, window_size, window_left=None, window_right=None, padding='same', time_axis=1, new_window_axis=2, stride=1)[source]¶
Constructs a new “window” axis which is a moving input over the time-axis. If you want to take out a single window, i.e. a slice, see slice_nd(). The windowing logic behaves just as in convolution or pooling.
There are multiple implementations:
By tiling + padding and then reshaping, we can get what we want. This is the “clever” implementation which is efficient but difficult to understand. To really understand it, it’s best to visualize it. This is the default implementation. It is only efficient with no striding (stride=1), so we only use it for that case.
We can do it with tf.gather() by calculating the exact indices in the input tensor for all windows. This is quite straightforward and still reasonably efficient. We use this for striding.
tf.image.extract_patches() is quite similar in behavior. We also have native implementations for chunk and unchunk, which are also similar in behavior. PyTorch unfold is also similar in behavior.
- Parameters:
source (tf.Tensor) – N-D tensor of shape (…, n_time, …)
window_size (int|tf.Tensor) – window size
window_left (int|tf.Tensor|None)
window_right (int|tf.Tensor|None)
padding (str) – “same” or “valid”
time_axis (int)
new_window_axis (int)
stride (int) – return only each Nth window
- Returns:
tensor of shape (…, n_time, …, window, …)
- Return type:
tf.Tensor
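A sketch of stacking +-2 context frames around every frame, as often done for feed-forward acoustic models (the shapes are assumptions):

    import tensorflow as tf
    from returnn.tf.util.basic import windowed_nd

    tf.compat.v1.disable_eager_execution()
    x = tf.compat.v1.placeholder(tf.float32, shape=(None, None, 40))  # (batch, time, feat)
    y = windowed_nd(x, window_size=5, time_axis=1, new_window_axis=2)
    # y has shape (batch, time, window=5, feat), with "same" padding by default.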
- returnn.tf.util.basic.slice_nd(x, start, size)[source]¶
- Parameters:
x (tf.Tensor) – shape (B, T, …)
start (tf.Tensor) – shape (B,), int32
size (int|tf.Tensor) – scalar
- Returns:
[x[start_1:size], x[start_2:size], …, x[start_B:size]], shape (B, size, …) Like
slice_pad_zeros()
, the size in the first axis will always besize
, and we will pad with zeros.- Return type:
tf.Tensor
- returnn.tf.util.basic.global_tensor(f, name)[source]¶
This creates a global accessible tensor in the graph to be reused later, i.e. on the second call given a unique name, it will not create a new tensor but return the previously created tensor. This is for the current graph, i.e. if there is a new graph, it will recreate the tensor.
- Parameters:
f (() -> tf.Tensor) – callable which creates the tensor
name (str) – global reference name for the tensor. should be a valid scope name
- Returns:
the tensor
- Return type:
tf.Tensor
- returnn.tf.util.basic.get_global_train_flag_placeholder()[source]¶
Also consider TFNetwork.get_current_network().train_flag(), or get_global_train_flag().
- Returns:
bool scalar tensor
- Return type:
tf.Tensor
- returnn.tf.util.basic.get_global_train_flag()[source]¶
- Return type:
tf.Tensor|bool
- Returns:
global train flag
- returnn.tf.util.basic.cond_on_train_flag(fn_train, fn_eval)[source]¶
Uses fn_train() or fn_eval() based on train_flag. It will be a branched evaluation. train_flag is determined via get_global_train_flag().
- Parameters:
fn_train (()->tf.Tensor)
fn_eval (()->tf.Tensor)
- Returns:
fn_train() if self.train_flag else fn_eval()
- Return type:
tf.Tensor
- returnn.tf.util.basic.get_global_random_generator(*, create: bool = True) Generator | None [source]¶
- Parameters:
create – if True and no generator exists yet, it will create one
- Returns:
random generator
- class returnn.tf.util.basic.StatelessRandomSeed(_shape: Tensor | Sequence[int | Tensor], _key: Tensor, _counter: Tensor, _algorithm: int | Tensor)[source]¶
State to create some random numbers.
The random numbers can be created multiple times, and it will always return the same value for the same instance. This is useful to save memory.
- classmethod create(*, shape: Tensor | Sequence[int | Tensor], generator: Generator | None = None) StatelessRandomSeed [source]¶
- Parameters:
shape
generator
- Returns:
new instance
- uniform(*, minval: float | Tensor = 0, maxval: float | Tensor | None = None, dtype: DType = tf.float32) Tensor [source]¶
Basically copy of tf.random.Generator.uniform.
- Parameters:
minval
maxval
dtype
- Returns:
random tensor with given shape. Note that this op is deterministic, i.e. it will always return the same value for multiple calls on the same instance, as the instance encapsulates all random state.
- normal(mean: float | Tensor = 0.0, stddev: float | Tensor = 1.0, dtype: DType = tf.float32) Tensor [source]¶
Basically copy of tf.random.Generator.normal.
- Parameters:
mean
stddev
dtype
- Returns:
random tensor with given shape. Note that this op is deterministic, i.e. it will always return the same value for multiple calls on the same instance, as the instance encapsulates all random state.
- returnn.tf.util.basic.encode_raw(x, axis=-1, seq_lens=None)[source]¶
The inverse function of tf.compat.v1.decode_raw(). Also see: https://stackoverflow.com/questions/43403147/how-to-create-a-encode-raw-tensorflow-function
- Parameters:
x (tf.Tensor) – of integer types [0,255], will get casted to uint8
axis (int) – the axis to reduce-join the string. decode_raw has added it at the end
seq_lens (tf.Tensor|None) – must have same shape as x after reduce-joining. Note that using seq_lens will make our output not compatible with tf.compat.v1.decode_raw() anymore because tf.compat.v1.decode_raw() requires all strings to be of the same length.
- Returns:
string tensor
- Return type:
tf.Tensor
The vocab is shared across the current instance of the computation graph. The tensor name might be different in different runs.
- Parameters:
vocab_strings (list[str])
- Returns:
shape (len(vocab_strings),), tf.string
- Return type:
tf.Tensor
- returnn.tf.util.basic.map_labels(x, label_map, name='map_labels')[source]¶
- Parameters:
x (tf.Tensor|tf.SparseTensor) – values of integer types
label_map (dict[int,int|None]) – should be dense on input
name (str)
- Returns:
mapped values
- Return type:
tf.Tensor|tf.SparseTensor
- returnn.tf.util.basic.remove_labels(x, labels)[source]¶
- Parameters:
x (tf.SparseTensor) – sequences, i.e. the indices are interpret as (batch,time)
labels (set[int]|list[int])
- Returns:
x where all provided labels are removed, and the indices are changed accordingly
- Return type:
tf.SparseTensor
- returnn.tf.util.basic.pad_zeros_in_axis(x, before=0, after=0, axis=0)[source]¶
- Parameters:
x (tf.Tensor)
before (int|tf.Tensor)
after (int|tf.Tensor)
axis (int)
- Returns:
- returnn.tf.util.basic.slice_pad_zeros(x, begin, end, axis=0)[source]¶
- Parameters:
x (tf.Tensor) – of shape (…, time, …)
begin (int|tf.Tensor)
end (int|tf.Tensor)
axis (int)
- Returns:
basically x[begin:end] (with axis==0) but if begin < 0 or end > x.shape[0], it will not discard these frames but pad zeros, such that the resulting shape[0] == end - begin.
- Return type:
tf.Tensor
- returnn.tf.util.basic.post_control_dependencies(x, updates)[source]¶
- Parameters:
x (tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor])
updates (list[tf.Operation])
- Returns:
identity(x) with control_dependencies(updates)
- Return type:
tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]
- returnn.tf.util.basic.sequential_control_dependencies(ls)[source]¶
Like tf.control_dependencies, but each operation will be created such that it is executed after the ones coming before it in the list, i.e. ls[0] is executed first and ls[-1] last.
- Parameters:
ls (list[()->(tf.Operation|tf.Tensor)])
- returnn.tf.util.basic.global_queue(name, queue_type, capacity, dtypes, shapes=None, names=None)[source]¶
- Parameters:
name (str) – global name
queue_type ((...)->tf.QueueBase) – some function which creates a queue
capacity
dtypes (list[tf.DType|str])
shapes (list[tf.TensorShape|tuple[int|None]]|None)
names (list[str]|None)
- Return type:
tf.QueueBase
- returnn.tf.util.basic.init_variable_if_needed(v)[source]¶
- Parameters:
v (tf.Variable)
- Return type:
tf.Operation
- returnn.tf.util.basic.auto_init_var(v)[source]¶
- Parameters:
v (tf.Variable)
- Returns:
a reference to the var via tf.identity
- Return type:
tf.Tensor
- returnn.tf.util.basic.true_once()[source]¶
- Returns:
tensor which will be True once and then always False. Internally, this creates a non-trainable variable as a helper.
- Return type:
tf.Tensor
- returnn.tf.util.basic.raise_OutOfRangeError()[source]¶
- Returns:
an op which raises an OutOfRangeError
- Return type:
tf.Operation
- returnn.tf.util.basic.enforce_copy(x)[source]¶
- Parameters:
x (tf.Tensor|tf.Variable)
- Returns:
copy of input, i.e. enforces that this is not a ref
- Return type:
tf.Tensor
- returnn.tf.util.basic.zeros_dyn_shape(shape, dtype=tf.float32, name='zeros_dyn_shape')[source]¶
- Parameters:
shape (list[int|None]|tuple[int|None])
dtype (str|tf.DType)
name (str)
- Returns:
zeros = tf.zeros() which has 1 at the None dims, however, this is a dynamic size, so zeros.shape.as_list() returns exactly shape, including the None’s.
- Return type:
tf.Tensor
- returnn.tf.util.basic.view_as(x, dtype)[source]¶
Does the numpy.view equivalent. Note that the current implementation is inefficient (uses tf.compat.v1.py_func) and CPU-only. Also see tf.bitcast().
- Parameters:
x (tf.Tensor)
dtype (tf.DType)
- Returns:
x.view(dtype) equivalent (see numpy.view)
- returnn.tf.util.basic.broadcast_gradient_args(shape_x, shape_y)[source]¶
- Parameters:
shape_x (tf.Tensor)
shape_y (tf.Tensor)
- Returns:
(axis reduce arg for grad x, axis reduce arg for grad y)
- Return type:
(tf.Tensor, tf.Tensor)
- returnn.tf.util.basic.maximum_with_identity_grad(x, y)[source]¶
- Parameters:
x (tf.Tensor)
y (tf.Tensor|float)
- Returns:
tf.maximum(x, y) where each will receive the gradient
- Return type:
tf.Tensor
- returnn.tf.util.basic.minimum_with_identity_grad(x, y)[source]¶
- Parameters:
x (tf.Tensor)
y (tf.Tensor|float)
- Returns:
tf.minimum(x, y) where each will receive the gradient
- Return type:
tf.Tensor
- returnn.tf.util.basic.clip_by_value_with_identity_grad(x, clip_value_min, clip_value_max)[source]¶
- Parameters:
x (tf.Tensor)
clip_value_min (tf.Tensor|float)
clip_value_max (tf.Tensor|float)
- Returns:
tf.clip_by_value(x, clip_value_min, clip_value_max) where each will receive the gradient
- Return type:
tf.Tensor
- returnn.tf.util.basic.safe_log(x, eps=1e-20, use_fake_grad=True)[source]¶
Safe wrapper around tf.log() which avoids infs or nans in the gradient.
- Parameters:
x (tf.Tensor)
eps (float|tf.Tensor)
use_fake_grad (bool) – True -> use maximum_with_identity_grad, False -> use tf.maximum
- Returns:
log(max(x, eps))
- Return type:
tf.Tensor
- returnn.tf.util.basic.safe_exp(x, eps=1e-20)[source]¶
- Parameters:
x (tf.Tensor)
eps (float)
- Returns:
exp(x), but does clipping before, such that it never returns inf nor exactly 0.0. Also, we make sure that we use the gradient in all cases.
- Return type:
tf.Tensor
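A sketch of the safe log/exp pair on probabilities that may contain exact zeros:

    import tensorflow as tf
    from returnn.tf.util.basic import safe_log, safe_exp

    tf.compat.v1.disable_eager_execution()
    probs = tf.compat.v1.placeholder(tf.float32, shape=(None, 1000))
    log_probs = safe_log(probs)     # == log(max(probs, 1e-20)), no -inf / nan gradient
    restored = safe_exp(log_probs)  # clipped exp, never inf nor exactly 0.0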
- returnn.tf.util.basic.l1_normalized(x, axis=-1, eps=1e-20, use_logsumexp=False, is_not_negative=False)[source]¶
- Parameters:
x (tf.Tensor) – assumes != 0
axis (int|tf.Tensor) – in range [-rank(x),rank(x)]
eps (float|tf.Tensor|None) – for safety, to ensure that tf.reduce_sum(tf.abs(x)) >= eps
use_logsumexp (bool) – eps must not be None
is_not_negative (bool)
- Returns:
y such that tf.reduce_sum(tf.abs(y)) == 1. i.e. y = x / tf.reduce_sum(tf.abs(x)).
- Return type:
tf.Tensor
- returnn.tf.util.basic.lin_exp(x, use_safe_exp=True)[source]¶
- Parameters:
x (tf.Tensor)
use_safe_exp (bool)
- Returns:
x + 1 if x >= 0 else exp(x). this is smooth and differentiable everywhere
- Return type:
tf.Tensor
- returnn.tf.util.basic.lin_exp_normed(x, axis=-1, eps=1e-10)[source]¶
This can be used as an alternative to softmax. It uses lin_exp() instead of exp.
- Parameters:
x (tf.Tensor)
axis (int|tf.Tensor) – in range [-rank(x),rank(x)]
eps (float|tf.Tensor|None) – for safety, to ensure that tf.reduce_sum(tf.abs(x)) >= eps
- Returns:
y = l1_normalized(lin_exp(x)), i.e. tf.reduce_sum(y) == 1, and y >= 0.
- Return type:
tf.Tensor
- returnn.tf.util.basic.check_base_op_type_and_replace(x, op_type, new_op_type)[source]¶
Suppose you have x = tf.nn.softmax(z) and you want to get y = tf.nn.log_softmax(z). This function will test to see if x is of that kind and then return y.
- Parameters:
x (tf.Tensor)
op_type (str) – e.g. “Softmax”
new_op_type (str) – e.g. “LogSoftmax”
- Returns:
x with new_op_type instead of op_type, or None if not matched
- Return type:
tf.Tensor|None
- returnn.tf.util.basic.copy_op(op: Operation, *, graph: Graph | None = None, op_type: str | None = None, inputs: Sequence[Tensor] | None = None, name: str | None = None) Operation [source]¶
Copies a tf.Operation.
- Parameters:
op
graph – if given, overwrites op.graph, otherwise uses the same op.graph
op_type – if given, overwrites op.type, otherwise uses the same op.type
inputs – if given, overwrites op.inputs, otherwise uses the same op.inputs
name
- Returns:
copy of op but optionally change op.type == op_type or op.inputs == inputs
- returnn.tf.util.basic.simplify_neg(a)[source]¶
- Parameters:
a (T|tf.Tensor|int|float|numpy.ndarray|numpy.number)
- Returns:
-a, but the operation is potentially simplified
- Return type:
T|tf.Tensor|int|float|numpy.ndarray|numpy.number
- returnn.tf.util.basic.simplify_add(a, b)[source]¶
- Parameters:
a (T|tf.Tensor|int|float|numpy.ndarray|numpy.number)
b (T|tf.Tensor|int|float|numpy.ndarray|numpy.number)
- Returns:
a + b, but the operation is potentially simplified
- Return type:
T|tf.Tensor|int|float|numpy.ndarray|numpy.number
Obviously, it is not possible to perform simplification in all cases. So this never can be complete. This just covers some very simple cases, e.g:
(a + b) + (-b) == a
- returnn.tf.util.basic.simplify_sub(a, b)[source]¶
- Parameters:
a (T|tf.Tensor|int|float|numpy.ndarray)
b (T|tf.Tensor|int|float|numpy.ndarray)
- Returns:
a - b, but the operation is potentially simplified
- Return type:
T|tf.Tensor|int|float|numpy.ndarray
Wraps simplify_add().
- returnn.tf.util.basic.simplify_non_negative_seq_length(x)[source]¶
- Parameters:
x (tf.Tensor|int|float|numpy.ndarray)
- Returns:
max(x, 0), or simplified if possible
- Return type:
tf.Tensor|int|float|numpy.ndarray
- returnn.tf.util.basic.copy_tensor(x)[source]¶
Similar to tf.identity, but we ensure here that the return value has its own memory. This can be relevant when you want to keep a copy of the original variable value. See
get_variable_value_copy_before_update_ops()
for usage.
- Parameters:
x (tf.Tensor)
- Returns:
a copy of x (points to new memory)
- Return type:
tf.Tensor
- returnn.tf.util.basic.smoothing_cross_entropy(logits, labels, label_smoothing, gaussian=False, vocab_size=None, logits_are_normalized=False)[source]¶
Cross entropy with label smoothing to limit over-confidence. Code adapted from here: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/common_layers.py
- Parameters:
logits (tf.Tensor) – Tensor of size shape(labels) + [vocab_size]
labels (tf.Tensor) – Tensor of size […]
vocab_size (int|tf.Tensor) – Tensor representing the size of the vocabulary.
label_smoothing (float) –
confidence = 1.0 - label_smoothing. Used to determine the on and off values for label smoothing. If gaussian is true, confidence is the variance of the Gaussian distribution. A common default value is 0.1.
gaussian (bool) – Use a Gaussian distribution for label smoothing
logits_are_normalized (bool)
- Returns:
Tensor of the same shape as labels and of the same dtype as logits.
- Return type:
tf.Tensor
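A minimal sketch of how this could be called (shapes and values are illustrative):

    import tensorflow as tf
    from returnn.tf.util.basic import smoothing_cross_entropy

    batch, time, vocab = 3, 5, 100
    logits = tf.random.normal([batch, time, vocab])
    labels = tf.random.uniform([batch, time], maxval=vocab, dtype=tf.int32)
    # label_smoothing=0.1: the correct class gets confidence 0.9,
    # the remaining 0.1 is distributed over the other classes
    ce = smoothing_cross_entropy(
        logits=logits, labels=labels, label_smoothing=0.1, vocab_size=vocab)
    # ce has shape (batch, time), i.e. the same shape as labels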
- returnn.tf.util.basic.softmax_cross_entropy_over_size(logits, labels, stable_gradient=True)[source]¶
The last spatial axis with dynamic size info will be used and interpreted as the class probabilities over the size. We will mask logits outside of the size. We expect that the labels have the corresponding invalid frames already set to 0.0. This can be used to measure the cross entropy between two soft alignments / attention weights.
- Parameters:
logits (Data) – in log space, unscaled. shape (…,T,…). Shape can be eg. (B,dec-T,enc-T,H…), or (dec-T,enc-T,B,H…), etc. If it has multiple axes with dynamic size, we use the last one (enc-T in the example).
labels (Data) – in prob space. shape compatible to logits (but axes can be ordered differently). Shape can be e.g. (B,dec-T,enc-T,H…) etc. If it has multiple spatial axes, we expect them to be in the same order as in logits
stable_gradient (bool) – whether to use an explicit gradient
- Returns:
shape as logits, but the T axis removed.
- Return type:
tf.Tensor
- returnn.tf.util.basic.interpolate_bilinear(grid, query_points, name='interpolate_bilinear', indexing='ij')[source]¶
Similar to Matlab’s interp2 function. Finds values for query points on a grid using bilinear interpolation. Adapted from tensorflow.contrib.image.dense_image_warp, from newer TF version which supports variable-sized images.
- Parameters:
grid (tf.Tensor) – a 4-D float Tensor of shape [batch, height, width, channels].
query_points (tf.Tensor) – a 3-D float Tensor of N points with shape [batch, N, 2]. Note that this function is not differentiable w.r.t. the query points.
name (str) – a name for the operation (optional).
indexing (str) – whether the query points are specified as row and column (ij), or Cartesian coordinates (xy).
- Returns:
a 3-D Tensor with shape [batch, N, channels]
- Return type:
tf.Tensor
- returnn.tf.util.basic.dense_image_warp(image, flow, name='dense_image_warp')[source]¶
Image warping using per-pixel flow vectors. Adapted from tensorflow.contrib.image.dense_image_warp, from newer TF version which supports variable-sized images.
- Parameters:
image (tf.Tensor) – 4-D float Tensor with shape [batch, height, width, channels].
flow (tf.Tensor) – A 4-D float Tensor with shape [batch, height, width, 2]. E.g. via create_random_warp_flow_2d(). Note that this function is not differentiable w.r.t. the flow.
name (str) – A name for the operation (optional).
- Returns:
A 4-D float Tensor with shape [batch, height, width, channels] and same type as input image.
- Return type:
tf.Tensor
- returnn.tf.util.basic.create_random_warp_flow_2d(shape, std=None, scale=10.0, blur_std=2.0)[source]¶
Can be used with dense_image_warp().
- Parameters:
shape (tf.Tensor|(int,int,int)) – 1D, contains (batch,height,width). e.g.
tf.shape(image)[:-1]
std (float|(float,float))
scale (float|(float,float))
blur_std (float|(float,float))
- Returns:
[batch, height, width, 2]
- Return type:
tf.Tensor
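A sketch of combining create_random_warp_flow_2d() with dense_image_warp(), e.g. for data augmentation on feature "images" (the use case here is an assumption; the shapes follow the docs):

    import tensorflow as tf
    from returnn.tf.util.basic import create_random_warp_flow_2d, dense_image_warp

    # e.g. log-mel features treated as a (batch, time, freq, channels=1) image
    image = tf.random.normal([2, 100, 80, 1])
    flow = create_random_warp_flow_2d(tf.shape(image)[:-1], scale=10.0, blur_std=2.0)
    warped = dense_image_warp(image, flow)  # same shape and dtype as image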
- returnn.tf.util.basic.gaussian_kernel_2d(size, std)[source]¶
- Parameters:
size (int|(int,int))
std (float|(float,float))
- Returns:
(size_x*2+1,size_y*2+1), float32
- Return type:
tf.Tensor
- returnn.tf.util.basic.gaussian_blur_2d(image, kernel_size=None, kernel_std=None)[source]¶
- Parameters:
image (tf.Tensor) – (batch,width,height,channel)
kernel_size (int|(int,int)|None)
kernel_std (float|(float,float)|None)
- Returns:
image
- Return type:
tf.Tensor
- returnn.tf.util.basic.bleu_score(hypothesis, truth, hyp_seq_lens, truth_seq_lens)[source]¶
Calculates the BLEU score. See Util.compute_bleu(). This currently wraps a Python function and thus is not efficient.
- Parameters:
hypothesis (tf.Tensor) – (batch, max(hyp_seq_lens))
truth (tf.Tensor) – (batch, max(truth_seq_lens))
hyp_seq_lens (tf.Tensor) – (batch,)
truth_seq_lens (tf.Tensor) – (batch,)
- Return type:
tf.Tensor
- Returns:
(batch,), float32
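A small sketch (the label indices are made up; since this wraps a Python function, it is meant for evaluation, not for gradients):

    import tensorflow as tf
    from returnn.tf.util.basic import bleu_score

    hyps = tf.constant([[1, 2, 3, 0]], dtype=tf.int32)   # (batch, max(hyp_seq_lens)), padded
    refs = tf.constant([[1, 2, 4, 3]], dtype=tf.int32)   # (batch, max(truth_seq_lens))
    scores = bleu_score(
        hypothesis=hyps, truth=refs,
        hyp_seq_lens=tf.constant([3]), truth_seq_lens=tf.constant([4]))
    # scores: (batch,) float32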
- returnn.tf.util.basic.prod(ls)[source]¶
- Parameters:
ls (list[T]|tuple[T]|numpy.ndarray|tf.Tensor)
- Return type:
T|int|float|tf.Tensor
- returnn.tf.util.basic.mem_usage_for_dev(dev_name)[source]¶
- Parameters:
dev_name (str) – e.g. “/device:GPU:0” or “/job:localhost/replica:0/task:0/device:GPU:0”
- Returns:
int scalar, which is the peak memory usage in bytes of the given device
- Return type:
tf.Tensor
This function will not create multiple nodes in the graph for multiple calls. Currently only works for GPU devices.
- returnn.tf.util.basic.identity_with_debug_log(x, args, out, name='DebugLogOp')[source]¶
- Parameters:
x (tf.Tensor)
args (dict[str,tf.Tensor|None])
out (list[dict[str,numpy.ndarray]])
name (str)
- Returns:
x
- Return type:
tf.Tensor
- returnn.tf.util.basic.add_check_numerics_ops(fetches=None, ignore_ops=None, use_check_numerics=True, debug_print_added_checks=True, name='add_check_numerics_ops')[source]¶
This is similar to tf.add_check_numerics_ops() and based on similar code. It adds some more logic and options.
- Parameters:
fetches (list[tf.Operation|tf.Tensor]|None) – in case this is given, will only look at these and dependent ops
ignore_ops (list[str]) – e.g. “”
use_check_numerics (bool) – if False, instead of tf.check_numerics(), it does the check manually (via tf.is_finite()) and in case there is inf/nan, it will also print the tensor (while tf.check_numerics does not print the tensor). Note that this can be about 50 times slower.
debug_print_added_checks (bool) – prints info about each added check
name (str) – op-name for the final tf.group
- Returns:
operation which performs all the checks
- Return type:
tf.Operation
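A minimal graph-mode sketch; the toy loss here is just something that can produce inf:

    import tensorflow as tf
    tf.compat.v1.disable_eager_execution()
    from returnn.tf.util.basic import add_check_numerics_ops

    x = tf.compat.v1.placeholder(tf.float32, [None])
    loss = tf.reduce_sum(tf.math.log(x))  # log(0) gives -inf, which the check should catch
    check_op = add_check_numerics_ops(fetches=[loss])
    with tf.compat.v1.Session() as session:
        session.run([loss, check_op], feed_dict={x: [1.0, 2.0]})
        # session.run([loss, check_op], feed_dict={x: [0.0]})  # would fail the numerics check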
- returnn.tf.util.basic.nested_get_shapes(x)[source]¶
- Parameters:
x (tf.Tensor|dict[str,tf.Tensor]|list[tf.Tensor]|object) – anything that nest supports
- Returns:
same structure as x, but tf.TensorShape for each tensor
- returnn.tf.util.basic.has_control_flow_context(x)[source]¶
- Parameters:
x (tf.Tensor|tf.Operation|int|float|None|list[tf.Tensor|tf.Operation|int|float])
- Returns:
whether x has a control flow, i.e. is e.g. inside a while loop
- Return type:
bool
- returnn.tf.util.basic.same_control_flow_ctx(x)[source]¶
Will use the same (flow) context as x. E.g. if x is a constant, it can be outside the loop, so we will yield a context which is not inside the loop. (This function was earlier called same_context.)
See also default_control_flow_ctx().
- returnn.tf.util.basic.op_in_right_control_flow_context(op: Operation) Operation | None [source]¶
- Parameters:
op – op with some control flow.
- Returns:
some op in a control flow context which can be accessed from the current control flow context, or None if there is no such op.
- returnn.tf.util.basic.get_protobuf_fields(obj)[source]¶
- Parameters:
obj – protobuf object
- Return type:
dict[str]
- returnn.tf.util.basic.get_op_attrib_keys(op)[source]¶
- Parameters:
op (tf.Operation|tf.Tensor|tf.TensorArray)
- Return type:
list[str]
- Returns:
list of attribs. op.get_attr(key) should work
- returnn.tf.util.basic.get_op_input_names(op)[source]¶
Also see: https://stackoverflow.com/questions/50723310/get-tensorflow-tf-operation-inputs-by-name
- Parameters:
op (tf.Operation)
- Returns:
list of names with same len as op.inputs
- Return type:
list[str]
- returnn.tf.util.basic.get_op_inputs_by_name(op)[source]¶
- Parameters:
op (tf.Operation)
- Returns:
dict input_name -> input
- Return type:
dict[str,tf.Tensor]
- returnn.tf.util.basic.tensor_array_is_dynamic_size(ta)[source]¶
- Parameters:
ta (tf.TensorArray)
- Return type:
bool
- returnn.tf.util.basic.tensor_array_is_clear_after_read(ta)[source]¶
- Parameters:
ta (tf.TensorArray)
- Return type:
bool
- returnn.tf.util.basic.tensor_array_element_shape(ta)[source]¶
- Parameters:
ta (tf.TensorArray)
- Return type:
tf.TensorShape
- returnn.tf.util.basic.tensor_array_like(ta, **kwargs)[source]¶
- Parameters:
ta (tf.TensorArray)
kwargs – passed to tf.TensorArray constructor
- Returns:
another tensor array, just like ta
- Return type:
tf.TensorArray
- returnn.tf.util.basic.tensor_array_stack(ta, start=0, stop=None, name='TensorArrayStack')[source]¶
Extends tf.TensorArray.stack by start/stop options.
- Parameters:
ta (tf.TensorArray)
start (int|tf.Tensor)
stop (int|tf.Tensor|None)
name (str)
- Return type:
tf.Tensor
- returnn.tf.util.basic.beam_search(scores, beam_size, keep_beams=False, cheating_gold_targets=None, cheating_src_beam_idx=None, cheating_exclusive=True)[source]¶
This is mostly a higher-level wrapper around
tf.nn.top_k()
.- Parameters:
scores (tf.Tensor) – (batch,beam_in,dim). combined scores (i.e. base beam scores + new scores), dense over the dims, such that we have labels in [0,…,dim-1]. These are supposed to be in +log space, although it just matters here that we take the maximum (or top-k).
beam_size (int|tf.Tensor)
keep_beams (bool) – specifies that we keep the beam_in entries, i.e. we just expand, i.e. we just search on the dim. beam_size must be a multiple of beam_in.
cheating_gold_targets (tf.Tensor|None) – (batch,), int32
cheating_src_beam_idx (tf.Tensor|None) – (batch,), int32. If not given, assumes beam_in - 1. See code below.
cheating_exclusive (bool) – make sure that the cheating target does not occur twice, i.e. no duplicates in search tree. This could have happened in our earlier implementation, or if this is disabled.
- Return type:
(tf.Tensor,tf.Tensor,tf.Tensor)
- Returns:
src_beams, labels, beam_scores. src_beams: (batch, beam) -> beam_in idx (int32), labels: (batch, beam) -> dim idx (int32), beam_scores: (batch, beam) -> beam score (float32).
- returnn.tf.util.basic.select_src_beams(x, src_beams, name='select_src_beams')[source]¶
- Parameters:
x (tf.Tensor|tf.TensorArray|T) – (batch * src-beam, …)
src_beams (tf.Tensor) – (batch, beam) -> src-beam-idx
name (str)
- Returns:
(batch * beam, …)
- Return type:
tf.Tensor|T
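A sketch of one expansion step in beam search decoding, using beam_search() together with select_src_beams() to reorder per-hypothesis state (shapes and values are illustrative):

    import tensorflow as tf
    from returnn.tf.util.basic import beam_search, select_src_beams

    batch, beam_in, dim, beam_out = 2, 4, 10, 4
    # combined scores in +log space: base beam scores broadcast over dim plus new label scores
    scores = tf.random.normal([batch, beam_in, dim])
    src_beams, labels, beam_scores = beam_search(scores, beam_size=beam_out)
    # src_beams, labels, beam_scores: all (batch, beam_out)

    # any per-hypothesis state laid out as (batch * beam_in, ...) can follow the selected beams
    state = tf.random.normal([batch * beam_in, 7])
    state = select_src_beams(state, src_beams)  # -> (batch * beam_out, 7)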
- returnn.tf.util.basic.filter_ended_scores(x, end_flags, batch_dim=None, dim=None, score_zero=0.0, score_rem=-1e+30)[source]¶
This can e.g. be used before tf.nn.top_k to let only one beam through for an ended hypothesis. Then, batch would also include the beam size, which does not matter here.
- Parameters:
x (tf.Tensor) – (batch, dim)
end_flags (tf.Tensor) – (batch,)
batch_dim (tf.Tensor|int|None)
dim (tf.Tensor|int|None)
score_zero (float) – x[…, 0] will have this score where end_flag is True
score_rem (float) – x[…, 1:] will have this score where end_flag is True
- Returns:
filtered x, (batch, dim)
- Return type:
tf.Tensor
- returnn.tf.util.basic.to_int32_64(x)[source]¶
- Parameters:
x (tf.Tensor) – dtype uint8, int8, int16, int32, int64
- Return type:
tf.Tensor
- Returns:
dtype int32 or int64
- returnn.tf.util.basic.to_float32(x)[source]¶
- Parameters:
x (tf.Tensor)
- Returns:
x as float32
- Return type:
tf.Tensor
- returnn.tf.util.basic.batch_gather(x, indices, keepdims=False)[source]¶
- Parameters:
x (tf.Tensor) – (batch,dim,…)
indices (tf.Tensor) – (batch,) -> [0..dim-1]
keepdims (bool)
- Returns:
x[batches,indices[batches]], (batch,…). or (batch,1,…) with keep_dims
- Return type:
tf.Tensor
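A short sketch (values are illustrative):

    import tensorflow as tf
    from returnn.tf.util.basic import batch_gather

    x = tf.random.normal([3, 5, 7])       # (batch, dim, ...)
    indices = tf.constant([0, 4, 2])      # (batch,), values in [0..dim-1]
    y = batch_gather(x, indices)                       # (3, 7)
    y_keep = batch_gather(x, indices, keepdims=True)   # (3, 1, 7)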
- returnn.tf.util.basic.unflatten_nd(x, nd_sizes, num_axes=None)[source]¶
E.g. assume that for each x[b], we have an image flattened, i.e. of size width*height. Then nd_sizes[b] == (width, height) would provide the individual sizes. We return y such that y[b][i][j] == x[b][i * nd_sizes[b][0] + j]. This is implemented for any number of axes. Kind of like the reverse of a ND version of flatten_with_seq_len_mask.
- Parameters:
x (tf.Tensor) – (B, T, <Ds>)
nd_sizes (tf.Tensor) – (B, N = num_axes)
num_axes (int)
- Returns:
(B, T_1, …, T_N, <Ds>), T_i == max(nd_sizes[:, i])
- Return type:
tf.Tensor
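A sketch following the description above (two flattened images of different sizes per batch entry; per the docs, each output axis is padded to the maximum over the batch):

    import tensorflow as tf
    from returnn.tf.util.basic import unflatten_nd

    x = tf.random.normal([2, 6, 4])            # (B, T, D) with T = width * height per entry
    nd_sizes = tf.constant([[2, 3], [3, 2]])   # (B, 2): entry 0 is 2x3, entry 1 is 3x2
    y = unflatten_nd(x, nd_sizes, num_axes=2)  # (B, 3, 3, D), padded per entry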
- returnn.tf.util.basic.kernels_registered_for_op(op_name)[source]¶
This just wraps the TF C++ function tensorflow::KernelsRegisteredForOp().
- Parameters:
op_name (str) – e.g. “Gather”
- Returns:
e.g. [“device=’CPU’; …”, “device=’GPU’; …”, …]
- Return type:
list[str]
- returnn.tf.util.basic.supported_devices_for_op(op_name)[source]¶
- Parameters:
op_name (str)
- Returns:
list of devices, e.g. [“CPU”, “GPU”]
- Return type:
list[str]
- returnn.tf.util.basic.find_unsupported_devices_in_graph(graph, dev_name, ignore=None)[source]¶
- Parameters:
graph (tf.Graph)
dev_name (str) – e.g. “GPU”
ignore (list[str]|None) – list of op-names to ignore, e.g. [“ScalarSummary”] etc. If None, will use defaults.
- Return type:
list[tf.Operation]
- returnn.tf.util.basic.get_device_attr(dev)[source]¶
- Parameters:
dev (str) – e.g. "/device:GPU:0", or any argument for
tf.device()
- Returns:
scalar string, e.g. b'device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1'
- Return type:
tf.Tensor
- returnn.tf.util.basic.print_graph_output(fetches, file=None, max_depth=None)[source]¶
- Parameters:
fetches (tf.Operation|tf.Tensor|list[tf.Tensor|tf.Operation])
file (IO[str]|io.TextIOBase|io.StringIO|None) – sys.stdout by default
max_depth (int|None)
- returnn.tf.util.basic.format_graph_output(fetches, max_depth=None)[source]¶
- Parameters:
fetches (tf.Operation|tf.Tensor|list[tf.Tensor|tf.Operation])
max_depth (int|None)
- Return type:
str
- returnn.tf.util.basic.var_handle_or_ref(var)[source]¶
- Parameters:
var (tf.Variable|tensorflow.python.ops.resource_variable_ops.ResourceVariable)
- Return type:
tf.Tensor
- returnn.tf.util.basic.find_ops_with_tensor_input(tensors, fetches=None, graph=None)[source]¶
- Parameters:
tensors (tf.Tensor|tf.Variable|list[tf.Tensor])
fetches (tf.Operation|tf.Tensor|list[tf.Operation|tf.Tensor]|None)
graph (tf.Graph|None)
- Returns:
list of ops
- Return type:
list[tf.Operation]
- returnn.tf.util.basic.find_ops_path_output_to_input(tensors, fetches)[source]¶
Searches backwards like in
extern.graph_editor.get_backward_walk_ops()
and then returns a found traceback, if there is one.- Parameters:
tensors (tf.Tensor|tf.Variable|list[tf.Tensor]) – input
fetches (tf.Operation|tf.Tensor|list[tf.Operation|tf.Tensor]) – output
- Returns:
list of ops, input to output
- Return type:
list[tf.Operation]|None
- returnn.tf.util.basic.get_var_update_ops(var, fetches=None)[source]¶
- Parameters:
var (tf.Variable)
fetches (tf.Operation|tf.Tensor|list[tf.Operation|tf.Tensor]|None) – e.g. the Optimizer.minimize() op
- Returns:
list of ops that update var; currently expected to be of length 1
- Return type:
list[tf.Operation]
- returnn.tf.util.basic.get_variable_value_copy_before_update_ops(var, update_ops)[source]¶
- Parameters:
var (tf.Variable)
update_ops (list[tf.Operation])
- Returns:
var value before any of the update_ops are executed
- Return type:
tf.Tensor
- returnn.tf.util.basic.get_variable_grad_from_update_ops(var, update_ops)[source]¶
- Parameters:
var (tf.Variable)
update_ops (list[tf.Operation]) – via
get_var_update_ops()
- Returns:
grad of loss w.r.t. var, as it is used in the update_ops, e.g. via ApplyAdam or ApplyGradientDescent (not all kind of updates are supported currently). If the gradient is sparse, it will return a tf.IndexedSlices.
- Return type:
tf.Tensor|tf.IndexedSlices
- returnn.tf.util.basic.get_variable_from_tensor(var)[source]¶
- Parameters:
var (tf.Variable|tf.Tensor)
- Returns:
resolve tf.identity or read ops
- Return type:
tf.Variable|tf.Tensor
- returnn.tf.util.basic.add_control_input(op, control_input)[source]¶
- Parameters:
op (tf.Operation)
control_input (tf.Operation|tf.Tensor)
- returnn.tf.util.basic.vocab_idx_to_vocab_string(labels, vocab)[source]¶
Just does a lookup on vocab.
- Parameters:
labels (tf.Tensor) – (batch,max_len), or any, int32, indices in vocab
vocab (tf.Tensor) – (vocab_size,), string
- Returns:
(batch,max_len), or any, like labels, string
- Return type:
tf.Tensor
- returnn.tf.util.basic.vocab_idx_repr(labels, data)[source]¶
- Parameters:
labels (tf.Tensor) – int32, indices in vocab
data (Data) – might have vocab
- Returns:
string or int32, shape as labels, or maybe without last axis
- Return type:
tf.Tensor
- returnn.tf.util.basic.string_merge(strings, seq_lens, separator=' ')[source]¶
Also see TFEngine.Engine.search().
- Parameters:
strings (tf.Tensor) – (batch,max_len)
seq_lens (tf.Tensor) – (batch,)
separator (str|tf.Tensor) – string
- Returns:
(batch,), string
- Return type:
tf.Tensor
- returnn.tf.util.basic.string_replace(strings, old, new, count=-1)[source]¶
Like str.replace.
- Parameters:
strings (tf.Tensor) – (batch,), string
old (tf.Tensor|str)
new (tf.Tensor|str)
count (tf.Tensor|int)
- Returns:
(batch,), string
- Return type:
tf.Tensor
- returnn.tf.util.basic.bpe_merge(strings)[source]¶
- Parameters:
strings (tf.Tensor) – (batch,), string
- Returns:
(batch,), string. strings after BPE merging
- Return type:
tf.Tensor
- returnn.tf.util.basic.words_split(strings)[source]¶
Basically just tf.string_split with delimiter=” “.
- Parameters:
strings (tf.Tensor) – (batch,), string
- Returns:
sparse tensor of shape (batch,max_len), string
- Return type:
tf.SparseTensor
- returnn.tf.util.basic.get_sparse_tensor_length(x)[source]¶
- Parameters:
x (tf.SparseTensor) – of shape prefix + (max_len,), where prefix can be anything, e.g. prefix=(batch,)
- Returns:
shape prefix, int64
- Return type:
tf.Tensor
- returnn.tf.util.basic.string_words_calc_wer(hyps, refs)[source]¶
Uses words_split() on hyps and refs, and then tf.edit_distance with normalize=False.
- Parameters:
hyps (tf.Tensor) – (batch,), dtype string
refs (tf.Tensor) – (batch,), dtype string
- Returns:
(WER (batch,) unnormalized, num ref words (batch,))
- Return type:
(tf.Tensor, tf.Tensor)
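A sketch of computing a corpus-level WER from string tensors (the final aggregation into a single number is an assumption about typical usage, not part of the function itself):

    import tensorflow as tf
    from returnn.tf.util.basic import string_words_calc_wer

    hyps = tf.constant(["the cat sat", "hello world"])
    refs = tf.constant(["the cat sat down", "hello there world"])
    errors, num_ref_words = string_words_calc_wer(hyps=hyps, refs=refs)  # both (batch,)
    # aggregate to one WER value over the whole batch
    wer = tf.reduce_sum(tf.cast(errors, tf.float32)) / tf.cast(tf.reduce_sum(num_ref_words), tf.float32)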
- returnn.tf.util.basic.py_print(pass_through_value, print_args, message=None, summarize=None, first_n=None, name='py_print', file=None)[source]¶
Like tf.Print(), but prints to Python stdout or to file. Also see tf.print(), which however also does not print to Python stdout.
- Parameters:
pass_through_value (tf.Tensor|int|float) – will return tf.identity of this, but with side effect of printing
print_args (list[str|tf.Tensor])
message (str|None) – A string, prefix of the error message.
summarize (int|None) – Only print this many entries of each tensor. If None, then a maximum of 3 elements are printed per input tensor.
first_n (int|None) – Only log first_n number of times. Negative numbers log always; this is the default.
name (str)
file (SupportsWrite[str]|None) – a file-like object (stream); defaults to the current sys.stdout.
- Returns:
tf.identity(pass_through_value) with side effect of printing
- Return type:
tf.Tensor
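A minimal graph-mode sketch; the printing happens as a side effect when the returned tensor is evaluated, and the exact output format may differ slightly:

    import tensorflow as tf
    tf.compat.v1.disable_eager_execution()
    from returnn.tf.util.basic import py_print

    x = tf.constant([1.0, 2.0, 3.0])
    x = py_print(x, ["mean:", tf.reduce_mean(x)], message="debug: ")
    with tf.compat.v1.Session() as session:
        session.run(x)  # prints something like "debug: mean: 2.0" to Python stdout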
- returnn.tf.util.basic.get_positional_encoding(num_channels, length=None, position=None, min_timescale=1.0, max_timescale=10000.0)[source]¶
Gets a bunch of sinusoids of different frequencies.
Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase.
This allows attention to learn to use absolute and relative positions. Timing signals should be added to some precursors of both the query and the memory inputs to attention.
The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x) and cos(x).
In particular, we use a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to channels / 2. For each timescale, we generate the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All of these sinusoids are concatenated in the channels dimension.
The code is adapted from Tensor2Tensor get_timing_signal_1d (https://github.com/tensorflow/tensor2tensor).
- Parameters:
num_channels (int) – scalar, size of timing embeddings to create. The number of different timescales is equal to channels / 2.
length (tf.Tensor|int|None) – scalar, length of timing signal sequence.
position (tf.Tensor|None) – could be provided directly. int32. Can have any shape, e.g. [length] or [B,len]. If not given, will be tf.range(length), i.e. of shape [length].
min_timescale (float) – a float.
max_timescale (float) – a float.
- Returns:
a Tensor of timing signals of shape position.shape + [num_channels], e.g. [length,num_channels]
- Return type:
tf.Tensor
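A short sketch (shapes follow the docs):

    import tensorflow as tf
    from returnn.tf.util.basic import get_positional_encoding

    signal = get_positional_encoding(num_channels=64, length=100)             # (100, 64)
    positions = tf.constant([[0, 1, 2, 3]])                                    # explicit positions, (B=1, 4)
    signal_b = get_positional_encoding(num_channels=64, position=positions)    # (1, 4, 64)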
- returnn.tf.util.basic.get_linear_alignment_out_to_in_indices(input_lens, output_lens, pad_value=0)[source]¶
- Parameters:
input_lens (tf.Tensor|list[int]) – [B]
output_lens (tf.Tensor|list[int]) – [B]
pad_value (int)
- Returns:
[B,outT], mapping to input positions [0..input_len-1]. Examples:
input_len=7, output_len=3, resulting indices [1,3,5].
input_len=3, output_len=3, resulting indices [0,1,2].
input_len=2, output_len=4, resulting indices [0,0,1,1].
- Return type:
tf.Tensor
- returnn.tf.util.basic.get_rnnt_linear_aligned_output(input_lens, targets, target_lens, blank_label_idx, pad_value=0, targets_consume_time=False)[source]¶
RNN-T (https://arxiv.org/abs/1211.3711) has an output length of input_lens + target_lens. Here we create a linear alignment. Examples: (B is blank.)
input_len=4, targets=[a,b,c] (len 3), output=[B,a,B,b,B,c,B] (len 7).
input_len=0, targets=[a,b,c] (len 3), output=[a,b,c] (len 3).
input_len=4, targets=[a] (len 1), output=[B,B,a,B,B] (len 5).
input_len=3, targets=[a,b] (len 2), output=[B,a,B,b,B] (len 5)
- Parameters:
input_lens (tf.Tensor|list[int]) – [B], int32. the input (or encoder) lengths
targets (tf.Tensor|list[list[int]]) – [B,targetT], int32
target_lens (tf.Tensor|list[int]) – [B], int32. the targets length
blank_label_idx (int)
pad_value (int)
targets_consume_time (bool) – In the standard RNN-T, the target labels do not consume a time frame. That is why the RNN-T label output length is input_lens + target_lens. In RNA (https://www.isca-speech.org/archive/Interspeech_2017/abstracts/1705.html), each target label consumes a time frame, thus the label output length is just input_lens.
- Returns:
output [B,outT], output_lens [B]. The output is basically the target filled with blank in between.
- Return type:
(tf.Tensor,tf.Tensor)
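A sketch following the first example above (the choice of blank index 0 is arbitrary here):

    from returnn.tf.util.basic import get_rnnt_linear_aligned_output

    out, out_lens = get_rnnt_linear_aligned_output(
        input_lens=[4], targets=[[1, 2, 3]], target_lens=[3], blank_label_idx=0)
    # out: [[0, 1, 0, 2, 0, 3, 0]] (blank interleaved), out_lens: [7]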
- returnn.tf.util.basic.get_non_deterministic_ops_from_graph()[source]¶
Lists all non-deterministic ops used in the default graph. If a non-deterministic op is used multiple times, each instance will be listed.
Currently this doesn't check whether the user specified a specific computation device. The list of non-deterministic ops is not yet complete.
- Returns:
list of all non deterministic ops names (depending on device and tf version) used in current graph
- Return type:
list[tf.Operation]
- returnn.tf.util.basic.compute_sampled_logits(weights, biases, labels, inputs, num_sampled, num_classes, num_true=1, sampled_values=None, subtract_log_q=True, remove_accidental_hits=False, partition_strategy='mod', name='compute_sampled_logits', seed=None)[source]¶
Helper function for nce_loss and sampled_softmax_loss functions. Computes sampled output training logits and labels suitable for implementing e.g. noise-contrastive estimation (see nce_loss) or sampled softmax (see sampled_softmax_loss). Note: In the case where num_true > 1, we assign to each target class the target probability 1 / num_true so that the target probabilities sum to 1 per-example.
- This is a copy of
- Parameters:
weights (tf.Tensor|list[tf.Tensor]|tuple[tf.Tensor]) – A Tensor of shape [num_classes, dim], or a list of Tensor objects whose concatenation along dimension 0 has shape [num_classes, dim]. The class embeddings.
biases (tf.Tensor) – A Tensor of shape [num_classes]. The class biases.
labels (tf.Tensor) – A Tensor of type int64 and shape [batch_size, num_true]. The target classes. Note that this format differs from the labels argument of tf.nn.softmax_cross_entropy_with_logits.
inputs (tf.Tensor) – A Tensor of shape [batch_size, dim]. The forward activations of the input network.
num_sampled (int) – The number of classes to randomly sample per batch.
num_classes (int) – The number of possible classes.
num_true (int) – The number of target classes per training example.
sampled_values ((tf.Tensor, tf.Tensor, tf.Tensor)|None) – a tuple of (sampled_candidates, true_expected_count, sampled_expected_count) returned by a *_candidate_sampler function. (if None, we default to log_uniform_candidate_sampler)
subtract_log_q (bool) – whether to subtract the log expected count of the labels in the sample to get the logits of the true labels. Default is True. Turn off for Negative Sampling.
remove_accidental_hits (bool) – Whether to remove “accidental hits” where a sampled class equals one of the target classes.
partition_strategy (str) – A string specifying the partitioning strategy, relevant if len(weights) > 1. Currently “div” and “mod” are supported. Default is “mod”. See tf.nn.embedding_lookup for more details.
name (str|None) – A name for the operation.
seed (int|None) – random seed for candidate sampling. Default to None, which doesn’t set the op-level random seed for candidate sampling.
- Returns:
- out_logits: Tensor object with shape [batch_size, num_true + num_sampled], for passing to either nn.sigmoid_cross_entropy_with_logits (NCE) or nn.softmax_cross_entropy_with_logits (sampled softmax).
- out_targets: A Tensor object with the same shape and dtype as out_logits. These are the targets. If num_true > 1 the per-example labels are divided by num_true so they sum to 1.0.
- Return type:
(tf.Tensor, tf.Tensor)
- returnn.tf.util.basic.safe_deep_copy(obj)[source]¶
- Parameters:
obj (T)
- Returns:
deepcopy of obj, without copying TF types, Python modules, functions/lambdas
- Return type:
T
- class returnn.tf.util.basic.FetchHelper(tensor, verbose_stream=None)[source]¶
session.run(tensor) does not work if tensor is inside a loop (tf.while_loop) (or tf.cond). You would get an error like this: Operation '...' has been marked as not fetchable.
This class is a helper to work around that. It will add an op to the graph, which stores the most recent value. To get this executed automatically, you likely want to add it as a control dependency to another op. Use add_to_control_inputs() for that, or better copy_graph_replace_tensors(), or better copy_graph().
- Parameters:
tensor (tf.Tensor)
verbose_stream (IO[str]|None)
- classmethod copy_graph(fetches, target_op, fetch_helper_tensors, stop_at_ts=(), verbose_stream=None)[source]¶
- Parameters:
fetches (tf.Tensor|list[tf.Tensor]|T)
target_op (tf.Operation) – will add the fetch helpers as control dependencies to this op
fetch_helper_tensors (list[tf.Tensor])
verbose_stream (IO[str]|None)
stop_at_ts (Iterable[tf.Tensor]) – iterable of tensors at which the graph walk stops.
- Returns:
copied fetches, fetch helpers, transformed target op
- Return type:
(tf.Tensor|list[tf.Tensor]|T, list[FetchHelper], tf.Operation)
- classmethod copy_graph_replace_tensors(fetches, fetch_helpers)[source]¶
- Parameters:
fetches (tf.Tensor|list[tf.Tensor])
fetch_helpers (list[FetchHelper])
- Returns:
as fetches
- Return type:
tf.Tensor|list[tf.Tensor]
- add_to_control_inputs(other_op)[source]¶
Note: This will not work if you already did a session.run. Use copy_graph_replace_tensors() instead. Or better copy_graph().
- Parameters:
other_op (tf.Operation)
- returnn.tf.util.basic.is_axis_from_description_recurrent(axis, network, data)[source]¶
- Parameters:
axis (str|Dim) – expected not to be transformed via transform_config_dict or so. So single_step_dim, when moved out of the recurrent loop, is still single_step_dim. We detect this here.
network (returnn.tf.network.TFNetwork)
data (Data)
- Return type:
bool