TFUtil

Lots of random utility functions for TensorFlow. Also provides Data.

class TFUtil.CollectionKeys[source]

Extension of tf.GraphKeys

RETURNN_LAYERS = '_RETURNN_layers'[source]
RETURNN_NET_STACK = '_RETURNN_network_stack'[source]
STATE_VARS = '_RETURNN_state_vars'[source]
TFUtil.tf_version_tuple()[source]
Returns:version tuple, e.g. (1, 1, 0), parsed from tf.__version__
Return type:tuple[int]
TFUtil.assert_min_tf_version(version, reason)[source]
Parameters:
  • version (tuple[int]) – e.g. (1,2,0) or (1,2)
  • reason (str) –
TFUtil.have_min_tf_version(version)[source]
Parameters:version (tuple[int]) – e.g. (1,2,0) or (1,2)
Returns:True if we have at least that version, or newer
Return type:bool
class TFUtil.DimensionTag(kind=None, description=None, dimension=None, dyn_size=None, src_data=None, src_axis=None)[source]

This identifies one axis/dimension, like a time-dimension, etc. This can be used by Data. See Data.get_dim_tag(). It is not meant to specify the concrete axis in a specific Data/tensor, but rather the content and dimension. That is, if two Data instances have the same DimensionTag, the dimensions should match:

data1.get_dim_tag(i) == data2.get_dim_tag(j)
=> tf.shape(data1.placeholder)[i] == tf.shape(data2.placeholder)[j]
Parameters:
  • kind (str|None) –
  • description (str|None) – the description should be unique
  • dimension (int|None) –
  • dyn_size (tf.Tensor|None) – e.g. seq_len, (batch,)
  • src_data (Data|None) –
  • src_axis (int|None) –
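
For illustration, a minimal sketch of relating two Data instances via dim tags (hypothetical names and dims; see Data below):

    from TFUtil import Data

    data1 = Data(name="a", shape=(None, 128))  # (batch, time, 128)
    data2 = Data(name="b", shape=(None, 64))   # (batch, time', 64)
    tag = data1.get_dim_tag(1)  # dim tag of the time axis (counted with batch-dim)
    # Declare that both time axes are really the same dimension:
    data2.get_dim_tag(1).declare_same_as(tag)
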
class Types[source]

Defines possible values for kind.

Unspecified = None[source]
Batch = 'batch'[source]
Spatial = 'spatial'[source]
Time = 'spatial'[source]
Feature = 'feature'[source]
set_tag_on_size_tensor(self, x)[source]
Parameters:x (tf.Tensor) –
classmethod get_tag_from_size_tensor(x)[source]
Parameters:x (tf.Tensor) – size tensor. has been set before via set_tag_on_size_tensor()
Return type:DimensionTag|None
can_compare(self)[source]
Returns:whether we can clearly identify this axis. for axes with dynamic size, we require the dyn_size.
Return type:bool
is_equal(self, other, ignore_feature_dim=False, allow_same_feature_dim=False, allow_same_spatial_dim=None, treat_feature_as_spatial=False)[source]

Compares self to other for equality. Note that the default behavior is very restrictive. Use functions such as get_all_dimension_tags() or get_existing_tag_from_collection() to explicitly specify the behavior for the comparison.

Parameters:
  • other (DimensionTag) –
  • ignore_feature_dim (bool) –
  • allow_same_feature_dim (bool) –
  • allow_same_spatial_dim (bool|None) –
  • treat_feature_as_spatial (bool) –
Return type:

bool

get_same_base(self)[source]
Return type:DimensionTag
same_base_id[source]
Return type:int
declare_same_as(self, other)[source]
Parameters:other (DimensionTag) –
classmethod get_existing_tag_from_collection(other, tags, is_equal_opts=None)[source]
Parameters:
  • other (DimensionTag) –
  • tags (list[DimensionTag]) –
  • is_equal_opts (dict[str]|None) – passed to DimensionTag.is_equal
Return type:

DimensionTag|None

classmethod get_all_dimension_tags(data_list, is_equal_opts=None, unique_separate_axes=True)[source]
Parameters:
  • data_list (list[Data]) –
  • is_equal_opts (dict[str]|None) – passed to DimensionTag.is_equal
  • unique_separate_axes (bool) – e.g. data_list=[Data with shape (B,5,5,10)] results in 4 dim tags, not 3.
Returns:

list of dimension tags, dict for data -> list of dimension tags (for each axis)

Return type:

(list[DimensionTag], dict[Data, list[DimensionTag]])

classmethod get_uniq_collection(tags, is_equal_opts=None)[source]
Parameters:
  • tags (list[DimensionTag]) –
  • is_equal_opts (dict[str]|None) – passed to DimensionTag.is_equal
Return type:

list[DimensionTag]

class TFUtil.SearchBeam(beam_size, dependency=<class 'Util.NotSpecified'>, name=None, _next_frame=None)[source]

Represents info about the beam from some beam search (e.g. via beam_search()), e.g. such as the beam size, but also the dependencies. This is somewhat parallel to SearchChoices, but simpler, and independent from the layers/network (LayerBase).

Parameters:
  • beam_size (int) –
  • dependency (SearchBeam|NotSpecified|None) –
  • name (str|None) –
  • _next_frame (SearchBeam|None) –
copy_as_prev_frame(self)[source]
Return type:SearchBeam
classmethod get_combined_beam(beam1, beam2=None, *beams)[source]

Combines beams. This will throw an exception if they cannot be combined. Note that in beam search (see SearchChoices), the logic to combine beams from different search choices happens in a generic way for all layers automatically via TFNetwork._create_layer_layer_desc(), so normally we already have the same beam, unless we are at template construction.

Parameters:
  • beam1 (SearchBeam|None) –
  • beam2 (SearchBeam|None) –
  • beams (SearchBeam|None) –
Return type:

SearchBeam|None

class TFUtil.Data(name, shape=None, dtype=None, placeholder=None, sparse=None, dim=<class 'Util.NotSpecified'>, size_placeholder=None, batch_dim_axis=0, time_dim_axis=<class 'Util.NotSpecified'>, feature_dim_axis=<class 'Util.NotSpecified'>, available_for_inference=True, auto_create_placeholders=False, vocab=None, same_dim_tags_as=None, undefined=False, beam=None)[source]

This class describes a tensor, i.e. its shape and properties, e.g. whether we should consider it sparse data (i.e. it represents indices). This is used in TFNetwork to describe the external data from the dataset, as well as every layer’s output.

Parameters:
  • name (str) –
  • shape (tuple[int|None]|list[int|None]) – including time-dim (can be None). excluding batch-dim. e.g. (time,feat)=(None,128)
  • dtype (str) – e.g. “float32” or “int64”
  • placeholder (tf.Tensor|None) – with added batch-dim
  • sparse (bool) – whether to treat the value as an index. do not confuse with tf.SparseTensor
  • dim (None|int) – feature dimension, shape[-1] if not sparse, otherwise like num_classes
  • batch_dim_axis (int|None) – where we add the batch-dim. e.g. shape=(time,…), 0 -> (batch,time,…), 1 -> (time,batch,…). This is normally always set, and a lot of code expects this. However, you can set it to None if this Data does not have a batch-dim.
  • time_dim_axis (int|None) – where we have the time dim axis, after we added the batch-dim. this is often 1. however, can be None if there is no time-dim.
  • feature_dim_axis (int|None|NotSpecified) – feature dim axis. by default it’s the last one
  • size_placeholder (dict[int,tf.Tensor]|None) – for every None in shape, this will describe the size. The size is always a tensor of shape (batch,), i.e. the size can be different for each sequence in a batch.
  • available_for_inference (bool) – e.g. the extern data “classes” is usually not available for inference
  • vocab (str|dict[str]|GeneratingDataset.Vocabulary|None) –
  • same_dim_tags_as (dict[int|str,DimensionTag]|None) – will mark our dimension tags to be the same
  • undefined (bool) –
  • beam (SearchBeam|None) – the batch-dim could be extended by a beam-size, such that it represents the merged dims [batch, beam_size].
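
As a minimal sketch (assuming TF 1.x graph mode; names and dims are made up), a typical dense input and a sparse target could be described like this:

    from TFUtil import Data

    # Dense batch-major input (batch, time, feature), time is dynamic:
    data = Data(name="data", shape=(None, 128), dtype="float32",
                auto_create_placeholders=True)
    assert data.batch_dim_axis == 0 and data.time_dim_axis == 1
    assert data.batch_shape == (None, None, 128)

    # Sparse targets: indices into a vocabulary of size 10000:
    classes = Data(name="classes", shape=(None,), dtype="int32",
                   sparse=True, dim=10000)
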
size_dtype = 'int32'[source]
classmethod from_tensor(x)[source]
Parameters:x (tf.Tensor) –
Return type:Data
classmethod create_undefined(name=None)[source]
Parameters:name (str) –
Returns:Data with undefined=True. the shape/dtype does not really matter
Return type:Data
sanity_check(self, ignore_placeholder=False)[source]

Performs some sanity checks on self, and raises exceptions if something is not sane.

Parameters:ignore_placeholder (bool) –
get_placeholder_kwargs(self, with_batch=True)[source]
Parameters:with_batch (bool) –
Returns:kwargs for tf.placeholder
Return type:dict[str]
get_axes_with_size(self)[source]
Returns:list of axes which can vary in size for each entry of the batch-dim, e.g. the time-dim-axis. The axis index is counted without the batch-dim.
Return type:list[int]
get_size_placeholder_kwargs(self, axis, with_batch=True)[source]
Parameters:
  • axis (int) –
  • with_batch (bool) –
Returns:

kwargs for tf.placeholder

Return type:

dict[str]

get_kwargs(self, with_size_placeholder=False)[source]
Parameters:with_size_placeholder (bool) –
Returns:relevant attrib items for copying
Return type:dict[str]
get_description(self, with_name=True, with_placeholder=False)[source]
Parameters:
  • with_name (bool) –
  • with_placeholder (bool) –
Returns:

description of self. also used for __repr__

Return type:

str

get_batch_axes_short_description(self)[source]
Return type:list[str]
get_compare_key(self)[source]
Returns:some key which can be used for compare functions, i.e. such that cmp(get_compare_key(self), get_compare_key(other)) == cmp(self, other), i.e. we define some order by that. Note that this order is not totally fixed, and might change.
Return type:object
copy(self, name=None)[source]
Parameters:name (str) – if given, will overwrite this name
Returns:copy of myself, using self.get_kwargs(), and with placeholder and size_placeholder
Return type:Data
copy_as_batch_major(self)[source]
Returns:copy of myself with batch_dim_axis == 0
Return type:Data
copy_as_time_major(self)[source]
Returns:copy of myself with time_dim_axis == 0
Return type:Data
copy_with_batch_dim_axis(self, batch_dim_axis)[source]
Parameters:batch_dim_axis (int) –
Returns:copy of myself with specific batch_dim_axis
Return type:Data
copy_with_time_dim_axis(self, time_dim_axis)[source]
Parameters:time_dim_axis (int) –
Returns:copy of myself with specific time_dim_axis
Return type:Data
copy_move_axis(self, old_axis, new_axis)[source]
Parameters:
  • old_axis (int) – counted with batch-dim
  • new_axis (int) – counted with batch-dim
Returns:

copy of myself with moved axis (see move_axis())

Return type:

Data

copy_as_bt_or_tb_major(self)[source]
Return type:Data
Returns:copy of myself in batch-time-major or time-batch-major
copy_with_feature_dim_axis(self, feature_dim_axis)[source]
Parameters:feature_dim_axis (int) – can also be negative
Returns:copy of myself with specific feature dim axis
Return type:Data
copy_as_batch_feature_major(self)[source]
Returns:copy of self with batch_dim_axis == 0 and feature_dim_axis == 1
Return type:Data
copy_as_batch_spatial_major(self)[source]
Returns:copy with batch_dim_axis == 0, then all dynamic axes, then any other spatial axes, and the feature axis last
Return type:Data
copy_with_feature_last(self)[source]
Returns:copy of self with feature_dim_axis being the very last axis
Return type:Data
copy_add_batch_dim(self, batch_dim_axis)[source]
Parameters:batch_dim_axis (int) –
Returns:copy of myself with added batch-dim
Return type:Data
copy_add_spatial_dim(self, spatial_dim_axis=None, dim=1, auto_time_dim_axis=True)[source]
Parameters:
  • spatial_dim_axis (int|None) – counted with batch-dim. if there is no time-dim, this will be it.
  • dim (int|None) –
  • auto_time_dim_axis (bool) –
Returns:

copy of myself with added spatial-dim

Return type:

Data

copy_add_feature_dim(self, axis=None)[source]
Parameters:axis (int|None) –
Returns:self with a new feature dim axis with dim 1. If there is an existing feature dim, the new feature dim will be added right after. If we are sparse, we don’t add a feature dim, but it becomes a spatial dim instead.
Return type:Data
get_default_new_axis_for_dim_tag(self, dim_tag)[source]
Parameters:dim_tag (DimensionTag) –
Return type:int
copy_add_dim_by_tag(self, dim_tag, unbroadcast=False, axis=None)[source]
Parameters:
  • dim_tag (DimensionTag) –
  • unbroadcast (bool) –
  • axis (int|None) –
Return type:

Data

copy_split_feature_dim(self, new_feature_dim)[source]
Parameters:new_feature_dim (int) – will be the new dim
Return type:Data
copy_compatible_to(self, data, unbroadcast=False, data_dyn_shape=None, check_sparse=True, check_dtype=True)[source]
Parameters:
  • data (Data) – other data which the returned tensor should be compatible to. It will add any missing axes with a dim-1 axis for automatic broadcasting. It currently does not check whether existing dims match.
  • unbroadcast (bool) – if True, all broadcast axes (axes with dim 1) will be tiled such that they match
  • data_dyn_shape (tf.Tensor|list[tf.Tensor|int]|tuple[tf.Tensor|int]|None) – For unbroadcast, if we do not want to rely on tf.shape(data.placeholder).
  • check_sparse (bool) –
  • check_dtype (bool) –
Returns:

Data, might add broadcast dimensions

Return type:

Data
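
For example (hypothetical shapes), making a frame-independent tensor broadcastable against a (batch,time,feature) tensor:

    a = Data(name="a", shape=(None, 128))                 # (batch, time, 128)
    b = Data(name="b", shape=(128,), time_dim_axis=None)  # (batch, 128)
    b_compat = b.copy_compatible_to(a)  # adds a time axis of dim 1
    # b_compat.batch_shape == (None, 1, 128), broadcastable against a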

copy_time_flattened(self)[source]
Returns:copy of myself where the time-axis is flattened away into the batch-dim-axis. See get_placeholder_time_flattened() and flatten_with_seq_len_mask() for more details.
Return type:Data
copy_extend_with_beam(self, beam)[source]
Parameters:beam (SearchBeam|None) –
Returns:copy of myself where the batch-dim is extended/multiplied by beam_size, using tile_transposed
Return type:Data
copy_squeeze_axes(self, axes)[source]
Parameters:axes (list[int]) – counted with batch dim
Returns:copy of myself, with squeezed axes
Return type:Data
copy_template(self, name=None, dtype=None)[source]
Parameters:
  • name (str|None) –
  • dtype (str|None) –
Returns:

copy of myself, using self.get_kwargs(), without placeholder

Return type:

Data

copy_template_excluding_axis(self, exclude_axis, name=None)[source]
Parameters:
  • exclude_axis (int) – axis to be removed.
  • name (str|None) – if set, this will be the new name.
Returns:

copy of myself excluding exclude_axis axis, without placeholder.

Return type:

Data

copy_template_excluding_spatial_dim(self, spatial_axis_num, name=None)[source]
Parameters:
  • spatial_axis_num (int) – index in self.get_spatial_batch_axes()
  • name (str|None) – if set, this will be the new name
Returns:

copy of myself excluding the time-dimension without placeholder

Return type:

Data

copy_template_excluding_time_dim(self, name=None)[source]
Parameters:name (str|None) – if set, this will be the new name
Returns:copy of myself excluding the time-dimension without placeholder
Return type:Data
copy_template_adding_time_dim(self, name=None, time_dim_axis=0)[source]

Adds a time-dim-axis. If a time-dim-axis already exists, this will nevertheless create a new one.

Parameters:
  • name (str|None) – if set, this will be the new name
  • time_dim_axis (int) – the new time-dim-axis index
Returns:

copy of myself adding the time-dimension without placeholder

Return type:

Data

copy_template_replace_dim(self, axis, new_dim, new_size=None)[source]
Parameters:
  • axis (int) –
  • new_dim (int|None) –
  • new_size (tf.Tensor|None) –
Return type:

Data

matches_var_dim_pattern(self, other)[source]
Parameters:other (Data) –
Returns:whether the variable-dims pattern matches, i.e. same variable dims (get_variable_dim_pattern), same time dim, excluding batch-dim. i.e. the size_placeholder should be compatible.
Return type:bool
batch_shape[source]
Returns:shape with added batch-dim. e.g. (batch,time,feat) = (None,None,128)
Return type:tuple[int|None]
get_batch_shape(self, batch_dim)[source]
Parameters:batch_dim (int|tf.Tensor|None) –
Returns:shape with added batch-dim. e.g. (batch,time,feat) = (None,None,128)
Return type:tuple[int|None]
get_dynamic_batch_shape(self)[source]
Return type:list[int|tf.Tensor]
shape_dense[source]
Returns:shape with feature dim axis
Return type:tuple[int|None]
shape_sparse[source]
Returns:shape without feature dim axis
Return type:tuple[int|None]
batch_shape_dense[source]
Return type:tuple[int|None]
ndim[source]
Return type:int
Returns:ndim counted without batch-dim
ndim_dense[source]
Return type:int
Returns:ndim counted without batch-dim, plus 1 if we are sparse
batch_ndim[source]
Return type:int
Returns:ndim counted with batch-dim
batch_ndim_dense[source]
Return type:int
Returns:ndim counted with batch-dim, plus 1 if we are sparse
is_time_major[source]
Returns:whether this is in time-major format, i.e. (time,batch,…)
Return type:bool
is_batch_major[source]
Returns:whether this is in batch-major format, i.e. (batch,…)
Return type:bool
is_batch_feature_major[source]
Returns:whether this is in batch-feature-major format, i.e. (batch,feature,…) (NC…)
Return type:bool
feature_dim_axis[source]
Returns:feature dim axis, counted with batch-dim
Return type:int|None
feature_dim_axis_or_unspecified[source]
Returns:feature dim axis, counted with batch-dim. could also be unspecified
Return type:int|None|NotSpecified
time_dim_axis_excluding_batch[source]
Return type:int|None
time_dimension(self)[source]
Returns:shape(placeholder)[time_dim_axis], int scalar
Return type:tf.Tensor
get_dim(self, axis)[source]
Parameters:axis (int) – counted with batch-dim
Returns:shape[axis]
Return type:tf.Tensor|int
get_placeholder_as_time_major(self)[source]
Return type:tf.Tensor
get_placeholder_as_batch_major(self)[source]
Return type:tf.Tensor
get_placeholder_with_specific_batch_dim_axis(self, batch_dim_axis)[source]
Parameters:batch_dim_axis (int) –
Return type:tf.Tensor
get_placeholder_time_flattened(self)[source]
Returns:via flatten_with_seq_len_mask()
Return type:tf.Tensor
get_placeholder_flattened(self, keep_dims=False)[source]
Parameters:keep_dims (bool) – if set, it will add broadcast dimensions after the flattening behind the first axis
Return type:tf.Tensor
Returns:placeholder where all dynamic axes are flattened into a single axis. e.g. for the usual case (batch, time, dim), it becomes (batch’|time’, dim), or (batch, time, height, dim) will also become (batch’|time’, dim). with keep_dims, (batch, time, height, dim) will become (batch’|time’, 1, 1, dim).
get_axes(self, exclude_time=False, exclude_batch=False, exclude_feature=False)[source]
Parameters:
  • exclude_time (bool) – will filter out the time-axis
  • exclude_batch (bool) – will filter out the batch-axis
  • exclude_feature (bool) – will filter out the feature-axis
Returns:

list of axes, like range(len(self.shape)), calculated with batch dim.

Return type:

list[int]

get_axes_from_description(self, axes, allow_int=True)[source]
Parameters:
  • axes (int|list[int]|str|list[str]|None) – one axis or multiple axis, or none. This is counted with batch-dim, which by default is axis 0 (see enforce_batch_dim_axis). It also accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”, and more (see the code).
  • allow_int (bool) – whether to allow an int directly. in almost all cases, it is better to use a symbolic name to specify an axis, as different layers could reorder them, and maybe also change their behavior in the future.
Returns:

list of axes, counted with batch-dim

Return type:

list[int]
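
A small sketch of the symbolic axis descriptions (assuming the default batch-major layout; the full token set is in the code):

    data = Data(name="x", shape=(None, 128))  # (batch, time, feature)
    assert data.get_axes_from_description("B") == [0]
    assert data.get_axes_from_description("T") == [1]
    assert data.get_axes_from_description("F") == [2]
    assert data.get_axes_from_description("spatial") == [1]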

get_axis_from_description(self, axis, allow_int=True)[source]
Parameters:
  • axis (int|str) –
  • allow_int (bool) –
Returns:

axis, counted with batch-dim

Return type:

int

get_axis_by_tag_name(self, name, spatial_only=False)[source]
Parameters:
  • name (str) – the tag name, or part of it (must be unique, and must exist)
  • spatial_only (bool) –
Return type:

int

get_batch_axis_excluding_batch(self, axis)[source]
Parameters:axis (int) – counted with batch-dim
Returns:axis counted without batch-dim
Return type:int|None
get_batch_axis(self, axis)[source]
Parameters:axis (int) – counted without batch-dim
Returns:axis counted with batch-dim
Return type:int
have_batch_axis(self)[source]
Return type:bool
have_time_axis(self)[source]
Return type:bool
have_feature_axis(self)[source]
Return type:bool
is_time_axis_dynamic(self)[source]
Returns:whether there are different seq-lens for the time, or all the same (static)
Return type:bool
is_axis_dynamic(self, axis)[source]
Parameters:axis (int) – counted with batch-dim axis
Returns:dynamic, i.e. we have it in size_placeholder. Note that this does not perfectly match with get_dynamic_axes(), but more with is_time_axis_dynamic(), although probably in most (all?) cases it should match. If True, you can get the size via get_dynamic_size().
Return type:bool
get_dynamic_size(self, axis)[source]
Parameters:axis (int) – counted with batch-dim axis. is_axis_dynamic() should be True
Returns:shape (B,)
Return type:tf.Tensor
get_dynamic_axes(self)[source]
Returns:list of axes, counted with batch-dim axis (but we exclude the batch dim axis itself)
Return type:list[int]
get_static_axes(self)[source]
Returns:list of axes, counted with batch-dim axis (but we exclude the batch dim axis itself)
Return type:list[int]
mark_same_time(self, other)[source]

If the dimension tag of other’s time axis matches any of our axes, we set our time axis to the matching one.

Parameters:other (Data) –
Returns:whether we have found the same
Return type:bool
is_same_time_dim(self, other)[source]

Checks whether we have a matching/compatible time dim.

Parameters:other (Data) –
Return type:bool
get_sequence_lengths(self)[source]
Returns:seq lens tensor of shape (batch,) of dtype int32. also see get_dynamic_size()
Return type:tf.Tensor
get_sequence_mask(self)[source]
Returns:seq mask of shape (batch,time) if we are batch-major, else (time,batch) if we are time-major
Return type:tf.Tensor
get_sequence_mask_broadcast(self, axis=None)[source]
Parameters:axis (int|None) –
Returns:seq mask of shape ((batch,time) or (time,batch)) + (1,)s for remaining dims if BT or TB major, and axis is T or None. In general compatible to placeholder, i.e. same ndim, with broadcast dims. We assert here that the axis is dynamic (is_axis_dynamic()), i.e. we have the size.
Return type:tf.Tensor
get_batch_dim(self)[source]
Return type:tf.Tensor
get_spatial_batch_axes(self)[source]
Return type:list[int]
Returns:list of axes which are neither batch nor feature axes, or which are the time axis or dynamic; counted with batch-dim.
get_spatial_axes(self)[source]
Return type:list[int]
Returns:list of axes which are neither feature nor batch axes, counted without batch-dim.
get_feature_batch_axes(self)[source]
Return type:list[int]
Returns:list of axes which are feature axes, counted with batch-dim. currently there is only one or zero such axis.
get_feature_axes(self)[source]
Return type:list[int]
Returns:list of axes which are feature axes, counted without batch-dim.
SpecialAxesNames = ('batch_dim_axis', 'time_dim_axis', 'feature_dim_axis')[source]
get_special_axes_dict(self, counted_with_batch_dim=True, include_batch_dim_axis=False, only_available=False)[source]
Parameters:
  • counted_with_batch_dim (bool) –
  • include_batch_dim_axis (bool) –
  • only_available (bool) –
Returns:

dict axis-name -> axis

Return type:

dict[str,int]

get_bc_spatial_batch_shape(self)[source]
Returns:shape which will broadcast along all spatial dimensions and time/batch dim
Return type:tuple[int|None]
get_bc_shape(self, opts=None)[source]
Parameters:opts (dict[str|list|tuple,int|str|None]|None) – key specifies the axes. value 1 (‘x’) is broadcasting, -1 (None) is not broadcasting. Axes should not be defined multiple times. The default behavior if an axis is not specified is like get_bc_spatial_batch_shape(), i.e. it will broadcast in batch and spatial dims only.
Returns:shape where 1 means broadcasting, None or >1 means not broadcasting. can be used for TFUtil.dropout()
Return type:tuple[int|None]
get_scope_name(self)[source]
Returns:via self.placeholder or any self.size_placeholder, or None
Return type:str|None
get_full_name(self)[source]
Returns:if we have a defined scope (via self.get_scope_name()), then scope_name + “/” + self.name, otherwise just self.name
Return type:str
get_dim_tag(self, axis)[source]
Parameters:axis (int) – counted with batch-dim
Return type:DimensionTag
get_time_dim_tag(self)[source]
Return type:DimensionTag
get_size_dim_tag(self, number)[source]
Parameters:number (int) – index in sorted(size_placeholder.keys())
Return type:DimensionTag
get_batch_shape_dim_tags(self)[source]
Returns:list of dimension tags, for each axis (counted with batch dim, i.e. len is batch_ndim)
Return type:tuple[DimensionTag]
classmethod get_common_data(sources, warnings_out=None, out_shape=None)[source]
Parameters:
  • sources (list[Data]) –
  • warnings_out (io.TextIOBase|io.StringIO|None) –
  • out_shape (list[int|tf.Tensor]|None) – will insert the shape dynamically
Returns:

some generic data where the sources should be compatible to (with copy_compatible_to), i.e. it contains the union of all axes from all sources (least common multiple).

Return type:

Data|None

TFUtil.init_horovod()[source]

Initializes Horovod. Provide this here such that we can remember whether we already initialized before.

class TFUtil.CustomUpdate[source]

Custom updates will be handled by TFUpdater.

set_on_var(self, var)[source]
Parameters:var (tf.Variable) – variable to update. this will be recognized by TFUpdater.Updater
update_var(self, var)[source]
Parameters:var (tf.Variable) – variable to update
Returns:operation which updates the variable, e.g. tf.assign_add(var, something)
Return type:tf.Operation
class TFUtil.CustomUpdateExpAverage(average, alpha)[source]

exponential moving average

Parameters:
  • average (tf.Tensor) –
  • alpha (float) –
update_var(self, var)[source]
Parameters:var (tf.Variable) –
Return type:tf.Tensor
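
A minimal usage sketch (assumes TFUpdater.Updater picks up the custom update; the statistic here is made up):

    import tensorflow as tf
    from TFUtil import CustomUpdateExpAverage

    stats = tf.reduce_mean(tf.random_normal((16, 128)))  # some running statistic
    ema_var = tf.get_variable("ema", shape=(), trainable=False,
                              initializer=tf.zeros_initializer())
    # TFUpdater will then move ema_var towards stats with factor alpha:
    CustomUpdateExpAverage(average=stats, alpha=0.01).set_on_var(ema_var)
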
TFUtil.set_param_axes_split_info(param, axes_split_info)[source]
Parameters:
  • param (tf.Variable|tf.Tensor) –
  • axes_split_info (list[list[int]|None]) – e.g. [[n],[n]*4] for LSTM matrices
TFUtil.check_param_axes_split_info(param_shape, axes_split_info)[source]
Parameters:
  • param_shape (list[int|None]|tuple[int|None]) –
  • axes_split_info (list[list[int]|None]) – e.g. [[n],[n]*4] for LSTM matrices
TFUtil.get_param_axes_split_info(param)[source]

See set_param_axes_split_info().

Parameters:param (tf.Variable|tf.Tensor) –
Return type:list[list[int]|None]|None
TFUtil.transform_param_axes_split_info_to_new_shape(axes_split_info, new_shape)[source]

new_shape can be bigger or smaller than the old shape. In some simple cases, it is obvious how this should be done, e.g. [[a],[b]*4], [a*2,b*8] -> [[a*2],[b*2]*4]. In others, it is not, e.g. [[a+b],[b]*4], [a+b*2,b*8] -> [[a+b*2],[b*2]*4]. See also the test cases, test_transform_param_axes_split_info_to_new_shape(). No TF involved here, but it fits better with the functions above.

Parameters:
  • axes_split_info (list[list[int]]) –
  • new_shape (list[int]|tuple[int]) –
Returns:

new axes-split-info for the new shape

Return type:

list[list[int]]

TFUtil.copy_with_new_split_axes(old_axis_splits, new_axis_splits, old_values, new_values=None)[source]

Operates on Numpy arrays only; however, it fits better with the functions above.

Parameters:
  • old_axis_splits (list[list[int]]) –
  • new_axis_splits (list[list[int]]) –
  • old_values (numpy.ndarray) –
  • new_values (numpy.ndarray) –
Returns:

new values

Return type:

numpy.ndarray

class TFUtil.OutputWithActivation(x, act_func=None)[source]

Stores some tensor before and after some activation function, and also the activation function itself. (Maybe obsolete when you directly access the TF computation graph; but simpler.)

Parameters:
  • x (tf.Tensor) –
  • act_func (None|(tf.Tensor)->tf.Tensor) –
is_softmax_act_func(self)[source]
Return type:bool
get_logits(self)[source]
Return type:tf.Tensor
Returns:logits. logits are (not necessarily normalized) log probabilities, i.e. the input of softmax.

This call assumes that self.y is in probability space.

get_log_output(self)[source]
Return type:tf.Tensor
Returns:tf.log(output)
TFUtil.variable_scalar_summaries_dict(x, name=None)[source]

Collects all interesting information about x, such as min/max/mean, etc. (all scalars). This is used by variable_summaries().

Parameters:
  • x (tf.Tensor|tf.Variable) –
  • name (str) –
Returns:

dict with key -> scalar info, e.g. with “%s_mean” % name -> tf.reduce_mean(x)

Return type:

dict[str,tf.Tensor]

TFUtil.variable_summaries(var, name=None, with_histogram=False)[source]

Attach a lot of summaries to a Tensor (for TensorBoard visualization). Also see variable_scalar_summaries_dict().

Parameters:
  • var (tf.Tensor|tf.Variable) –
  • name (str) –
  • with_histogram (bool) – adds histogram. note that this can add noticeable overhead
Returns:

nothing, use tf.summary.merge_all() to collect the summaries
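
Typical usage might look like this (a sketch; assumes a TF 1.x summary setup):

    import tensorflow as tf
    from TFUtil import variable_summaries

    w = tf.get_variable("w", shape=(128, 128))
    variable_summaries(w, name="w", with_histogram=True)
    merged = tf.summary.merge_all()  # fetch this and pass to a summary writer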

TFUtil.get_valid_scope_name_from_str(s)[source]
Parameters:s (str) – some name
Returns:valid scope name, might be just s. see tf._VALID_SCOPE_NAME_REGEX and tf._VALID_OP_NAME_REGEX
Return type:str
TFUtil.get_current_var_scope_name()[source]
Returns:current absolute variable scope name, via tf.variable_scope
Return type:str
TFUtil.get_current_name_scope()[source]
Returns:current absolute name scope, via tf.name_scope
Return type:str

http://stackoverflow.com/questions/40907769/how-to-get-current-tensorflow-name-scope

Note that this is a private member and might break at some point. Note also that this does not need to be the same as get_current_var_scope_name().

TFUtil.reuse_name_scope(name, absolute=None, **kwargs)[source]

Context manager to reuse an already created scope. We try to both set the variable scope and the name scope.

Parameters:
  • name (str|tf.VariableScope) – relative or absolute name scope (absolute if absolute=True or if tf.VariableScope). must not end with “/”.
  • absolute (bool) – if True it will be absolute
  • kwargs – passed on to tf.variable_scope
Returns:

yields the variable_scope
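
A small usage sketch (TF 1.x; the scope name is arbitrary):

    import tensorflow as tf
    from TFUtil import reuse_name_scope

    with reuse_name_scope("encoder"):
        w = tf.get_variable("w", shape=(10, 10))  # -> name "encoder/w:0"
    # Later, re-enter exactly the same absolute scope to add ops next to it:
    with reuse_name_scope("encoder", absolute=True):
        y = tf.identity(w, name="w_copy")  # -> name "encoder/w_copy:0"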

TFUtil.opt_reuse_name_scope(name)[source]
Parameters:name (str|tf.VariableScope) –
Returns:yields the variable_scope
TFUtil.get_name_scope_of_tensor(x)[source]
Parameters:x (tf.Tensor) – has name e.g. “layer0/rec/W:0”
Returns:the name scope of x, e.g. “layer0/rec”
Return type:str
TFUtil.get_base_name(x)[source]
Parameters:x (tf.Tensor|tf.Variable) – has name e.g. “layer0/rec/W:0”
Returns:return the base name, e.g. “W”, without the output index
TFUtil.reuse_name_scope_of_tensor(x, prefix='', postfix='', add_tensor_name=False)[source]
Parameters:
  • x (tf.Tensor|tf.Variable) – has name e.g. “layer0/rec/W:0”
  • prefix (str) –
  • postfix (str) –
  • add_tensor_name (bool) –
Returns:

reuse the name scope of x, e.g. “layer0/rec”, yields scope

TFUtil.default_control_flow_ctx()[source]

This was earlier called var_creation_scope.

If you create a variable inside of a while-loop, you might get the following error:

InvalidArgumentError: The node ‘while/w/Assign’ has inputs from different frames. The input ‘while/j’ is in frame ‘while/while/’. The input ‘while/w’ is in frame ‘’.

This happens when you directly call tf.Variable, because the initial_value might be a tensor which depends on the current control flow context. See tests/test_TFUtil.py:test_loop_var_creation() for an example.

Related TF bugs:

One solution is to reset the current control flow context. See also same_control_flow_ctx().

However, with respect to variables, you should instead use tf.get_variable, which does not have this problem.

class TFUtil.FlipGradientBuilder[source]

Gradient Reversal Layer. Discussion:

Code from here:
https://github.com/pumpikano/tf-dann/blob/master/flip_gradient.py

Also see CustomGradient which is more generic.

TFUtil.lookup_grad_func_by_name(op_type)[source]
Parameters:op_type (str) –
Returns:function grad_func(op, grad), or raises LookupError
TFUtil.opt_register_grad_func(op_type, grad_func, assert_is_same=True)[source]
Parameters:
  • op_type (str) –
  • grad_func – function grad_func(op, grad)
  • assert_is_same (bool) –
TFUtil.identity_with_check_numerics(x, with_grad=True, name='identity_with_check_numerics')[source]

Returns identity(x), but with additional check_numerics control dependency, and optionally the same for its gradient. See also TFUpdater.add_check_numerics_ops(), which will add checks for the whole graph.

Parameters:
  • x (tf.Tensor) –
  • with_grad (bool) – whether the check will also be added for the gradient
  • name (str) –
Return type:

tf.Tensor

TFUtil.check_input_ndim(x, ndim)[source]
Parameters:
  • x (tf.Tensor) –
  • ndim (int) –
Returns:

x with check added

Return type:

tf.Tensor

TFUtil.check_input_ndim_equal_offset(x, y, y_ndim_offset=0)[source]
Parameters:
  • x (tf.Tensor) –
  • y (tf.Tensor) –
  • y_ndim_offset (int) –
Returns:

x with check added such that ndim(x) == ndim(y) + y_ndim_offset

Return type:

tf.Tensor

TFUtil.check_input_dim(x, axis, dim)[source]
Parameters:
  • x (tf.Tensor) –
  • axis (int) – which axis to check
  • dim (int|tf.Tensor) –
Returns:

x with check added

Return type:

tf.Tensor

TFUtil.check_dim_equal(x, x_axis, y, y_axis, extra_msg=())[source]
Parameters:
  • x (tf.Tensor) –
  • x_axis (int) – which axis to check
  • y (tf.Tensor) –
  • y_axis (int) – which axis to check
  • extra_msg (list[str]|tuple[str]) – will be printed additionally if it fails
Returns:

x with check added that shape(x)[x_axis] == shape(y)[y_axis]

Return type:

tf.Tensor

TFUtil.check_shape_equal(x, y)[source]
Parameters:
  • x (tf.Tensor) –
  • y (tf.Tensor) –
Returns:

x with check added that shape(x) == shape(y)

Return type:

tf.Tensor

TFUtil.get_shape_dim(x, axis, name='shape_dim')[source]
Parameters:
  • x (tf.Tensor) –
  • axis (int) – which axis
  • name (str) –
Returns:

x.shape[axis] either as a static int or otherwise as an expression

Return type:

int|tf.Tensor

TFUtil.get_shape(x)[source]
Parameters:x (tf.Tensor|tf.Variable) –
Returns:list of scalars, which are either int if known statically, or otherwise expressions
Return type:list[int|tf.Tensor]
TFUtil.get_ndim(x)[source]
Parameters:x (tf.Tensor) –
Returns:x.ndim either as a static int or otherwise as an expression
Return type:int|tf.Tensor
TFUtil.get_range(start, stop=<class 'Util.NotSpecified'>)[source]
Parameters:
  • start (int|tf.Tensor|None) –
  • stop (int|tf.Tensor|None) –
Returns:

either tuple(range(start, stop)) or the same as a symbolic expression

Return type:

tuple[int]|tf.Tensor

TFUtil.identity_with_ops(x, ops)[source]
Parameters:
  • x (tf.Tensor) –
  • ops (() -> list[tf.Operation|tf.Tensor]) –
Returns:

x with all ops executed

Return type:

tf.Tensor

TFUtil.setup_tf_thread_pools(num_threads=None, log_file=None, tf_session_opts=None)[source]

See here for documentation of intra_op_parallelism_threads and inter_op_parallelism_threads: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto

intra_op_parallelism_threads is used for the LocalDevice::EigenThreadPoolInfo, which is always global. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/local_device.cc

inter_op_parallelism_threads is used for the (global if not use_per_session_threads) session thread pool. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/direct_session.cc

TF will set up the thread pools on first usage. That can happen quite early, esp. for intra_op_parallelism_threads. E.g. list_local_devices() will trigger this, i.e. any call to is_gpu_available() or print_available_devices(). For debugging, you can set the env-var TF_CPP_MIN_VLOG_LEVEL=1 and then check for these messages:

Local device intra op parallelism threads: 4
Direct session inter op parallelism threads: 4

Thus, call this function as early as possible with your preferred number of threads, used for both thread pools. It will create a dummy session and directly close it again, but if you use the global thread pools, those settings will remain for further sessions. This function will only execute on the first call.

Parameters:
  • num_threads (int) – used for both intra and inter parallelism thread pools
  • log_file (stream|None) –
  • tf_session_opts (dict[str]) –
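
For example, at the very top of your entry script, before anything triggers device listing:

    from TFUtil import setup_tf_thread_pools

    setup_tf_thread_pools(num_threads=4)  # used for both intra- and inter-op pools
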
TFUtil.check_initial_tf_thread_pool_init(tf_session_opts=None)[source]

Makes sure that the TF thread pools are initialized with the requested settings. You probably want to call this very early.

Parameters:tf_session_opts (dict[str]|None) –
TFUtil.get_tf_list_local_devices(tf_session_opts=None)[source]

This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first. Note that this will list all available devices. Any TF session might only use a subset of these. You can get the list available in a given TF session by tf.Session.list_devices().

Parameters:tf_session_opts (dict[str]|None) – if given, will init a temp tf.Session with these opts
Return type:list[tensorflow.core.framework.device_attributes_pb2.DeviceAttributes|_DeviceAttributes]
TFUtil.print_available_devices(tf_session_opts=None, file=None)[source]

Prints the available TF devices on file (stdout by default). This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first.

Parameters:
  • tf_session_opts (dict[str]|None) – if given, will init a temp tf.Session with these opts
  • file (io.FileIO) –
TFUtil.is_gpu_available()[source]

Returns whether TensorFlow can access a GPU. This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first.

Return type:bool
TFUtil.get_available_gpu_devices()[source]

Returns a list of available GPU devices. This uses tensorflow.device_lib.list_local_devices(). Note that a call to this will trigger the internal TF thread pool inits, so you should call setup_tf_thread_pools() first.

Return type:list[tensorflow.core.framework.device_attributes_pb2.DeviceAttributes|_DeviceAttributes]
TFUtil.get_available_gpu_min_compute_capability()[source]

Uses get_available_gpu_devices().

Returns:e.g. 3.0, or 5.0, etc, or None
Return type:float|None
TFUtil.dot(a, b, transpose_b=False)[source]
Parameters:
  • a (tf.Tensor) – shape […da…,d]
  • b (tf.Tensor) – shape [d,…db…] (or […db…,d] if transpose_b)
  • transpose_b (bool) –
Returns:

tensor of shape […da…,…db…]

Return type:

tf.Tensor
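
For example (shapes made up), contracting the last axis of a against the first axis of b:

    import tensorflow as tf
    from TFUtil import dot

    a = tf.zeros((2, 3, 4))  # [...da..., d] with d = 4
    b = tf.zeros((4, 5))     # [d, ...db...]
    c = dot(a, b)            # shape (2, 3, 5)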

TFUtil.identity(x)[source]
Parameters:x (tf.Tensor) –
Return type:tf.Tensor
TFUtil.get_activation_function(s)[source]
Parameters:s (str|None) –
Return type:(tf.Tensor) -> tf.Tensor
TFUtil.gelu(x)[source]

Gaussian Error Linear Units (GELUs) (https://arxiv.org/abs/1606.08415). Alternative to relu.

Parameters:x (tf.Tensor) –
Return type:tf.Tensor
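
For reference, the commonly used tanh approximation from the paper looks like this (a sketch; TFUtil.gelu computes this form, up to implementation details):

    import numpy
    import tensorflow as tf

    def gelu_approx(x):
        # tanh approximation of GELU (Hendrycks & Gimpel, 2016)
        return 0.5 * x * (1.0 + tf.tanh(
            numpy.sqrt(2.0 / numpy.pi) * (x + 0.044715 * tf.pow(x, 3.0))))
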
TFUtil.gelu2(x)[source]

Another approximation of the GELU (https://github.com/hendrycks/GELUs). Faster but less accurate than gelu (https://github.com/hendrycks/GELUs).

Parameters:x (tf.Tensor) –
Return type:tf.Tensor
TFUtil.random_uniform_abs_initializer(limit, **kwargs)[source]
Parameters:
  • limit (float|int|tf.Tensor) –
  • kwargs – passed to tf.random_uniform_initializer
Return type:

tensorflow.python.ops.init_ops.Initializer

TFUtil.xavier_initializer(uniform=True, seed=None, dtype=tf.float32)[source]

Alias for tf.glorot_uniform_initializer or tf.glorot_normal_initializer.

Parameters:
  • uniform (bool) – uniform or normal distribution
  • seed (int) –
  • dtype (tf.DType) –
Returns:

((tuple[int]) -> tf.Tensor) | tensorflow.python.ops.init_ops.Initializer

TFUtil.wrap_distribution_non_zero(x, zero_limit, limit)[source]
Parameters:
  • x (tf.Tensor) – values in [-limit,limit]
  • zero_limit (float) –
  • limit (float) –
Returns:

same shape as x. rescale and shifts such that values from [-zero_limit,zero_limit] are excluded. still values are in [-limit,limit].

Return type:

tf.Tensor

class TFUtil.VarianceScalingNonZero(non_zero_fraction=0.5, **kwargs)[source]

Same as tf.VarianceScaling, i.e. truncated normal or uniform from [-limit,limit] for some limit, except that we exclude the range [-limit*non_zero_fraction,limit*non_zero_fraction]. non_zero_fraction=0 would yield no difference.

For reference, to get the behavior of glorot_uniform, use these args:
mode=”fan_avg”, distribution=”uniform”
TFUtil.variance_scaling_non_zero_initializer[source]

alias of TFUtil.VarianceScalingNonZero

TFUtil.load_txt_file_initializer(filename, dtype=tf.float32)[source]
Parameters:
  • filename (str) –
  • dtype (tf.DType) –
Returns:

function, when called, will return the content

Return type:

()->tf.Tensor

TFUtil.get_initializer(s, seed=None, eval_local_ns=None, dtype=tf.float32)[source]
Parameters:
  • s (str|dict[str]|float) – e.g. “glorot_uniform” or “truncated_normal” or “orthogonal”, or config dict with “class”, or string to be `eval`ed if it contains “(“. constant if a float is given.
  • seed (int|tf.Tensor) –
  • eval_local_ns (dict[str]|None) –
  • dtype (tf.DType|str) –
Returns:

(function (shape) -> tf.Tensor) | tf.Initializer

Return type:

((tuple[int]) -> tf.Tensor) | tf.Initializer
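
Usage might look like this (a sketch; the initializer names follow the description above):

    import tensorflow as tf
    from TFUtil import get_initializer

    init = get_initializer("glorot_uniform", seed=42)
    w = tf.get_variable("w", shape=(128, 128), initializer=init)
    # Strings containing "(" are eval'ed:
    init2 = get_initializer("random_normal_initializer(stddev=0.01)")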

TFUtil.dropout(x, keep_prob, noise_shape=None, seed=None, name=None, cond_on_train=False, apply_correction_factor=True)[source]

Computes dropout. Like tf.nn.dropout() but avoid tf.div() if possible.

Parameters:
  • x (tf.Tensor) –
  • keep_prob (float|tf.Tensor) –
  • noise_shape (tf.Tensor|tuple[int|None]) – 1 will broadcast in that dimension, None will not broadcast
  • seed (int) –
  • name (str) –
  • cond_on_train (bool) – automatically wrap through cond_on_train_flag()
  • apply_correction_factor (bool) –
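
For example (a sketch, TF 1.x), a dropout mask shared across all time frames of a (batch,time,feature) tensor:

    import tensorflow as tf
    from TFUtil import dropout

    x = tf.placeholder(tf.float32, shape=(None, None, 128))
    # noise_shape: 1 broadcasts (same mask per frame), None keeps the dim:
    y = dropout(x, keep_prob=0.9, noise_shape=(None, 1, None))
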
TFUtil.layer_norm(x, gain, bias, axis, epsilon=1e-06)[source]

Layer normalization. Also see openai_layer_norm(). Also see tensorflow.contrib.layers.layer_norm().

Parameters:
  • x (tf.Tensor) –
  • gain (tf.Tensor) –
  • bias (tf.Tensor) –
  • axis (int) –
  • epsilon (float) – OpenAI uses 1e-6, TF contrib uses 1e-12, pbhatia243 uses 1e-5.
Return type:

tf.Tensor

TFUtil.openai_layer_norm(x, gain, bias, axis, epsilon=1e-06)[source]

Layer normalization, like layer_norm(), but fast kernel by OpenAI (implemented as part of their blocksparse). To use it, init the git submodule in extern/blocksparse.

Parameters:
  • x (tf.Tensor) –
  • gain (tf.Tensor) –
  • bias (tf.Tensor) –
  • axis (int) –
  • epsilon (float) –
Return type:

tf.Tensor

TFUtil.swapaxes(x, axis1, axis2)[source]

Also see move_axis() or dimshuffle().

Parameters:
  • x (tf.Tensor) –
  • axis1 (tf.Tensor|int) –
  • axis2 (tf.Tensor|int) –
Returns:

tensor with swapped axes, like numpy.swapaxes

Return type:

tf.Tensor

TFUtil.move_axis(x, old_axis, new_axis, name='move_axis')[source]

Also see swapaxes() or dimshuffle().

Parameters:
  • x (tf.Tensor) –
  • old_axis (int) – can also be negative
  • new_axis (int) – can also be negative
  • name (str) – name of the scope
class TFUtil.TensorCachedComputation(x, key)[source]

Helper to cache some computation inside a tf.Tensor object.

Parameters:
  • x (tf.Tensor) –
  • key (str|tuple[str|int|tf.Tensor]) –
has_cache(self)[source]
Returns:whether we have stored the value already. if True, you can use get_cache()
Return type:bool
get_cache(self)[source]
Return type:tf.Tensor
set_cache(self, value)[source]
Parameters:value (tf.Tensor) –
TFUtil.sequence_mask(lengths, name=None, **kwargs)[source]

Wraps around tf.sequence_mask(). It will cache the value inside the passed object so that we don’t recompute it multiple times.

Parameters:
  • lengths (tf.Tensor) – shape (batch,)
  • name (str|None) –
  • kwargs – passed on to tf.sequence_mask
Returns:

tensor mask of shape (batch,maxlen/time). default dtype is bool unless you specify something else

Return type:

tf.Tensor
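
For example:

    import tensorflow as tf
    from TFUtil import sequence_mask

    seq_lens = tf.constant([3, 1, 2])  # (batch,)
    mask = sequence_mask(seq_lens)     # (batch, maxlen=3), dtype bool
    # [[ True  True  True]
    #  [ True False False]
    #  [ True  True False]]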

TFUtil.sequence_mask_time_major(lengths, **kwargs)[source]

Wraps around tf.transpose(tf.sequence_mask(), (1,0)). It will cache the value inside the passed object so that we don’t recompute it multiple times.

Parameters:
  • lengths (tf.Tensor) – shape (batch,)
  • kwargs – passed on to tf.sequence_mask
Returns:

mask of shape (maxlen/time,batch)

Return type:

tf.Tensor

TFUtil.directed(x, direction)[source]

If direction == 1 or direction is None, returns just x. If direction == -1, returns reversed(x).

Parameters:
  • x (tf.Tensor) –
  • direction (int|None) – -1 or 1 (or None)
Return type:

tf.Tensor

TFUtil.reversed(x)[source]

Just returns x[::-1]. It will cache the value inside the passed object so that we don’t recompute it multiple times.

Parameters:x (tf.Tensor) –
Return type:tf.Tensor
TFUtil.flatten_with_seq_len_mask(x, seq_lens, batch_dim_axis=None, time_dim_axis=None, time_major=None)[source]
Parameters:
  • x (tf.Tensor) – shape (batch,…s…, time, …s’…) or shape (time,…s…., batch, …s’…)
  • seq_lens (tf.Tensor) – shape (batch,) of int32
  • batch_dim_axis (int) – index of batch_dim in x
  • time_dim_axis (int) – index of time_dim in x
  • time_major (bool) – whether time axis is 0 (redundant, kept for compatibility)
Returns:

tensor of shape (time’, …s…s’…) where time’ = sum(seq_len) <= batch*time

Return type:

tf.Tensor
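
A typical use (sketch): dropping all padded frames of a batch-major tensor, e.g. before a framewise loss:

    import tensorflow as tf
    from TFUtil import flatten_with_seq_len_mask

    x = tf.placeholder(tf.float32, shape=(None, None, 128))  # (batch,time,feat)
    seq_lens = tf.placeholder(tf.int32, shape=(None,))
    flat = flatten_with_seq_len_mask(x, seq_lens, batch_dim_axis=0, time_dim_axis=1)
    # flat has shape (sum(seq_lens), 128)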

TFUtil.expand_dims_unbroadcast(x, axis, dim, name='expand_dims_unbroadcast')[source]
Parameters:
  • x (tf.Tensor|float|int) –
  • axis (int|tf.Tensor) – new axis
  • dim (int|tf.Tensor) – dimension for axis
  • name (str) – scope name
Returns:

if x is of shape (a,b,c) and axis=0, then we return (dim,a,b,c)

Return type:

tf.Tensor
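
For example:

    import tensorflow as tf
    from TFUtil import expand_dims_unbroadcast

    x = tf.zeros((4, 5))
    y = expand_dims_unbroadcast(x, axis=0, dim=3)  # shape (3, 4, 5): x tiled 3 times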

TFUtil.expand_multiple_dims(x, axes, name='expand_multiple_dims')[source]
Parameters:
  • x (tf.Tensor) –
  • axes (list[int]|tuple[int]) – after completion, tf.shape(y)[axis] == 1 for axis in axes
  • name (str) – scope name
Returns:

y where we have a new broadcast axis for each axis in axes

Return type:

tf.Tensor

TFUtil.tile_transposed(x, axis, multiples)[source]

Example: x with shape (D,), tf.tile(x, [N]) can be reshaped into (N,D), while tile_transposed(x, axis=0, multiples=N) can be reshaped into (D,N).

Parameters:
  • x (tf.Tensor) –
  • axis (int) –
  • multiples (int|tf.Tensor) –
Returns:

tensor with shape[axis] == x.shape[axis] * multiples

Return type:

tf.Tensor
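
The difference to tf.tile, spelled out (a sketch):

    import tensorflow as tf
    from TFUtil import tile_transposed

    x = tf.constant([1, 2, 3])
    a = tf.tile(x, [2])                          # [1 2 3 1 2 3]
    b = tile_transposed(x, axis=0, multiples=2)  # [1 1 2 2 3 3]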

TFUtil.constant_with_shape(x, shape, dtype=None, name='constant_with_shape')[source]
Parameters:
  • x (tf.Tensor|float|int|bool) – scalar
  • shape (list[tf.Tensor|int]|tuple[tf.Tensor|int]|tf.Tensor) –
  • dtype (tf.DType) –
  • name (str) –
Returns:

x of the specified shape

Return type:

tf.Tensor

TFUtil.dimshuffle(x, axes, name='dimshuffle')[source]

Like Theano’s dimshuffle. Combines tf.transpose, tf.expand_dims and tf.squeeze.

Parameters:
  • x (tf.Tensor) –
  • axes (list[int|str]|tuple[int|str]) –
  • name (str) – scope name
Return type:

tf.Tensor
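
For example (following Theano’s pattern syntax, where 'x' inserts a broadcast axis):

    import tensorflow as tf
    from TFUtil import dimshuffle

    x = tf.zeros((2, 3, 5))
    y = dimshuffle(x, (1, 'x', 0, 2))  # shape (3, 1, 2, 5)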

TFUtil.sparse_labels_with_seq_lens(x, seq_lens, dtype=tf.int32, collapse_repeated=False, post_filter_idx=None)[source]
Parameters:
  • x (tf.Tensor) – shape (batch,time) -> index, some int type
  • seq_lens (tf.Tensor|None) – shape (batch,) of int32|int64
  • dtype (tf.DType|None) – if given, will cast the x values to this type. ctc_loss() wants int32
  • collapse_repeated (bool) – like uniq() behavior
  • post_filter_idx (int|list[int]|set[int]|None) – if given, after an optional collapse_repeated, will remove all those idx
Returns:

SparseTensor, e.g. input for tf.nn.ctc_loss(), and seq_lens of shape (batch,)

Return type:

(tf.SparseTensor, tf.Tensor)

TFUtil.sparse_labels(x, seq_lens, dtype=tf.int32, collapse_repeated=False)[source]
Parameters:
  • x (tf.Tensor) – shape (batch,time) -> index, some int type
  • seq_lens (tf.Tensor|None) – shape (batch,) of int32|int64
  • dtype (tf.DType|None) – if given, will cast the x values to this type. ctc_loss() wants int32
  • collapse_repeated (bool) – like uniq() behavior
Returns:

SparseTensor, e.g. input for tf.nn.ctc_loss()

Return type:

tf.SparseTensor
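
A sketch of the typical CTC use case (logits and input_lens are assumed to exist elsewhere):

    import tensorflow as tf
    from TFUtil import sparse_labels

    targets = tf.placeholder(tf.int32, shape=(None, None))  # (batch,time)
    target_lens = tf.placeholder(tf.int32, shape=(None,))
    sparse_targets = sparse_labels(targets, target_lens)
    # loss = tf.nn.ctc_loss(labels=sparse_targets, inputs=logits,
    #                       sequence_length=input_lens, time_major=True)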

TFUtil.uniq(x)[source]
Parameters:x (tf.Tensor) – 1D shape (time,) -> index, some int type
Returns:collapses adjacent repeated entries (like Unix uniq); unlike tf.unique, which will never repeat entries.

Example: uniq([0, 0, 1, 1, 0, 0]) == [0, 1, 0], tf.unique([0, 0, 1, 1, 0, 0]) == [0, 1]. For a batched variant, see batched_uniq, or sparse_labels() with option collapse_repeated.

TFUtil.batched_uniq(x, seq_lens)[source]
Parameters:
  • x (tf.Tensor) – shape (batch,time) -> index, some int type
  • seq_lens (tf.Tensor|None) – shape (batch,) of int32|int64
Returns:

tuple (z, new_seq_lens), where z is of shape (batch, max_new_time), max_new_time = max(new_seq_lens), seq_lens is of shape (batch,).

Return type:

(tf.Tensor, tf.Tensor)

TFUtil.ctc_greedy_decode(logits, seq_lens, time_major)[source]

Similar to tf.nn.ctc_greedy_decoder(), but simpler implementation, and should run on GPU.

Parameters:
  • logits (tf.Tensor) – (time,batch,dim) or (batch,time,dim)
  • seq_lens (tf.Tensor) – shape (batch,) of int32|int64
  • time_major (bool) –
Return type:

tf.SparseTensor

Returns:

in batch-major, [batch,max_time] (like tf.nn.ctc_greedy_decoder())

TFUtil.get_common_shape(values, ignore_axes=())[source]

Related: tf.broadcast_dynamic_shape(). Also see unbroadcast_to_common_shape().

Parameters:
  • values (list[tf.Tensor|float|int]) –
  • ignore_axes (list[int]|tuple[int]) – these axes will be ignored
Returns:

common shape of all values. broadcasts dims with 1. will use static dims when possible. Dim of axes which are in ignore_axes will be None.

Return type:

list[tf.Tensor|int|None]

TFUtil.unbroadcast_to_common_shape(value, common_shape, ignore_axes=(), allow_only_noop=False)[source]
Parameters:
  • value (tf.Tensor|T) –
  • common_shape (list[tf.Tensor|int|None]) – see get_common_shape()
  • ignore_axes (list[int]|tuple[int]) –
  • allow_only_noop (bool) – if False, and the unbroadcast is not a no-op, will raise an exception
Returns:

(maybe) unbroadcasted value

Return type:

tf.Tensor|T

TFUtil.concat_with_opt_broadcast(values, allow_broadcast, axis, name='concat_with_opt_broadcast')[source]
Parameters:
  • values (list[tf.Tensor]) – all with same ndim
  • allow_broadcast (list[bool]) – same len as values
  • axis (int) –
  • name (str) –
Returns:

basically tf.concat(values, axis), but we can allow broadcasting for some values

Return type:

tf.Tensor

TFUtil.matrix_triangular(shape, dtype=tf.float32, lower=False, upper=False)[source]
Parameters:
  • shape (tuple[int|tf.Tensor]|tf.Tensor) –
  • dtype (tf.DType) –
  • lower (bool) –
  • upper (bool) –
Return type:

tf.Tensor

class TFUtil.VariableAssigner(var)[source]

Object helper to assign some var. (This is mostly obsolete now.)

Parameters:var (tf.Variable) –
assign(self, value, session)[source]
Parameters:
  • value (numpy.ndarray|int|float|list[str]) –
  • session (tf.Session) –
class TFUtil.CudaEnv[source]

Information about the Nvidia CUDA environment, and library. Also path to nvcc, the CUDA compiler.

verbose_find_cuda = False[source]
is_available(self)[source]
Return type:bool
get_compiler_opts(self)[source]
Return type:list[str]
get_compiler_bin(self)[source]
Returns:path
Return type:str
classmethod get_instance()[source]
Return type:CudaEnv
class TFUtil.OpCodeCompiler(use_cuda_if_available=True, cuda_auto_min_compute_capability=True, include_paths=(), ld_flags=(), **kwargs)[source]

Helper class to compile TF ops on-the-fly, similar to Theano. https://www.tensorflow.org/guide/extend/op https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/extend/adding_an_op.md

CacheDirName = 'returnn_tf_cache/ops'[source]
load_tf_module(self)[source]
Returns:module
class TFUtil.TFNativeUtilCompiler(include_paths=(), ld_flags=(), **kwargs)[source]

Helper class to compile TF utility functions on-the-fly.

CacheDirName = 'returnn_tf_cache/tf_utils'[source]
TFUtil.make_var_tuple(v)[source]
Parameters:v (tf.Tensor|list[tf.Tensor]|tuple[tf.Tensor]) –
Returns:tuple of tensors
Return type:tuple[tf.Tensor]
TFUtil.add_scaled_noise_to_gradients(grads_and_vars, gradient_noise_scale, sparse_grads=False)[source]

Adds scaled noise from a 0-mean normal distribution to gradients. Adapted from tf.contrib.layers.optimizers.

Parameters:
  • grads_and_vars (list[(tf.Tensor|tf.IndexedSlices, tf.Variable)]) –
  • gradient_noise_scale (float) – used as stddev for tf.truncated_normal().
  • sparse_grads (bool) – for sparse gradients (tf.IndexedSlices), it will only add the noise to the indexed values. Seems broken in some cases? Needs debugging.
Returns:

adapted grads_and_vars

Return type:

list[(tf.Tensor|tf.IndexedSlices, tf.Variable)]

class TFUtil.CustomGradient[source]

Utility functions to specify a custom gradient for a given function, which will be wrapped around via TF Defun().

Also see FlipGradientBuilder.

register(self, input_types, op, grad_op, name=None)[source]
Parameters:
  • input_types (list[tf.DType]|tuple[tf.DType]) –
  • op (((tf.Tensor) -> tf.Tensor)|T) –
  • grad_op ((tf.Operation, tf.Tensor) -> tuple[tf.Tensor]|tf.Tensor) – args are (op, out_grad) and it must return in_grad
  • name (str) – optional func_name
Returns:

op

Return type:

((tf.Tensor) -> tf.Tensor)|T

register_generic_loss_and_error_signal(self)[source]

If you want to use generic_loss_and_error_signal() at some point, call this as early as possible, because of https://github.com/tensorflow/tensorflow/issues/6804.

generic_loss_and_error_signal(self, loss, x, grad_x)[source]

Wrapper around self.register(). Expects that loss = loss(x), and grad_x = partial loss / partial x.

Parameters:
  • loss (tf.Tensor) –
  • x (tf.Tensor) –
  • grad_x (tf.Tensor) –
Returns:

loss but with the gradient for x

Return type:

tf.Tensor

class TFUtil.MetaLosses[source]

This provides a way to use an alternative gradient, or to use the original gradient (error signal) and do something with it. You can then define an additional (meta) loss using this.

This implements synthetic gradients, see synthetic_gradient().

class LossInfo(value, scale, norm_factor, name, source)[source]

Covers loss and other info.

Parameters:
  • value (tf.Tensor) –
  • scale (float) –
  • norm_factor (tf.Tensor) –
  • name (str) –
  • source (object) – e.g. layer
class Scope[source]

Defines the scope for a synthetic gradient. Create this object via MetaLosses.enter_gradient_scope(). Any meta-losses will be collected here via register_loss().

register_loss(self, loss)[source]
Parameters:loss (MetaLosses.LossInfo) –
exit(self)[source]

Exit the scope.

losses_as_fetch_dict(self)[source]
Return type:dict[str,tf.Tensor]
summed_loss_for_optimization(self)[source]
Return type:tf.Tensor
class ScopeCtxThreadLocal[source]

Thread local.

scope = None[source]
scope_ctx = <TFUtil.MetaLosses.ScopeCtxThreadLocal object>[source]
classmethod enter_gradient_scope()[source]
Return type:MetaLosses.Scope
classmethod exit_gradient_scope()[source]

Exit gradient scope.

classmethod synthetic_gradient(x, synthetic_grad_x, loss_scale=1.0, loss_name=None, loss_source=None)[source]

Decoupled Neural Interfaces using Synthetic Gradients, https://arxiv.org/abs/1608.05343

Parameters:
  • x (tf.Tensor) –
  • synthetic_grad_x (tf.Tensor) –
  • loss_scale (float) –
  • loss_name (str|None) –
  • loss_source (object|None) –
Returns:

x, where the gradient is overwritten by synthetic_grad_x, and when calculated, the gradient prediction loss will be added to cls.scope.

Return type:

tf.Tensor

classmethod tikhonov_regularized(x, dummy, loss_scale=1.0, loss_name=None, loss_source=None)[source]
Parameters:
  • x (tf.Tensor) –
  • dummy (tf.Tensor|tf.Variable) – scalar. can be used to enforce getting a gradient
  • loss_scale (float) –
  • loss_name (str|None) –
  • loss_source (object|None) –
Returns:

identity(x), where we add a Tikhonov regularization

Return type:

tf.Tensor

TFUtil.filter_grad(x, threshold, axis)[source]
Parameters:
  • x (tf.Tensor) –
  • threshold (float) – all grads going through x whose max(grad**2) is over the threshold are removed
  • axis (int|list[int]) – max(grad**2) will be reduced over this axis
Returns:

identity(x) with custom gradient

Return type:

tf.Tensor

TFUtil.debug_register_better_repr()[source]

Some types don’t have good __repr__ implementations by default (for the current TF version). For debugging, it can be helpful to give some more info. This monkey-patches clazz.__repr__ of some TF classes if they are object.__repr__.

TFUtil.cond(pred, fn1, fn2, name=None)[source]

This is a wrapper around tf.control_flow_ops.cond(). This will be a branched execution, i.e. either fn1() or fn2() will be executed, or at least the resulting graph will be evaluated. If pred is constant at the call, only the corresponding fn will be called. This is similar to the TF internal _smart_cond(), and similar to tf.contrib.framework.smart_cond.

Parameters:
  • pred (tf.Tensor|bool) –
  • fn1 (()->(tf.Tensor|list[tf.Tensor]|T)) –
  • fn2 (()->(tf.Tensor|list[tf.Tensor]|T)) –
  • name (str) –
Returns:

fn1() if pred else fn2()

Return type:

tf.Tensor|list[tf.Tensor]|T
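
For example:

    import tensorflow as tf
    from TFUtil import cond

    pred = tf.placeholder(tf.bool, shape=())
    y = cond(pred, lambda: tf.constant(1), lambda: tf.constant(2))  # real tf.cond
    z = cond(True, lambda: tf.constant(1), lambda: tf.constant(2))  # no cond op, just fn1()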

TFUtil.single_strided_slice(x, axis, begin=None, end=None, step=None)[source]
Parameters:
  • x (tf.Tensor) –
  • axis (int|tf.Tensor) –
  • begin (int|tf.Tensor|None) –
  • end (int|tf.Tensor|None) –
  • step (int|tf.Tensor|None) –
Returns:

e.g. if axis == 0, returns x[begin:end:step], if axis == 1, returns x[:, begin:end:step], etc.

Return type:

tf.Tensor

TFUtil.circular_pad(x, paddings, axes=None)[source]
Parameters:
  • x (tf.Tensor) – shape (…, height, width)
  • paddings (int|((int,int),(int,int))|tf.Tensor) – how much to add ((top,bottom),(left,right))
  • axes (None|tf.Tensor|(tf.Tensor|int,tf.Tensor|int)) –
Returns:

tensor with shape (…, top + height + bottom, left + width + right)

Return type:

tf.Tensor

TFUtil.spatial_smoothing_energy(x, dim, use_circular_conv=True)[source]
Parameters:
  • x (tf.Tensor) – shape (…, dim)
  • dim (int) – last dimension of x
  • use_circular_conv (bool) – whether to use circular convolution, via circular_pad
Return type:

tf.Tensor

Returns:

energy of shape (…)

Via: Achieving Human Parity in Conversational Speech Recognition, Microsoft, 2017 (https://arxiv.org/abs/1610.05256). Interpret the last dimension as 2D (w, h) and apply some high-pass filter on it.

TFUtil.nan_to_num(x, nan_num=0, inf_num=1e+30)[source]

Like numpy.nan_to_num().

Parameters:
  • x (tf.Tensor|tf.IndexedSlices) –
  • nan_num (float|tf.Tensor) –
  • inf_num (float|tf.Tensor) –
Returns:

x with replaced nan and inf

TFUtil.where_bc(condition, x, y, name='where_bc')[source]

This is basically tf.where() but with additional broadcasting support. We explicitly require that the ndims match (or x, y can also be scalars). See also get_common_shape() and unbroadcast_to_common_shape().

https://github.com/tensorflow/tensorflow/issues/3945 https://github.com/tensorflow/tensorflow/issues/9284

Parameters:
  • condition (tf.Tensor) –
  • x (tf.Tensor|float|int) –
  • y (tf.Tensor|float|int) –
  • name (str) –
Returns:

basically tf.where(condition, x, y)

Return type:

tf.Tensor
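
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import where_bc

condition = tf.constant([[True], [False]])  # shape (2, 1)
x = tf.zeros((2, 3))
# Plain tf.where would require condition.shape == x.shape; where_bc
# broadcasts the (2, 1) condition against (2, 3), and y may be a scalar.
z = where_bc(condition, x, 1.0)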

TFUtil.identity_op_nested(x, name='identity')[source]
Parameters:
  • x (tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]) –
  • name (str) –

Return type:tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]

TFUtil.nd_indices(indices, batch_axis=0, indices_batch_major=None)[source]
Parameters:
  • indices (tf.Tensor) – e.g. (batch, …) -> index (or (…, batch, …) -> index)
  • batch_axis (int) – of the indices tensor itself
  • indices_batch_major (bool|None) – of the resulting 2-tuple, whether it represents (batch_idx, index) or (index, batch_idx). default is like batch_axis
Returns:

extended indices with batch-idx which can be used for tf.gather_nd, i.e. in the example of shape (batch, …, 2) where the 2-tuple represents (batch_idx, index) or (index, batch_idx). The shape[:-1] is exactly the same as the indices shape.

Return type:

tf.Tensor
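
Example (a minimal sketch, gathering per-target scores):

import tensorflow as tf
from TFUtil import nd_indices

logits = tf.random_normal((4, 10))        # (batch, dim)
targets = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]])  # (batch, time) -> dim idx
idx = nd_indices(targets)                 # (batch, time, 2), 2-tuple = (batch_idx, index)
selected = tf.gather_nd(logits, idx)      # (batch, time)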

TFUtil.stop_event_writer_thread(event_writer)[source]

There is a bug in TensorFlow (at least 1.1.0) (https://github.com/tensorflow/tensorflow/issues/4820) that the event writer thread is never stopped. This will try to stop it. Only do it if you don’t use the event writer anymore.

Parameters:event_writer (tensorflow.python.summary.writer.event_file_writer.EventFileWriter) –
TFUtil.optional_add(*args)[source]
Parameters:args (list[tf.Tensor|None]|int|float|tf.Tensor) –
Return type:tf.Tensor|int|float|None
Returns:sums all non-None values, or returns None if there are none
TFUtil.optional_mul(*args)[source]
Parameters:args (tf.Tensor|None|int|float) –
Return type:tf.Tensor|int|float|None
Returns:product of all non-None values, or returns None if there are none
TFUtil.opt_logical_and(*args)[source]
Parameters:args (tf.Tensor|bool) –
Returns:basically logical_and(*args), but leaves out all constants
Return type:tf.Tensor|bool
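
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import optional_add, optional_mul, opt_logical_and

loss_a = tf.constant(1.0)
loss_b = tf.constant(2.0)
total = optional_add(loss_a, None, loss_b)  # == loss_a + loss_b
assert optional_add(None, None) is None
scaled = optional_mul(None, loss_a)         # == loss_a
flag = opt_logical_and(True, True)          # constants are folded, no graph op
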
TFUtil.windowed_nd(source, window_size, window_left=None, window_right=None, padding='same', time_axis=1, new_window_axis=2)[source]

Constructs a new “window” axis which is a moving input over the time-axis. If you want to take out a window, i.e. a slice, see slice_nd().

Parameters:
  • source (tf.Tensor) – N-D tensor of shape (…, n_time, …)
  • window_size (int|tf.Tensor) – window size
  • window_left (int|tf.Tensor|None) –
  • window_right (int|tf.Tensor|None) –
  • padding (str) – “same” or “valid”
  • time_axis (int) –
  • new_window_axis (int) –
Returns:

tensor of shape (…, n_time, …, window, …)

Return type:

tf.Tensor
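
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import windowed_nd

x = tf.random_normal((8, 100, 40))  # (batch, time, feature)
# Sliding window of 5 frames around each time step ("same" padding);
# result shape: (batch, time, window=5, feature).
y = windowed_nd(x, window_size=5, time_axis=1, new_window_axis=2)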

TFUtil.slice_nd(x, start, size)[source]
Parameters:
  • x (tf.Tensor) – shape (B, T, …)
  • start (tf.Tensor) – shape (B,), int32
  • size (int) –
Returns:

[x[0, start_0:start_0 + size], x[1, start_1:start_1 + size], …, x[B - 1, start_{B-1}:start_{B-1} + size]], shape (B, size, …). Like slice_pad_zeros(), the size of the sliced axis will always be size, and out-of-range positions are padded with zeros.

Return type:

tf.Tensor

TFUtil.global_tensor(f, name)[source]

This creates a global accessible tensor in the graph to be reused later, i.e. on the second call given a unique name, it will not create a new tensor but return the previously created tensor. This is for the current graph, i.e. if there is a new graph, it will recreate the tensor.

Parameters:
  • f (()->tf.Tensor) – callable which creates the tensor
  • name (str) – global reference name for the tensor. should be a valid scope name
Returns:

the tensor

Return type:

tf.Tensor
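
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import global_tensor

def make_table():
  return tf.constant([0.0, 1.0, 2.0])

t1 = global_tensor(make_table, name="my_table")
# Second call with the same name: make_table is not called again,
# the previously created tensor from the graph is returned.
t2 = global_tensor(make_table, name="my_table")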

TFUtil.get_global_train_flag_placeholder()[source]

Also consider TFNetwork.get_current_network().train_flag(), or get_global_train_flag().

Returns:bool scalar tensor
Return type:tf.Tensor
TFUtil.get_global_train_flag()[source]
Return type:tf.Tensor|bool
Returns:global train flag
TFUtil.cond_on_train_flag(fn_train, fn_eval)[source]

Uses fn_train() or fn_eval() based on the train flag. It will be a branched evaluation. The train flag is determined via get_global_train_flag().

Parameters:
  • fn_train (()->tf.Tensor) –
  • fn_eval (()->tf.Tensor) –
Returns:

fn_train() if self.train_flag else fn_eval()

Return type:

tf.Tensor
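
Example (a minimal sketch, dropout only in training):

import tensorflow as tf
from TFUtil import cond_on_train_flag

x = tf.random_normal((8, 40))
y = cond_on_train_flag(lambda: tf.nn.dropout(x, keep_prob=0.9), lambda: x)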

TFUtil.get_random_seed()[source]
Return type:int|None
TFUtil.encode_raw(x, axis=-1, seq_lens=None)[source]

The inverse function of tf.decode_raw(). Also see: https://stackoverflow.com/questions/43403147/how-to-create-a-encode-raw-tensorflow-function

Parameters:
  • x (tf.Tensor) – of integer types [0,255], will get casted to uint8
  • axis (int) – the axis to reduce-join the string. decode_raw has added it at the end
  • seq_lens (tf.Tensor|None) – must have same shape as x after reduce-joining. Note that using seq_lens will make our output not compatible with tf.decode_raw() anymore because tf.decode_raw() requires all strings to be of the same length.
Returns:

string tensor

Return type:

tf.Tensor

TFUtil.get_shared_vocab(vocab_strings)[source]

The vocab is shared across the current instance of the computation graph. The tensor name might be different in different runs.

Parameters:vocab_strings (list[str]) –
Returns:shape (len(vocab_strings),), tf.string
Return type:tf.Tensor
TFUtil.map_labels(x, label_map, name='map_labels')[source]
Parameters:
  • x (tf.Tensor|tf.SparseTensor) – values of integer types
  • label_map (dict[int,int|None]) – should be dense on input
  • name (str) –
Returns:

mapped values

Return type:

tf.Tensor|tf.SparseTensor
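
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import map_labels

x = tf.constant([0, 1, 2, 1])
# Dense map on the input label range: 1 -> 5, 2 -> 6, 0 stays.
y = map_labels(x, label_map={0: 0, 1: 5, 2: 6})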

TFUtil.remove_labels(x, labels)[source]
Parameters:
  • x (tf.SparseTensor) – sequences, i.e. the indices are interpret as (batch,time)
  • labels (set[int]|list[int]) –
Returns:

x where all provided labels are removed, and the indices are changed accordingly

Return type:

tf.SparseTensor

TFUtil.pad_zeros_in_axis(x, before=0, after=0, axis=0)[source]
Parameters:
  • x (tf.Tensor) –
  • before (int|tf.Tensor) –
  • after (int|tf.Tensor) –
  • axis (int) –
Returns:

x padded with zeros (before and after, in the given axis)

Return type:

tf.Tensor

TFUtil.slice_pad_zeros(x, begin, end, axis=0)[source]
Parameters:
  • x (tf.Tensor) – of shape (…, time, …)
  • begin (int|tf.Tensor) –
  • end (int|tf.Tensor) –
  • axis (int) –
Returns:

basically x[begin:end] (with axis==0) but if begin < 0 or end > x.shape[0], it will not discard these frames but pad zeros, such that the resulting shape[0] == end - begin.

Return type:

tf.Tensor
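
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import slice_pad_zeros

x = tf.constant([1, 2, 3, 4])
# begin < 0 pads zeros in front instead of discarding frames:
# result [0, 0, 1, 2, 3, 4], i.e. shape[0] == end - begin == 6.
y = slice_pad_zeros(x, begin=-2, end=4)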

TFUtil.post_control_dependencies(x, updates)[source]
Parameters:
  • x (tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]) –
  • updates (list[tf.Operation]) –
Returns:

identity(x) with control_dependencies(updates)

Return type:

tf.Tensor|list[tf.Tensor]|dict[str,tf.Tensor]

TFUtil.sequential_control_dependencies(l)[source]

tf.control_dependencies but each operation will be created such that it is executed after the ones coming before in the list, i.e. l[0] is executed first, l[-1] is executed last.

Parameters:l (list[()->(tf.Operation|tf.Tensor)]) –
TFUtil.global_queue(name, queue_type, capacity, dtypes, shapes=None, names=None)[source]
Parameters:
  • name (str) – global name
  • queue_type ((..)->tf.QueueBase) – some function which creates a queue
  • capacity
  • dtypes (list[tf.DType|str]) –
  • shapes (list[tf.TensorShape|tuple[int|None]]|None) –
  • names (list[str]|None) –
Return type:

tf.QueueBase

TFUtil.init_variable_if_needed(v)[source]
Parameters:v (tf.Variable) –
Return type:tf.Operation
TFUtil.auto_init_var(v)[source]
Parameters:v (tf.Variable) –
Returns:a reference to the var via tf.identity
Return type:tf.Tensor
TFUtil.true_once()[source]
Returns:tensor which will be True once and then always False. Internally, this creates a non-trainable variable as a helper.
Return type:tf.Tensor
TFUtil.raise_OutOfRangeError()[source]
Returns:an op which raises an OutOfRangeError
Return type:tf.Operation
TFUtil.enforce_copy(x)[source]
Parameters:x (tf.Tensor|tf.Variable) –
Returns:copy of input, i.e. enforces that this is not a ref
Return type:tf.Tensor
TFUtil.view_as(x, dtype)[source]

Does the numpy.view equivalent. Note that the current implementation is inefficient (uses tf.py_func) and CPU-only. Also see tf.bitcast().

Parameters:
  • x (tf.Tensor) –
  • dtype (tf.DType) –
Returns:

x.view(dtype) equivalent (see numpy.view)

TFUtil.broadcast_gradient_args(shape_x, shape_y)[source]
Parameters:
  • shape_x (tf.Tensor) –
  • shape_y (tf.Tensor) –
Returns:

(axis reduce arg for grad x, axis reduce arg for grad y)

Return type:

(tf.Tensor, tf.Tensor)

TFUtil.maximum_with_identity_grad(x, y)[source]
Parameters:
  • x (tf.Tensor) –
  • y (tf.Tensor) –
Returns:

tf.maximum(x, y) where each will receive the gradient

Return type:

tf.Tensor

TFUtil.minimum_with_identity_grad(x, y)[source]
Parameters:
  • x (tf.Tensor) –
  • y (tf.Tensor) –
Returns:

tf.minimum(x, y) where each will receive the gradient

Return type:

tf.Tensor

TFUtil.clip_by_value_with_identity_grad(x, clip_value_min, clip_value_max)[source]
Parameters:
  • x (tf.Tensor) –
  • clip_value_min (tf.Tensor|float) –
  • clip_value_max (tf.Tensor|float) –
Returns:

tf.clip_by_value(x, clip_value_min, clip_value_max) where each will receive the gradient

Return type:

tf.Tensor

TFUtil.safe_log(x, eps=1e-20, use_fake_grad=True)[source]

Safe wrapper around tf.log() which avoids infs or nans in the gradient.

Parameters:
  • x (tf.Tensor) –
  • eps (float|tf.Tensor) –
  • use_fake_grad (bool) – True -> use maximum_with_identity_grad, False -> use tf.maximum
Returns:

log(max(x, eps))

Return type:

tf.Tensor

TFUtil.safe_exp(x, eps=1e-20)[source]
Parameters:
  • x (tf.Tensor) –
  • eps (float) –
Returns:

exp(x), but does clipping before, such that it never returns inf nor exactly 0.0. Also, we make sure that we use the gradient in all cases.

Return type:

tf.Tensor
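
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import safe_log, safe_exp

probs = tf.nn.softmax(tf.random_normal((8, 10)))
log_probs = safe_log(probs)   # never -inf, and the gradient stays finite
scores = safe_exp(log_probs)  # clipped input: never inf nor exactly 0.0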

TFUtil.l1_normalized(x, axis=-1, eps=1e-20, use_logsumexp=False, is_not_negative=False)[source]
Parameters:
  • x (tf.Tensor) – assumes != 0
  • axis (int|tf.Tensor) – in range [-rank(x),rank(x)]
  • eps (float|tf.Tensor|None) – for safety, to ensure that tf.reduce_sum(tf.abs(x)) >= eps
  • use_logsumexp (bool) – eps must not be None
  • is_not_negative (bool) –
Returns:

y such that tf.reduce_sum(tf.abs(y)) == 1, i.e. y = x / tf.reduce_sum(tf.abs(x)).

Return type:

tf.Tensor

TFUtil.lin_exp(x, use_safe_exp=True)[source]
Parameters:
  • x (tf.Tensor) –
  • use_safe_exp (bool) –
Returns:

x + 1 if x >= 0 else exp(x). This is smooth and differentiable everywhere.

Return type:

tf.Tensor

TFUtil.lin_exp_normed(x, axis=-1, eps=1e-10)[source]

This can be used as an alternative to softmax. It uses lin_exp() instead of exp.

Parameters:
  • x (tf.Tensor) –
  • axis (int|tf.Tensor) – in range [-rank(x),rank(x)]
  • eps (float|tf.Tensor|None) – for safety, to ensure that tf.reduce_sum(tf.abs(x)) >= eps
Returns:

y = l1_normalized(lin_exp(x)), i.e. tf.reduce_sum(y) == 1, and y >= 0.

Return type:

tf.Tensor
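
Example (a minimal sketch, as a softmax alternative):

import tensorflow as tf
from TFUtil import lin_exp_normed

logits = tf.random_normal((8, 10))
# Same normalization properties as softmax (sums to 1, non-negative),
# but only linear growth for positive inputs.
weights = lin_exp_normed(logits, axis=-1)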

TFUtil.check_base_op_type_and_replace(x, op_type, new_op_type)[source]

Suppose you have x = tf.nn.softmax(z) and you want to get y = tf.nn.log_softmax(z). This function will test to see if x is of that kind and then return y.

Parameters:
  • x (tf.Tensor) –
  • op_type (str) – e.g. “Softmax”
  • new_op_type (str) – e.g. “LogSoftmax”
Returns:

x with new_op_type instead of op_type, or None if not matched

Return type:

tf.Tensor|None

TFUtil.copy_op(op, op_type=None, inputs=None)[source]

Copies a tf.Operation.

Parameters:
  • op (tf.Operation) –
  • op_type (str|None) –
  • inputs (list[tf.Tensor]|None) –
Returns:

copy of op but optionally change op.type == op_type or op.inputs == inputs

Return type:

tf.Operation

TFUtil.copy_tensor(x)[source]

Similar to tf.identity, but we ensure here that the return value has its own memory. This can be relevant when you want to keep a copy of the original variable value. See get_variable_value_copy_before_update_ops() for usage.

Parameters:x (tf.Tensor) –
Returns:a copy of x (points to new memory)
Return type:tf.Tensor
TFUtil.smoothing_cross_entropy(logits, labels, label_smoothing, gaussian=False, vocab_size=None)[source]

Cross entropy with label smoothing to limit over-confidence. Code adapted from here: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/common_layers.py

Parameters:
  • logits (tf.Tensor) – Tensor of size shape(labels) + [vocab_size]
  • labels (tf.Tensor) – Tensor of size […]
  • vocab_size (int|tf.Tensor) – Tensor representing the size of the vocabulary.
  • label_smoothing (float) –

    confidence = 1.0 - label_smoothing. Used to determine on and off values for label smoothing. If gaussian is true, confidence is the variance of the gaussian distribution. A common default value is 0.1.

  • gaussian (bool) – Uses a gaussian distribution for label smoothing
Returns:

Tensor of the same shape as labels and of the same dtype as logits.

Return type:

tf.Tensor
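
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import smoothing_cross_entropy

logits = tf.random_normal((8, 20, 1000))   # shape(labels) + [vocab_size]
labels = tf.zeros((8, 20), dtype=tf.int32)
# Uniform label smoothing, confidence 0.9 on the true label.
loss = smoothing_cross_entropy(logits, labels, label_smoothing=0.1, vocab_size=1000)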

TFUtil.softmax_cross_entropy_over_size(logits, labels, stable_gradient=True)[source]

The last spatial axis with dyn size info will be used and interpreted as the class probabilities over the size. We will mask logits outside of the size. We expect that the labels have the corresponding invalid frames already set to 0.0. This can be used to measure the cross entropy between two soft alignments / attention weights.

Parameters:
  • logits (Data) – in log space, unscaled. shape (…,T,…). Shape can be e.g. (B,dec-T,enc-T,H…), or (dec-T,enc-T,B,H…), etc. If it has multiple axes with dynamic size, we use the last one (enc-T in the example).
  • labels (Data) – in prob space. shape compatible to logits (but axes can be ordered differently). Shape can be e.g. (B,dec-T,enc-T,H…) etc. If it has multiple spatial axes, we expect them to be in the same order as in logits
  • stable_gradient (bool) – whether to use an explicit gradient
Returns:

same shape as logits, but with the T axis removed.

Return type:

tf.Tensor

TFUtil.interpolate_bilinear(grid, query_points, name='interpolate_bilinear', indexing='ij')[source]

Similar to Matlab’s interp2 function. Finds values for query points on a grid using bilinear interpolation. Adapted from tensorflow.contrib.image.dense_image_warp, from newer TF version which supports variable-sized images.

Parameters:
  • grid (tf.Tensor) – a 4-D float Tensor of shape [batch, height, width, channels].
  • query_points (tf.Tensor) – a 3-D float Tensor of N points with shape [batch, N, 2]. Note that this function is not differentiable w.r.t. the query points.
  • name (str) – a name for the operation (optional).
  • indexing (str) – whether the query points are specified as row and column (ij), or Cartesian coordinates (xy).
Returns:

a 3-D Tensor with shape [batch, N, channels]

Return type:

tf.Tensor

TFUtil.dense_image_warp(image, flow, name='dense_image_warp')[source]

Image warping using per-pixel flow vectors. Adapted from tensorflow.contrib.image.dense_image_warp, from newer TF version which supports variable-sized images.

Parameters:
  • image (tf.Tensor) – 4-D float Tensor with shape [batch, height, width, channels].
  • flow (tf.Tensor) – A 4-D float Tensor with shape [batch, height, width, 2]. E.g. via create_random_warp_flow_2d(). Note that this function is not differentiable w.r.t. the flow.
  • name (str) – A name for the operation (optional).
Returns:

A 4-D float Tensor with shape [batch, height, width, channels] and same type as input image.

Return type:

tf.Tensor

TFUtil.create_random_warp_flow_2d(shape, std=None, scale=10.0, blur_std=2.0)[source]

Can be used with dense_image_warp().

Parameters:
  • shape (tf.Tensor|(int,int,int)) – 1D, contains (batch,height,width). e.g. tf.shape(image)[:-1]
  • std (float|(float,float)) –
  • scale (float|(float,float)) –
  • blur_std (float|(float,float)) –
Returns:

[batch, height, width, 2]

Return type:

tf.Tensor

TFUtil.gaussian_kernel_2d(size, std)[source]
Parameters:
  • size (int|(int,int)) –
  • std (float|(float,float)) –
Returns:

(size_x*2+1,size_y*2+1), float32

Return type:

tf.Tensor

TFUtil.gaussian_blur_2d(image, kernel_size=None, kernel_std=None)[source]
Parameters:
  • image (tf.Tensor) – (batch,width,height,channel)
  • kernel_size (int|(int,int)|None) –
  • kernel_std (float|(float,float)|None) –
Returns:

blurred image (same shape as the input)

Return type:

tf.Tensor

TFUtil.bleu_score(hypothesis, truth, hyp_seq_lens, truth_seq_lens)[source]

Calculates the BLEU score. See Util.compute_bleu(). This currently wraps a Python function and thus is not efficient.

Parameters:
  • hypothesis (tf.Tensor) – (batch, max(hyp_seq_lens))
  • truth (tf.Tensor) – (batch, max(truth_seq_lens))
  • hyp_seq_lens (tf.Tensor) – (batch,)
  • truth_seq_lens (tf.Tensor) – (batch,)
Return type:

tf.Tensor

Returns:

(batch,), float32

TFUtil.prod(ls)[source]
Parameters:ls (list[T]|tuple[T]|numpy.ndarray|tf.Tensor) –
Return type:T|int|float|tf.Tensor
class TFUtil.Lock(name='Lock')[source]

A pure TensorFlow implementation of a mutex / lock. Probably obsolete now, as with TF 1.6.0, there is tf.contrib.framework.CriticalSection.

init(self)[source]
Return type:tf.Operation
lock(self)[source]

On first call, just returns. Any further call will block, unless there is an unlock() call.

Return type:tf.Tensor
unlock(self)[source]

Must be called after lock().

Return type:tf.Operation
class TFUtil.Condition(lock=None, name='Condition')[source]

A pure TensorFlow implementation of a condition.

init(self)[source]
Return type:tf.Operation
wait(self)[source]

Must be called with the lock held, will unlock while waiting for a signal.

wait_counter(self)[source]
Return type:tf.Tensor
signal(self)[source]

Must be called with the lock held. Emits one signal.

Return type:tf.Tensor
signal_all(self)[source]

Must be called with the lock held. Emits as many signals as they are waiters.

class TFUtil.GlobalTensorArrayOpMaker[source]

Creates a TensorArray which does not use the per-run (“per-step”) resource manager container but uses the standard container which persists across runs. This TensorArray resource handle is then just a standard TensorArray resource handle which can be used with all TensorArray related functions/ops.

Note: This whole implementation currently does not work because tensor_array.h is not available. See https://github.com/tensorflow/tensorflow/issues/10527 and test_GlobalTensorArray().

An alternative to this might be the MapStagingArea (https://github.com/tensorflow/tensorflow/pull/9686), which should get into TF 1.2.2.

code = '\n #include "tensorflow/core/framework/op_kernel.h"\n #include "tensorflow/core/framework/register_types.h"\n #include "tensorflow/core/framework/resource_mgr.h"\n #include "tensorflow/core/framework/tensor.h"\n #include "tensorflow/core/framework/tensor_shape.h"\n #include "tensorflow/core/framework/types.h"\n #include "tensorflow/core/kernels/bounds_check.h"\n #include "tensorflow/core/kernels/tensor_array.h"\n #include "tensorflow/core/lib/core/errors.h"\n #include "tensorflow/core/lib/core/refcount.h"\n #include "tensorflow/core/lib/strings/strcat.h"\n #include "tensorflow/core/platform/dynamic_annotations.h"\n #include "tensorflow/core/platform/logging.h"\n #include "tensorflow/core/platform/thread_annotations.h"\n #include "tensorflow/core/platform/types.h"\n\n using namespace tensorflow;\n \n // Adopted from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/data_flow_ops.cc.\n REGISTER_OP("GlobalTensorArray")\n .Input("size: int32")\n .Attr("container: string = \'\'")\n .Attr("shared_name: string = \'\'")\n .Attr("dtype: type")\n .Attr("element_shape: shape = { unknown_rank: true }")\n .Attr("dynamic_size: bool = false")\n .Attr("clear_after_read: bool = true")\n .Attr("tensor_array_name: string = \'\'")\n .Output("handle: resource")\n .Output("flow: float")\n .SetIsStateful()\n .SetShapeFn([](InferenceContext* c) {\n ShapeHandle unused;\n TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 0, &unused));\n c->set_output(0, c->Vector(2));\n c->set_output(1, c->Scalar());\n return Status::OK();\n })\n .Doc("GlobalTensorArray, persistent across runs");\n \n // Copied from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/tensor_array_ops.cc,\n // and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/resource_op_kernel.h.\n // The original TensorArrayOp used the per-run ("per-step") resource manager container\n // but we use the standard container which persists across runs.\n class GlobalTensorArrayOp : public OpKernel {\n public:\n explicit GlobalTensorArrayOp(OpKernelConstruction* context)\n : OpKernel(context), device_type_(context->device_type()) {\n OP_REQUIRES_OK(context, context->GetAttr("dtype", &dtype_));\n OP_REQUIRES_OK(context, context->GetAttr("element_shape", &element_shape_));\n OP_REQUIRES_OK(context, context->GetAttr("dynamic_size", &dynamic_size_));\n OP_REQUIRES_OK(context,\n context->GetAttr("clear_after_read", &clear_after_read_));\n OP_REQUIRES_OK(context,\n context->GetAttr("tensor_array_name", &tensor_array_name_));\n if (tensor_array_name_.empty()) tensor_array_name_ = name();\n\n AllocatorAttributes alloc_attr;\n alloc_attr.set_on_host(true);\n OP_REQUIRES_OK(context, context->allocate_persistent(\n tensorflow::DT_STRING, tensorflow::TensorShape({2}),\n &handle_, alloc_attr));\n }\n \n ~GlobalTensorArrayOp() {\n if (resource_ != nullptr) {\n resource_->Unref();\n if (cinfo_.resource_is_private_to_kernel()) {\n if (!cinfo_.resource_manager()\n ->template Delete<T>(cinfo_.container(), cinfo_.name())\n .ok()) {\n // Do nothing; the resource can have been deleted by session resets.\n }\n }\n }\n }\n \n void Compute(OpKernelContext* ctx) override {\n mutex_lock l(mu_);\n if (resource_ == nullptr) {\n ResourceMgr* mgr = ctx->resource_manager();\n OP_REQUIRES(ctx, mgr != nullptr, errors::Internal("No resource manager."));\n OP_REQUIRES_OK(ctx, cinfo_.Init(mgr, def()));\n auto h = handle_.AccessTensor(ctx)->template flat<string>();\n h(0) = cinfo_.container();\n h(1) = cinfo_.name();\n 
OP_REQUIRES_OK(ctx, CreateTensorArray(ctx, rm, &handle_, &resource_));\n }\n\n Tensor* handle;\n OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({}), &handle));\n handle->flat<ResourceHandle>()(0) =\n resource_->resource_handle(ctx); \n if (ctx->num_outputs() == 2) {\n // Create the flow output.\n Tensor* flow;\n OP_REQUIRES_OK(ctx, ctx->allocate_output(1, TensorShape({}), &flow));\n if (device_type_ == DEVICE_CPU) {\n // Value doesn\'t matter, but this makes msan not complaint about\n // copying an uninitialized value. To do this on GPU would require\n // a kernel launch or a host->device memcpy, so we avoid that.\n flow->flat<float>()(0) = 0;\n }\n }\n }\n \n private:\n Status CreateTensorArray(OpKernelContext* ctx, ResourceMgr* rm,\n Tensor* tensor_array_output_handle,\n TensorArray** output_tensor_array) EXCLUSIVE_LOCKS_REQUIRED(mu_) {\n const Tensor* tensor_size;\n TF_RETURN_IF_ERROR(ctx->input("size", &tensor_size));\n \n if (!TensorShapeUtils::IsScalar(tensor_size->shape())) {\n return errors::InvalidArgument(\n "TensorArray size must be scalar, but had shape: ",\n tensor_size->shape().DebugString());\n }\n const int32 size = tensor_size->scalar<int32>()();\n if (size < 0) {\n return errors::InvalidArgument("Size should be >= 0.");\n }\n \n TensorArray* tensor_array = new TensorArray(\n cinfo_.name(), dtype_, *tensor_array_output_handle, size, element_shape_,\n dynamic_size_, false /* multiple_writes_aggregate */,\n false /* is_grad */, -1 /* marked_size */, clear_after_read_);\n \n // TODO: could use LookupOrCreate instead...\n TF_RETURN_IF_ERROR(\n rm->Create(cinfo_.container(), cinfo_.name(), tensor_array));\n \n *output_tensor_array = tensor_array;\n \n return Status::OK();\n }\n\n mutex mu_;\n ContainerInfo cinfo_ GUARDED_BY(mu_);\n PersistentTensor handle_ GUARDED_BY(mu_);\n TensorArray* resource_ GUARDED_BY(mu_) = nullptr;\n \n const DeviceType device_type_;\n DataType dtype_;\n PartialTensorShape element_shape_;\n bool dynamic_size_;\n bool clear_after_read_;\n string tensor_array_name_; // The name used to create the TensorArray.\n \n TF_DISALLOW_COPY_AND_ASSIGN(GlobalTensorArrayOp);\n };\n \n REGISTER_KERNEL_BUILDER(Name("GlobalTensorArray").Device(DEVICE_CPU), GlobalTensorArrayOp);\n\n '[source]
get_op(self)[source]
Returns:op
class TFUtil.TFArrayContainer(dtype, handle=None, container=None, shared_name=None, name='array_container')[source]

Array container, like std::vector, with random index access.

Currently does not work. See https://github.com/tensorflow/tensorflow/issues/10950, and test_TFArrayContainer(). Bug #10950 is fixed upstream, should be in TF 1.2.2.

An alternative to this could be GlobalTensorArrayOpMaker and MapStagingArea, which should get into TF 1.2.2.

Parameters:
  • dtype (tf.DType) –
  • container (str) –
  • shared_name (str) –
  • name (str) –
  • handle (tf.resource) – existing handle to reuse. otherwise we will create a new one
code = '\n #include <vector>\n\n // For Eigen::ThreadPoolDevice.\n #define EIGEN_USE_THREADS 1\n\n #include "tensorflow/core/framework/op.h"\n #include "tensorflow/core/framework/shape_inference.h"\n #include "tensorflow/core/framework/op_kernel.h"\n #include "tensorflow/core/framework/resource_mgr.h"\n #include "tensorflow/core/framework/resource_op_kernel.h"\n #include "tensorflow/core/framework/tensor.h"\n #include "tensorflow/core/framework/tensor_shape.h"\n #include "tensorflow/core/framework/types.h"\n #include "tensorflow/core/platform/macros.h"\n #include "tensorflow/core/platform/mutex.h"\n #include "tensorflow/core/platform/types.h"\n #include "tensorflow/core/common_runtime/device.h"\n\n using namespace tensorflow;\n\n REGISTER_OP("ArrayContainerCreate")\n .Attr("T: type")\n .Attr("container: string = \'\'")\n .Attr("shared_name: string = \'\'")\n .Output("resource: resource")\n .SetIsStateful()\n .SetShapeFn(shape_inference::ScalarShape)\n .Doc(R"doc(Array container, random index access)doc");\n\n REGISTER_OP("ArrayContainerGetSize")\n .Input("handle: resource")\n .Output("out: int32")\n .SetShapeFn(shape_inference::ScalarShape)\n ;\n\n REGISTER_OP("ArrayContainerSetSize")\n .Input("handle: resource")\n .Input("size: int32")\n ;\n\n REGISTER_OP("ArrayContainerGet")\n .Attr("T: type")\n .Input("handle: resource")\n .Input("index: int32")\n .Output("out: T")\n ;\n\n REGISTER_OP("ArrayContainerSet")\n .Attr("T: type")\n .Input("handle: resource")\n .Input("index: int32")\n .Input("value: T")\n ;\n\n // https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/resource_mgr.h\n struct ArrayContainer : public ResourceBase {\n ArrayContainer(const DataType& dtype) : dtype_(dtype) {}\n\n string DebugString() override { return "ArrayContainer"; }\n int64 MemoryUsed() const override { return 0; };\n\n mutex mu_;\n const DataType dtype_;\n std::vector<PersistentTensor> data_ GUARDED_BY(mu_);\n\n int32 get_size() {\n mutex_lock l(mu_);\n return (int32) data_.size();\n }\n\n Status set_size(int32 size) {\n if(size < 0)\n return errors::InvalidArgument("size ", size, " must be >= 0");\n mutex_lock l(mu_);\n data_.resize((size_t) size);\n return Status::OK();\n }\n\n Status get(OpKernelContext* ctx, int32 idx, PersistentTensor* v) {\n mutex_lock l(mu_);\n if(idx < 0)\n return errors::InvalidArgument("idx ", idx, " must be >= 0");\n if((size_t)idx >= data_.size())\n return errors::InvalidArgument("idx ", idx, " must be < size ", data_.size());\n PersistentTensor& t = data_[(size_t)idx];\n if(!t.IsInitialized())\n return errors::InvalidArgument("tensor at idx ", idx, " must have been set before");\n *v = t;\n return Status::OK();\n }\n\n Status set(OpKernelContext* ctx, int32 idx, const Tensor& v) {\n mutex_lock l(mu_);\n if(idx < 0)\n return errors::InvalidArgument("idx ", idx, " must be >= 0");\n if((size_t)idx >= data_.size())\n return errors::InvalidArgument("idx ", idx, " must be < size ", data_.size());\n data_[idx] = PersistentTensor(v);\n return Status::OK();\n }\n\n };\n\n // https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/resource_op_kernel.h\n class ArrayContainerCreateOp : public ResourceOpKernel<ArrayContainer> {\n public:\n explicit ArrayContainerCreateOp(OpKernelConstruction* context) : ResourceOpKernel(context) {\n OP_REQUIRES_OK(context, context->GetAttr("T", &dtype_));\n }\n\n private:\n virtual bool IsCancellable() const { return false; }\n virtual void Cancel() {}\n\n Status CreateResource(ArrayContainer** ret) override 
EXCLUSIVE_LOCKS_REQUIRED(mu_) {\n *ret = new ArrayContainer(dtype_);\n if(*ret == nullptr)\n return errors::ResourceExhausted("Failed to allocate");\n return Status::OK();\n }\n\n Status VerifyResource(ArrayContainer* ar) override {\n if(ar->dtype_ != dtype_)\n return errors::InvalidArgument("Data type mismatch: expected ", DataTypeString(dtype_),\n " but got ", DataTypeString(ar->dtype_), ".");\n return Status::OK();\n }\n \n DataType dtype_;\n };\n REGISTER_KERNEL_BUILDER(Name("ArrayContainerCreate").Device(DEVICE_CPU), ArrayContainerCreateOp);\n\n class ArrayContainerGetSizeOp : public OpKernel {\n public:\n using OpKernel::OpKernel;\n\n void Compute(OpKernelContext* context) override {\n ArrayContainer* ar;\n \n const Tensor* handle;\n OP_REQUIRES_OK(context, context->input("handle", &handle)); \n OP_REQUIRES_OK(context, GetResourceFromContext(context, "handle", &ar));\n core::ScopedUnref unref(ar);\n\n int32 size = ar->get_size();\n Tensor* tensor_size = nullptr;\n OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape({}), &tensor_size));\n tensor_size->flat<int32>().setConstant(size);\n }\n };\n REGISTER_KERNEL_BUILDER(Name("ArrayContainerGetSize").Device(DEVICE_CPU), ArrayContainerGetSizeOp);\n\n class ArrayContainerSetSizeOp : public OpKernel {\n public:\n using OpKernel::OpKernel;\n\n void Compute(OpKernelContext* context) override {\n ArrayContainer* ar;\n OP_REQUIRES_OK(context, GetResourceFromContext(context, "handle", &ar));\n core::ScopedUnref unref(ar);\n\n const Tensor* tensor_size;\n OP_REQUIRES_OK(context, context->input("size", &tensor_size));\n OP_REQUIRES(context, TensorShapeUtils::IsScalar(tensor_size->shape()),\n errors::InvalidArgument(\n "TensorArray index must be scalar, but had shape: ",\n tensor_size->shape().DebugString()));\n const int32 size = tensor_size->scalar<int32>()();\n OP_REQUIRES_OK(context, ar->set_size(size));\n }\n };\n REGISTER_KERNEL_BUILDER(Name("ArrayContainerSetSize").Device(DEVICE_CPU), ArrayContainerSetSizeOp);\n\n class ArrayContainerGetOp : public OpKernel {\n public:\n explicit ArrayContainerGetOp(OpKernelConstruction* context) : OpKernel(context) {\n OP_REQUIRES_OK(context, context->GetAttr("T", &dtype_));\n }\n\n void Compute(OpKernelContext* context) override {\n ArrayContainer* ar;\n OP_REQUIRES_OK(context, GetResourceFromContext(context, "handle", &ar));\n core::ScopedUnref unref(ar);\n\n const Tensor* tensor_index;\n OP_REQUIRES_OK(context, context->input("index", &tensor_index));\n OP_REQUIRES(context, TensorShapeUtils::IsScalar(tensor_index->shape()),\n errors::InvalidArgument(\n "TensorArray index must be scalar, but had shape: ",\n tensor_index->shape().DebugString()));\n const int32 index = tensor_index->scalar<int32>()();\n\n PersistentTensor value;\n OP_REQUIRES_OK(context, ar->get(context, index, &value));\n context->set_output(0, *value.AccessTensor(context));\n }\n\n private:\n DataType dtype_;\n };\n REGISTER_KERNEL_BUILDER(Name("ArrayContainerGet").Device(DEVICE_CPU), ArrayContainerGetOp);\n\n class ArrayContainerSetOp : public OpKernel {\n public:\n explicit ArrayContainerSetOp(OpKernelConstruction* context) : OpKernel(context) {\n OP_REQUIRES_OK(context, context->GetAttr("T", &dtype_));\n }\n\n void Compute(OpKernelContext* context) override {\n ArrayContainer* ar;\n OP_REQUIRES_OK(context, GetResourceFromContext(context, "handle", &ar));\n core::ScopedUnref unref(ar);\n\n const Tensor* tensor_index;\n const Tensor* tensor_value;\n OP_REQUIRES_OK(context, context->input("index", &tensor_index));\n 
OP_REQUIRES_OK(context, context->input("value", &tensor_value));\n \n OP_REQUIRES(context, TensorShapeUtils::IsScalar(tensor_index->shape()),\n errors::InvalidArgument(\n "index must be scalar, but had shape: ",\n tensor_index->shape().DebugString()));\n const int32 index = tensor_index->scalar<int32>()();\n OP_REQUIRES(context, tensor_value->IsInitialized(), errors::InvalidArgument("value must be initialized"));\n\n OP_REQUIRES_OK(context, ar->set(context, index, *tensor_value));\n }\n\n private:\n DataType dtype_;\n };\n REGISTER_KERNEL_BUILDER(Name("ArrayContainerSet").Device(DEVICE_CPU), ArrayContainerSetOp);\n '[source]
get_size(self)[source]
Returns:size int32 scalar
Return type:tf.Tensor
set_size(self, size)[source]
Parameters:size (tf.Tensor) –
Returns:operation
Return type:tf.Operation
get(self, index)[source]
Parameters:index (tf.Tensor) – >= 0 and < size
Returns:tensor at that index
Return type:tf.Tensor
set(self, index, value)[source]
Parameters:
  • index (tf.Tensor) – >= 0 and < size
  • value (tf.Tensor) –
Returns:

operation

Return type:

tf.Operation

class TFUtil.ExplicitRandomShuffleQueue(capacity, min_after_dequeue=0, dtypes=None, shapes=None, names=None, seed=None, shared_name=None, name='explicit_random_shuffle_queue')[source]

This is intended to behave very much like tf.RandomShuffleQueue, except that it’s implemented by other TF native ops / data structures, and you can change min_after_dequeue at runtime. This means that if you have your own logic about when to end, you can set min_after_dequeue=0 and dequeue all the remaining entries from the queue, and then later increase min_after_dequeue again. You can also start with a small min_after_dequeue and increase the number steadily. The original tf.RandomShuffleQueue had the effect of a reset min_after_dequeue=0 after you closed the queue. However, there was no way to reopen the queue. That is the whole reason this implementation exists.

One difference of this implementation is that you must call the init() op once before usage.

One way to implement this is in pure TF. We need some TF container type which supports having entries of different shapes (where the shape can differ wherever we specified None). We also need some TF container which we can access by index. tf.TensorArray can handle that.

Another way to implement this is by multiple stateful tf.py_func which all reference this instance.

Parameters:
  • capacity (int) –
  • min_after_dequeue (int|tf.Tensor) –
  • dtypes (list[str|tf.DType]) –
  • shapes (list[tuple[int|tf.Tensor|None]]) –
  • names (list[str]|None) –
  • seed (int) –
  • shared_name (str|None) –
  • name (str) –
init(self)[source]
Return type:tf.Operation
size(self)[source]
Return type:tf.Tensor
min_after_dequeue_read(self)[source]
Return type:tf.Tensor
min_after_dequeue_assign(self, min_after_dequeue)[source]
Parameters:min_after_dequeue (tf.Tensor) –
Return type:tf.Operation
enqueue(self, v)[source]
Parameters:v (list[tf.Tensor]|dict[str,tf.Tensor]|tf.Tensor) –
Return type:tf.Operation
dequeue(self)[source]
Return type:tf.Tensor
TFUtil.mem_usage_for_dev(dev_name)[source]
Parameters:dev_name (str) – e.g. “/device:GPU:0” or “/job:localhost/replica:0/task:0/device:GPU:0”
Returns:int scalar, which is the peak memory usage in bytes of the given device
Return type:tf.Tensor

This function will not create multiple nodes in the graph for multiple calls. Currently only works for GPU devices.

TFUtil.identity_with_debug_log(x, args, out, name='DebugLogOp')[source]
Parameters:
  • x (tf.Tensor) –
  • args (dict[str,tf.Tensor|None]) –
  • out (list[dict[str,numpy.ndarray]]) –
  • name (str) –
Returns:

x

Return type:

tf.Tensor

TFUtil.add_check_numerics_ops(fetches=None, ignore_ops=None, use_check_numerics=True, debug_print_added_checks=True, name='add_check_numerics_ops')[source]

This is similar to tf.add_check_numerics_ops() and based on similar code. It adds some more logic and options.

Parameters:
  • fetches (list[tf.Operation|tf.Tensor]|None) – in case this is given, will only look at these and dependent ops
  • ignore_ops (list[str]) – e.g. “”
  • use_check_numerics (bool) – if False, instead of tf.check_numerics(), it does the check manually (via tf.is_finite()) and in case there is inf/nan, it will also print the tensor (while tf.check_numerics does not print the tensor). Note that this can be about 50 times slower.
  • debug_print_added_checks (bool) – prints info about each added check
  • name (str) – op-name for the final tf.group
Returns:

operation which performs all the checks

Return type:

tf.Operation

TFUtil.nested_get_shapes(x)[source]
Parameters:x (tf.Tensor|dict[str,tf.Tensor]|list[tf.Tensor]|object) – anything that nest supports
Returns:same structure as x, but tf.TensorShape for each tensor
TFUtil.has_control_flow_context(x)[source]
Parameters:x (tf.Tensor|tf.Operation|int|float|None|list[tf.Tensor|tf.Operation|int|float]) –
Returns:whether x has a control flow, i.e. is e.g. inside a while loop
Return type:bool
TFUtil.same_control_flow_ctx(x)[source]

Will use the same (flow) context as x. E.g. if x is a constant, it can be outside the loop, so we will yield a context which is not inside the loop. (This function was earlier called same_context.)

See also default_control_flow_ctx().

Parameters:x (tf.Tensor|tf.Operation|int|float|None|list[tf.Tensor|tf.Operation|int|float]) –
Returns:yields context (via tf.control_dependencies)
TFUtil.get_protobuf_fields(obj)[source]
Parameters:obj – protobuf object
Return type:dict[str]
TFUtil.get_op_attrib_keys(op)[source]
Parameters:op (tf.Operation|tf.Tensor|tf.TensorArray) –
Return type:list[str]
Returns:list of attribs. op.get_attr(key) should work
TFUtil.get_op_input_names(op)[source]

Also see: https://stackoverflow.com/questions/50723310/get-tensorflow-tf-operation-inputs-by-name

Parameters:op (tf.Operation) –
Returns:list of names with same len as op.inputs
Return type:list[str]
TFUtil.get_op_inputs_by_name(op)[source]
Parameters:op (tf.Operation) –
Returns:dict input_name -> input
Return type:dict[str,tf.Tensor]
TFUtil.tensor_array_is_dynamic_size(ta)[source]
Parameters:ta (tf.TensorArray) –
Return type:bool
TFUtil.tensor_array_is_clear_after_read(ta)[source]
Parameters:ta (tf.TensorArray) –
Return type:bool
TFUtil.tensor_array_element_shape(ta)[source]
Parameters:ta (tf.TensorArray) –
Return type:tf.TensorShape
TFUtil.tensor_array_like(ta, **kwargs)[source]
Parameters:
  • ta (tf.TensorArray) –
  • kwargs – passed to tf.TensorArray constructor
Returns:

another tensor array, just like ta

Return type:

tf.TensorArray

TFUtil.tensor_array_stack(ta, start=0, stop=None, name=None)[source]

Extends tf.TensorArray.stack by start/stop options.

Parameters:
  • ta (tf.TensorArray) –
  • start (int|tf.Tensor) –
  • stop (int|tf.Tensor|None) –
  • name (str) –
Return type:

tf.Tensor

TFUtil.beam_search(scores, beam_size, keep_beams=False, cheating_gold_targets=None)[source]

This is mostly a higher-level wrapper around tf.nn.top_k().

Parameters:
  • scores (tf.Tensor) – (batch,beam_in,dim). combined scores (i.e. base beam scores + new scores), dense over the dims, such that we have labels in [0,…,dim-1].
  • beam_size (int|tf.Tensor) –
  • keep_beams (bool) – specifies that we keep the beam_in entries, i.e. we just expand, i.e. we just search on the dim. beam_size must be a multiple of beam_in.
  • cheating_gold_targets (tf.Tensor|None) – (batch,), int32
Return type:

(tf.Tensor,tf.Tensor,tf.Tensor)

Returns:

src_beams, labels, beam_scores. src_beams: (batch, beam) -> beam_in idx (int32), labels: (batch, beam) -> dim idx (int32), beam_scores: (batch, beam) -> beam score (float32).

TFUtil.select_src_beams(x, src_beams, name='select_src_beams')[source]
Parameters:
  • x (tf.Tensor|tf.TensorArray|T) – (batch * src-beam, …)
  • src_beams (tf.Tensor) – (batch, beam) -> src-beam-idx
  • name (str) –
Returns:

(batch * beam, …)

Return type:

tf.Tensor|T
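
Example (a minimal sketch of beam-search bookkeeping):

import tensorflow as tf
from TFUtil import select_src_beams

state = tf.random_normal((2 * 4, 128))        # (batch * src-beam, hidden)
src_beams = tf.zeros((2, 4), dtype=tf.int32)  # (batch, beam) -> src-beam idx
# Reorder the flattened per-beam state according to the chosen source beams.
state = select_src_beams(state, src_beams)    # (batch * beam, hidden)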

TFUtil.filter_ended_scores(x, end_flags, batch_dim=None, dim=None, score_zero=0.0, score_rem=-1e+30)[source]

This can e.g. be used before tf.nn.top_k to let only one beam through for an ended hypothesis. Then, batch would also include the beam size, which does not matter here.

Parameters:
  • x (tf.Tensor) – (batch, dim)
  • end_flags (tf.Tensor) – (batch,)
  • batch_dim (tf.Tensor|int|None) –
  • dim (tf.Tensor|int|None) –
  • score_zero (float) – x[…, 0] will have this score where end_flag is True
  • score_rem (float) – x[…, 1:] will have this score where end_flag is True
Returns:

filtered x, (batch, dim)

Return type:

tf.Tensor
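
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import filter_ended_scores

scores = tf.random_normal((8, 1000))  # (batch, dim); batch includes the beam
end_flags = tf.greater(tf.random_uniform((8,)), 0.5)
# For ended hypotheses, label 0 keeps score_zero and all other labels get
# score_rem, so a following tf.nn.top_k lets only one beam entry through.
scores = filter_ended_scores(scores, end_flags)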

TFUtil.to_int32_64(x)[source]
Parameters:x (tf.Tensor) – dtype uint8, int8, int16, int32, int64
Return type:tf.Tensor
Returns:dtype int32 or int64
TFUtil.to_float32(x)[source]
Parameters:x (tf.Tensor) –
Returns:x as float32
Return type:tf.Tensor
TFUtil.batch_gather(x, indices, keep_dims=False)[source]
Parameters:
  • x (tf.Tensor) – (batch,dim,…)
  • indices (tf.Tensor) – (batch,) -> [0..dim-1]
  • keep_dims (bool) –
Returns:

x[batches,indices[batches]], (batch,…). or (batch,1,…) with keep_dims

Return type:

tf.Tensor
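
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import batch_gather

x = tf.random_normal((4, 10, 3))     # (batch, dim, ...)
indices = tf.constant([0, 9, 2, 5])  # (batch,) -> [0..dim-1]
y = batch_gather(x, indices)         # (batch, ...), here (4, 3)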

TFUtil.unflatten_nd(x, nd_sizes, num_axes=None)[source]

E.g. assume that for each x[b], we have an image flattened, i.e. of size width*height. Then nd_sizes[b] == (width, height) would provide the individual sizes. We return y such that y[b][i][j] == x[b][i * nd_sizes[b][0] + j]. This is implemented for any number of axes. Kind of like the reverse of a ND version of flatten_with_seq_len_mask.

Parameters:
  • x (tf.Tensor) – (B, T, <Ds>)
  • nd_sizes (tf.Tensor) – (B, N = num_axes)
  • num_axes (int) –
Returns:

(B, T_1, …, T_N, <Ds>), T_i == max(nd_sizes[:, i])

Return type:

tf.Tensor
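
Example (a minimal sketch, unflattening per-batch images):

import tensorflow as tf
from TFUtil import unflatten_nd

x = tf.random_normal((2, 12, 3))          # (B, T, D), T >= prod(nd_sizes[b])
nd_sizes = tf.constant([[3, 4], [2, 5]])  # (B, 2): per-batch 2D sizes
# Result shape (B, 3, 5, D): each T_i is the max of nd_sizes[:, i],
# with zero-padding for batch entries with smaller sizes.
y = unflatten_nd(x, nd_sizes, num_axes=2)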

TFUtil.kernels_registered_for_op(op_name)[source]

This just wraps the TF C++ function tensorflow::KernelsRegisteredForOp().

Parameters:op_name (str) – e.g. “Gather”
Returns:e.g. [“device=’CPU’; …”, “device=’GPU’; …”, …]
Return type:list[str]
TFUtil.supported_devices_for_op(op_name)[source]
Parameters:op_name (str) –
Returns:list of devices, e.g. [“CPU”, “GPU”]
Return type:list[str]
TFUtil.find_unsupported_devices_in_graph(graph, dev_name, ignore=None)[source]
Parameters:
  • graph (tf.Graph) –
  • dev_name (str) – e.g. “GPU”
  • ignore (list[str]|None) – list of op-names to ignore, e.g. [“ScalarSummary”] etc. If None, will use defaults.
Return type:

list[tf.Operation]

TFUtil.get_device_attr(dev)[source]
Parameters:dev (str) – e.g. "/device:GPU:0", or any argument for tf.device()
Returns:scalar string, e.g. b'device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1'
Return type:tf.Tensor
TFUtil.print_graph_output(fetches)[source]
Parameters:fetches (tf.Operation|tf.Tensor|list[tf.Tensor|tf.Operation]) –
TFUtil.find_ops_with_tensor_input(tensors, fetches=None, graph=None)[source]
Parameters:
  • tensors (tf.Tensor|tf.Variable|list[tf.Tensor]) –
  • fetches (tf.Operation|tf.Tensor|list[tf.Operation|tf.Tensor]|None) –
  • graph (tf.Graph|None) –
Returns:

list of ops

Return type:

list[tf.Operation]

TFUtil.find_ops_path_output_to_input(tensors, fetches)[source]

Searches backwards like in tensorflow.contrib.graph_editor.get_backward_walk_ops() and then returns a found traceback, if there is one.

Parameters:
  • tensors (tf.Tensor|tf.Variable|list[tf.Tensor]) – input
  • fetches (tf.Operation|tf.Tensor|list[tf.Operation|tf.Tensor]) – output
Returns:

list of ops, input to output

Return type:

list[tf.Operation]|None

TFUtil.get_var_update_ops(var, fetches=None)[source]
Parameters:
  • var (tf.Variable) –
  • fetches (tf.Operation|tf.Tensor|list[tf.Operation|tf.Tensor]|None) – e.g. the Optimizer.minimize() op
Returns:

list of ops that update var; currently expected to be of length 1

Return type:

list[tf.Operation]

TFUtil.get_variable_value_copy_before_update_ops(var, update_ops)[source]
Parameters:
  • var (tf.Variable) –
  • update_ops (list[tf.Operation]) –
Returns:

var value before any of the update_ops are executed

Return type:

tf.Tensor

TFUtil.get_variable_grad_from_update_ops(var, update_ops)[source]
Parameters:
  • var (tf.Variable) –
  • update_ops (list[tf.Operation]) –
Returns:

grad of loss w.r.t. var, as it is used in the update_ops, e.g. via ApplyAdam or ApplyGradientDescent (not all kind of updates are supported currently). If the gradient is sparse, it will return a tf.IndexedSlices.

Return type:

tf.Tensor|tf.IndexedSlices

TFUtil.add_control_input(op, control_input)[source]
Parameters:
  • op (tf.Operation) –
  • control_input (tf.Operation) –
TFUtil.vocab_idx_to_vocab_string(labels, vocab)[source]

Just does a lookup on vocab.

Parameters:
  • labels (tf.Tensor) – (batch,max_len), or any, int32, indices in vocab
  • vocab (tf.Tensor) – (vocab_size,), string
Returns:

(batch,max_len), or any, like labels, string

Return type:

tf.Tensor

TFUtil.vocab_idx_repr(labels, data)[source]
Parameters:
  • labels (tf.Tensor) – int32, indices in vocab
  • data (Data) – might have vocab
Returns:

string or int32, shape as labels, or maybe without last axis

Return type:

tf.Tensor

TFUtil.string_merge(strings, seq_lens, separator=' ')[source]

Also see TFEngine.Engine.search().

Parameters:
  • strings (tf.Tensor) – (batch,max_len)
  • seq_lens (tf.Tensor) – (batch,)
  • separator (str|tf.Tensor) – string
Returns:

(batch,), string

Return type:

tf.Tensor

TFUtil.string_replace(strings, old, new, count=-1)[source]

Like str.replace.

Parameters:
  • strings (tf.Tensor) – (batch,), string
  • old (tf.Tensor|str) –
  • new (tf.Tensor|str) –
  • count (tf.Tensor|int) –
Returns:

(batch,), string

Return type:

tf.Tensor

TFUtil.bpe_merge(strings)[source]
Parameters:strings (tf.Tensor) – (batch,), string
Returns:(batch,), string. strings after BPE merging
Return type:tf.Tensor
TFUtil.words_split(strings)[source]

Basically just tf.string_split with delimiter=” “.

Parameters:strings (tf.Tensor) – (batch,), string
Returns:sparse tensor of shape (batch,max_len), string
Return type:tf.SparseTensor
TFUtil.get_sparse_tensor_length(x)[source]
Parameters:x (tf.SparseTensor) – of shape prefix + (max_len,), where prefix can be anything, e.g. prefix=(batch,)
Returns:shape prefix, int64
Return type:tf.Tensor
TFUtil.string_words_calc_wer(hyps, refs)[source]
Parameters:
  • hyps (tf.Tensor) – (batch,)
  • refs (tf.Tensor) – (batch,)
Returns:

(WER (batch,), num ref words (batch,))

Return type:

(tf.Tensor, tf.Tensor)
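
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import string_words_calc_wer

hyps = tf.constant(["the cat sat", "hello world"])
refs = tf.constant(["the cat sat down", "hello there world"])
# Both outputs are shaped (batch,); accumulate over batches and divide by
# the summed reference word counts to get a corpus-level WER.
wer, num_ref_words = string_words_calc_wer(hyps, refs)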

TFUtil.py_print(pass_through_value, print_args, message=None, summarize=None, first_n=None, name='py_print')[source]

Like tf.Print(), but prints to Python stdout. Also see tf.print(), which however also does not print to Python stdout.

Parameters:
  • pass_through_value (tf.Tensor|int|float) – will return tf.identity of this, but with side effect of printing
  • print_args (list[str|tf.Tensor]) –
  • message (str|None) – A string, prefix of the error message.
  • summarize (int) – Only print this many entries of each tensor. If None, then a maximum of 3 elements are printed per input tensor.
  • first_n (int) – Only log first_n number of times. Negative numbers log always; this is the default.
  • name (str) –
Returns:

tf.identity(pass_through_value) with side effect of printing

Return type:

tf.Tensor
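
Example (a minimal sketch):

import tensorflow as tf
from TFUtil import py_print

x = tf.random_normal((3, 3))
# Printing happens on the Python stdout whenever this node is evaluated;
# make sure to use the returned tensor, not the original x.
x = py_print(x, ["x =", x], message="debug: ")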

TFUtil.get_positional_encoding(num_channels, length=None, position=None, min_timescale=1.0, max_timescale=10000.0)[source]

Gets a bunch of sinusoids of different frequencies.

Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase.

This allows attention to learn to use absolute and relative positions. Timing signals should be added to some precursors of both the query and the memory inputs to attention.

The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x) and cos(x).

In particular, we use a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to channels / 2. For each timescale, we generate the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All of these sinusoids are concatenated in the channels dimension.

The code is adapted from Tensor2Tensor get_timing_signal_1d (https://github.com/tensorflow/tensor2tensor).

Parameters:
  • num_channels (int) – scalar, size of timing embeddings to create. The number of different timescales is equal to channels / 2.
  • length (tf.Tensor|None) – scalar, length of timing signal sequence.
  • position (tf.Tensor|None) – could be provided directly. int32. Can have any shape.
  • min_timescale (float) – a float.
  • max_timescale (float) – a float.
Returns:

a Tensor of timing signals of shape (length, channels) or (batch, length, channels).

Return type:

tf.Tensor
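
Example (a minimal sketch, Transformer-style timing signal):

import tensorflow as tf
from TFUtil import get_positional_encoding

signal = get_positional_encoding(num_channels=512, length=tf.constant(100))
# shape (length, channels); typically added onto the input embeddings.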

TFUtil.get_non_deterministic_ops_from_graph()[source]

Lists all non-deterministic ops used in the default graph. If a non-deterministic op is used multiple times, each instance will be listed.

Note: this currently doesn't check whether the user specified a specific computation device, and the list of known non-deterministic ops is not yet complete.

Returns:list of all non-deterministic op names (depending on device and TF version) used in the current graph
Return type:list[tf.Operation]
TFUtil.compute_sampled_logits(weights, biases, labels, inputs, num_sampled, num_classes, num_true=1, sampled_values=None, subtract_log_q=True, remove_accidental_hits=False, partition_strategy='mod', name=None, seed=None)[source]

Helper function for nce_loss and sampled_softmax_loss functions. Computes sampled output training logits and labels suitable for implementing e.g. noise-contrastive estimation (see nce_loss) or sampled softmax (see sampled_softmax_loss). Note: In the case where num_true > 1, we assign to each target class the target probability 1 / num_true so that the target probabilities sum to 1 per-example.

This is a copy of
https://github.com/tensorflow/tensorflow/blob/e19c354920c3b246dda6598229210a582caaa1a9/tensorflow/python/ops/nn_impl.py#L1440
Parameters:
  • weights (tf.Tensor|list[tf.Tensor]|tuple[tf.Tensor]) – A Tensor of shape [num_classes, dim], or a list of Tensor objects whose concatenation along dimension 0 has shape [num_classes, dim]. The class embeddings.
  • biases (tf.Tensor) – A Tensor of shape [num_classes]. The class biases.
  • labels (tf.Tensor) – A Tensor of type int64 and shape [batch_size, num_true]. The target classes. Note that this format differs from the labels argument of tf.nn.softmax_cross_entropy_with_logits.
  • inputs (tf.Tensor) – A Tensor of shape [batch_size, dim]. The forward activations of the input network.
  • num_sampled (int) – The number of classes to randomly sample per batch.
  • num_classes (int) – The number of possible classes.
  • num_true (int) – The number of target classes per training example.
  • sampled_values ((tf.Tensor, tf.Tensor, tf.Tensor)|None) – a tuple of (sampled_candidates, true_expected_count, sampled_expected_count) returned by a *_candidate_sampler function. (if None, we default to log_uniform_candidate_sampler)
  • subtract_log_q (bool) – whether to subtract the log expected count of the labels in the sample to get the logits of the true labels. Default is True. Turn off for Negative Sampling.
  • remove_accidental_hits (bool) – Whether to remove “accidental hits” where a sampled class equals one of the target classes.
  • partition_strategy (str) – A string specifying the partitioning strategy, relevant if len(weights) > 1. Currently “div” and “mod” are supported. Default is “mod”. See tf.nn.embedding_lookup for more details.
  • name (str|None) – A name for the operation.
  • seed (int|None) – random seed for candidate sampling. Default to None, which doesn’t set the op-level random seed for candidate sampling.
Returns:

out_logits: Tensor object with shape [batch_size, num_true + num_sampled], for passing to either nn.sigmoid_cross_entropy_with_logits (NCE) or nn.softmax_cross_entropy_with_logits (sampled softmax).

out_targets: A Tensor object with the same shape and dtype as out_logits. These are the targets. If num_true > 1 the per-example labels are divided by num_true so they sum to 1.0.

Return type:

(tf.Tensor, tf.Tensor)

class TFUtil.FetchHelper(tensor, verbose_stream=None)[source]

session.run(tensor) does not work if tensor is inside a loop (tf.while_loop) (or tf.cond). You would get an error like this:

Operation '...' has been marked as not fetchable.

This class is a helper to work around that. It will add an op to the graph, which stores the most recent value. To get this executed automatically, you likely want to add it as a control dependency to another op. Use add_to_control_inputs() for that, or better copy_graph_replace_tensors(), or best of all copy_graph().

Parameters:
  • tensor (tf.Tensor) –
  • verbose_stream (typing.IO[str]|None) –
classmethod copy_graph(fetches, target_op, fetch_helper_tensors, stop_at_ts=(), verbose_stream=None)[source]
Parameters:
  • fetches (tf.Tensor|list[tf.Tensor]|T) –
  • target_op (tf.Operation) – will add the fetch helpers as control dependencies to this op
  • fetch_helper_tensors (list[tf.Tensor]) –
  • verbose_stream (typing.IO[str]|None) –
  • stop_at_ts (typing.Iterable[tf.Tensor]) – iterable of tensors at which the graph walk stops.
Returns:

copied fetches, fetch helpers, transformed target op

Return type:

(tf.Tensor|list[tf.Tensor]|T, list[FetchHelper], tf.Operation)

classmethod copy_graph_replace_tensors(fetches, fetch_helpers)[source]
Parameters:
  • fetches (tf.Tensor|list[tf.Tensor]) –
  • fetch_helpers (list[FetchHelper]) –
Returns:

as fetches

Return type:

tf.Tensor|list[tf.Tensor]

add_to_control_inputs(self, other_op)[source]

Note: This will not work if you already did a session.run(). Use copy_graph_replace_tensors() instead, or better copy_graph().

Parameters:other_op (tf.Operation) –