`returnn.util.basic`¶

Various generic utilities, which are shared across different backend engines.

class returnn.util.basic.NotSpecified[source]¶

This is just a placeholder, to be used as default argument to mark that it is not specified.

classmethod resolve(value, default)[source]¶

Parameters:

value (T|NotSpecified|type[NotSpecified])
default (U)

Return type:

T|U
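The sentinel-default pattern can be sketched as follows. This is a minimal stand-alone illustration, not the RETURNN class: `NotSpecifiedSketch` and `describe` are hypothetical names, and treating both the class and its instances as "not specified" is an assumption based on the parameter type above.

```python
class NotSpecifiedSketch:
    """Hypothetical stand-in for a 'not specified' sentinel (not the RETURNN class)."""

    @classmethod
    def resolve(cls, value, default):
        # Assumption: both the class itself and instances count as "not specified".
        if value is cls or isinstance(value, cls):
            return default
        return value


def describe(dropout=NotSpecifiedSketch):
    # None stays a valid, explicit value; only the sentinel triggers the default.
    return NotSpecifiedSketch.resolve(dropout, 0.1)
```

The point of such a sentinel is that `None` remains available as a meaningful argument value.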

class returnn.util.basic.Entity(name: str | None = None, *, global_base: Any | None = None, global_name: str | None = None)[source]¶

This is a generic placeholder which can be used for enums or other identities. It intentionally uses object.__eq__ and related methods, i.e. a == b iff a is b. The name is only for debugging purposes. This is more efficient than using the string directly in an enum.

Parameters:: name (str|None)
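Identity-based equality, as described above, is what Python gives any class that does not override `__eq__`. A minimal sketch (the class `EntitySketch` is hypothetical and only mirrors the `name` argument):

```python
class EntitySketch:
    """Hypothetical identity-compared marker object."""

    def __init__(self, name=None):
        self.name = name  # only used for debugging output

    def __repr__(self):
        return "<EntitySketch %s>" % self.name

    # No __eq__/__hash__ overrides: the defaults from object compare by identity.


first = EntitySketch("train")
second = EntitySketch("train")  # same name, but a different identity
```

Two markers with the same name are therefore still distinct values.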

exception returnn.util.basic.OptionalNotImplementedError[source]¶: This can optionally be implemented, but it is not required by the API.

returnn.util.basic.is_64bit_platform()[source]¶

Returns:: True if we run on 64bit, False for 32bit
Return type:: bool

https://stackoverflow.com/questions/1405913/how-do-i-determine-if-my-python-shell-is-executing-in-32bit-or-64bit-mode-on-os
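One common way to make this check, sketched here with the standard library only (whether RETURNN uses exactly this test is an assumption; `is_64bit_platform_sketch` is a hypothetical name):

```python
import struct
import sys


def is_64bit_platform_sketch():
    # sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds.
    return sys.maxsize > 2**32


# Alternative check: size of a C pointer in bits for this interpreter build.
pointer_bits = struct.calcsize("P") * 8
```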

class returnn.util.basic.BackendEngine[source]¶

Stores which backend engine we use in RETURNN. E.g. TensorFlow or PyTorch.

TensorFlowNetDict = 1[source]¶

TensorFlow = 2[source]¶

Torch = 3[source]¶

selected_engine: int | None = None[source]¶

classmethod select_engine(*, engine=None, default_fallback_engine=None, config=None, _select_rf_backend: bool = True)[source]¶

Parameters:

engine (int) – see the global class attribs for possible values
default_fallback_engine (int|None) – if engine is None and not defined in config, use this
config (returnn.config.Config)
_select_rf_backend – internal. avoids that Torch/TF/anything further gets imported at this point

classmethod get_selected_engine() → int[source]¶

Returns:: one of the constants TensorFlowNetDict, TensorFlow, Torch

classmethod is_tensorflow_selected()[source]¶

Return type:: bool

classmethod is_torch_selected()[source]¶

Return type:: bool

class returnn.util.basic.BehaviorVersion[source]¶

Stores the global behavior_version.

The version will be set after the config is defined at __main__.init_config() or Engine.__init__().

See behavior_version.

classmethod set(version)[source]¶

Parameters:: version (int|None)

classmethod get()[source]¶

Return type:: int

classmethod get_if_set()[source]¶

Return type:: int|None

classmethod set_min_behavior_version(min_behavior_version: int)[source]¶

There are some RETURNN features which trigger a higher min behavior version. The min behavior version is used when no behavior version is explicitly set. But it is also an error if a behavior version is set, but it is lower than the min behavior version.

Parameters:: min_behavior_version
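The interaction between an explicitly set version and the minimum version can be sketched like this. It is a simplified model under assumptions: the class name, the use of `ValueError`, and the initial minimum of 0 are all hypothetical and not taken from RETURNN.

```python
class BehaviorVersionSketch:
    """Hypothetical model of 'explicit version vs. required minimum'."""

    _version = None   # explicitly set version, if any
    _min_version = 0  # raised by features which need newer behavior

    @classmethod
    def set(cls, version):
        if version is not None and version < cls._min_version:
            raise ValueError("version %i < required minimum %i" % (version, cls._min_version))
        cls._version = version

    @classmethod
    def set_min_behavior_version(cls, min_behavior_version):
        if cls._version is not None and cls._version < min_behavior_version:
            raise ValueError("set version %i < new minimum %i" % (cls._version, min_behavior_version))
        cls._min_version = max(cls._min_version, min_behavior_version)

    @classmethod
    def get(cls):
        # The minimum is used only when nothing was set explicitly.
        return cls._version if cls._version is not None else cls._min_version


BehaviorVersionSketch.set_min_behavior_version(5)
version_when_unset = BehaviorVersionSketch.get()
BehaviorVersionSketch.set(7)
version_when_set = BehaviorVersionSketch.get()
try:
    BehaviorVersionSketch.set_min_behavior_version(9)  # conflicts with the explicit 7
    conflict_detected = False
except ValueError:
    conflict_detected = True
```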

classmethod is_set()[source]¶

Return type:: bool

exception RequirementNotSatisfied[source]¶: Behavior version requirement is not satisfied

classmethod require(condition, message, version)[source]¶

Parameters:

condition (bool)
message (str)
version (int)

reset_callbacks: List[Callable[[], None]] = [<function _behavior_version_reset_callback>][source]¶

handle_new_min_version_callbacks: List[Callable[[], None]] = [<function _behavior_version_handle_new_min_version_callback>][source]¶

returnn.util.basic.get_model_filename_postfix()[source]¶

Returns:: one possible postfix of a file which will be present when the model is saved
Return type:: str

returnn.util.basic.get_checkpoint_filepattern(filepath)[source]¶

Removes optional .index or .meta extension

Parameters:: filepath (str)
Returns:: CheckpointLoader compatible filepattern
Return type:: str
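Stripping the optional extension could look roughly like this (a sketch; `checkpoint_filepattern_sketch` is a hypothetical name and the example paths are made up):

```python
def checkpoint_filepattern_sketch(filepath):
    # Remove a trailing ".index" or ".meta" if present; leave other paths untouched.
    for ext in (".index", ".meta"):
        if filepath.endswith(ext):
            return filepath[: -len(ext)]
    return filepath
```

For example, `"model.042.index"` and `"model.042.meta"` would both map to `"model.042"`.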

returnn.util.basic.sys_cmd_out_lines(s)[source]¶

Parameters:: s (str) – shell command
Return type:: list[str]
Returns:: all stdout split by newline. Does not cover stderr.

Raises CalledProcessError on error.

returnn.util.basic.sys_exec_out(*args, **kwargs)[source]¶

Parameters:

args (str) – for subprocess.Popen
kwargs – for subprocess.Popen

Returns:

stdout as str (assumes utf8)

Return type:

str

returnn.util.basic.sys_exec_ret_code(*args, **kwargs)[source]¶

Parameters:

args (str) – for subprocess.call
kwargs – for subprocess.call

Returns:

return code

Return type:

int

returnn.util.basic.git_commit_rev(commit='HEAD', git_dir='.', length=None)[source]¶

Parameters:

commit (str)
git_dir (str)
length (int|None)

Return type:

str

returnn.util.basic.git_is_dirty(git_dir='.')[source]¶

Parameters:: git_dir (str)
Return type:: bool

returnn.util.basic.git_commit_date(commit='HEAD', git_dir='.')[source]¶

Parameters:

commit (str)
git_dir (str)

Return type:

str

returnn.util.basic.git_describe_head_version(git_dir='.')[source]¶

Parameters:: git_dir (str)
Return type:: str

returnn.util.basic.describe_returnn_version()[source]¶

Return type:: str
Returns:: string like “1.20171017.163840+git-ab2a1da”

returnn.util.basic.describe_tensorflow_version()[source]¶

Return type:: str

returnn.util.basic.describe_torch_version() → str[source]¶

Returns:: Torch version and path info

returnn.util.basic.get_tensorflow_version_tuple() → Tuple[int, ...][source]¶

Returns:: tuple of ints, first entry is the major version

class returnn.util.basic.ReportImportedDevModules(*, description: str)[source]¶: This is supposed to be used as a context manager. We track all additionally loaded modules during this context, and also extensions to sys.path. We try to detect if such loaded module is inside a Git repository, and if so, report the Git commit.

returnn.util.basic.eval_shell_env(token)[source]¶

Parameters:: token (str)
Returns:: if “$var”, looks in os.environ, otherwise return token as is
Return type:: str

returnn.util.basic.eval_shell_str(s)[source]¶

Return type:: list[str]

Parses s as shell like arguments (via shlex.split) and evaluates shell environment variables (eval_shell_env()). s or its elements can also be callable. In those cases, they will be called and the returned value is used.

Also see expand_env_vars() or os.path.expandvars or string.Template.substitute() or shlex utils.
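The combination of shlex.split and environment lookup described above can be sketched as follows. The function names are hypothetical, callables are not handled, and leaving an undefined variable unchanged is an assumption of this sketch.

```python
import os
import shlex


def eval_shell_env_sketch(token):
    # "$var" is looked up in os.environ; anything else is returned as is.
    if token.startswith("$"):
        # Assumption: an undefined variable leaves the token unchanged.
        return os.environ.get(token[1:], token)
    return token


def eval_shell_str_sketch(s):
    # shlex.split handles quoting, so 'a b' stays one argument.
    return [eval_shell_env_sketch(token) for token in shlex.split(s)]


os.environ["SKETCH_DEMO_DIR"] = "/data"
demo_args = eval_shell_str_sketch("ls -l $SKETCH_DEMO_DIR 'a b'")
```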

returnn.util.basic.expand_env_vars(s: str) → str[source]¶

Similar to os.path.expandvars():

It replaces $var or ${var} with the value of the environment variable var. Also, $$ is replaced by $. Any use of an undefined env var is an error.

In addition to os.path.expandvars(), it handles $TMPDIR and $USER specially when they are not defined in os.environ, by using get_temp_dir() or get_login_username().

Also see string.Template.substitute().

Parameters:: s – string with env vars like “$TMPDIR/$USER”
Returns:: s with expanded env vars
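The basic substitution rules ($var, ${var}, $$, error on undefined names) match what string.Template.substitute() provides, so they can be illustrated with it. This sketch does not cover the special handling of $TMPDIR and $USER, and `expand_env_vars_sketch` is a hypothetical name.

```python
import os
import string


def expand_env_vars_sketch(s):
    # substitute() (unlike safe_substitute()) raises KeyError for undefined names.
    return string.Template(s).substitute(os.environ)


os.environ["SKETCH_USER"] = "alice"
expanded = expand_env_vars_sketch("/tmp/${SKETCH_USER}/$SKETCH_USER-cache costs $$5")

try:
    expand_env_vars_sketch("$SKETCH_SURELY_UNDEFINED_VAR")
    undefined_is_error = False
except KeyError:
    undefined_is_error = True
```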

returnn.util.basic.hdf5_dimension(filename, dimension)[source]¶

Parameters:

filename (str)
dimension (str)

Return type:

numpy.ndarray|int

returnn.util.basic.hdf5_group(filename, dimension)[source]¶

Parameters:

filename (str)
dimension (str)

Return type:

dict[str]

returnn.util.basic.hdf5_shape(filename, dimension)[source]¶

Parameters:

filename (str)
dimension

Return type:

tuple[int]

returnn.util.basic.hdf5_strings(handle, name, data)[source]¶

Parameters:

handle (h5py.File)
name (str)
data (numpy.ndarray|list[str])

returnn.util.basic.model_epoch_from_filename(filename)[source]¶

Parameters:: filename (str)
Returns:: epoch number
Return type:: int|None

returnn.util.basic.deep_update_dict_values(d, key, new_value)[source]¶

Visits all items in d. If the value is a dict, it will recursively visit it.

Parameters:

d (dict[str,T|object|None|dict]) – will update inplace
key (str)
new_value (T)
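The recursive in-place update can be sketched as below. That only already existing keys are overwritten is an assumption of this sketch; `deep_update_sketch` is a hypothetical name.

```python
def deep_update_sketch(d, key, new_value):
    # Recurse into nested dicts first, then update this level in place.
    for value in d.values():
        if isinstance(value, dict):
            deep_update_sketch(value, key, new_value)
    if key in d:  # assumption: keys are only overwritten, never newly added
        d[key] = new_value


config = {"dropout": 0.1, "lr": 1.0, "encoder": {"dropout": 0.2, "layers": {"dropout": 0.3}}}
deep_update_sketch(config, "dropout", 0.0)
```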

returnn.util.basic.terminal_size(file=sys.stdout)[source]¶

Returns the terminal size. This will probably work on Linux only.

Parameters:: file (io.File)
Returns:: (columns, lines), or (-1,-1)
Return type:: (int,int)

returnn.util.basic.is_tty(file=sys.stdout)[source]¶

Parameters:: file (io.File)
Return type:: bool

returnn.util.basic.confirm(txt, exit_on_false=False)[source]¶

Parameters:

txt (str) – e.g. “Delete everything?”
exit_on_false (bool) – if True, will call sys.exit(1) if not confirmed

Return type:

bool

returnn.util.basic.hms(s)[source]¶

Parameters:: s (float|int) – seconds
Returns:: e.g. “1:23:45” (hs:ms:secs). see hms_fraction if you want to get fractional seconds
Return type:: str
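The hours:minutes:seconds formatting can be reproduced with two divmod steps. A sketch (hypothetical name `hms_sketch`; fractional seconds are simply truncated here):

```python
def hms_sketch(s):
    # Split seconds into minutes, then minutes into hours.
    m, s = divmod(int(s), 60)
    h, m = divmod(m, 60)
    return "%d:%02d:%02d" % (h, m, s)
```

With this, 5025 seconds (1 hour, 23 minutes, 45 seconds) gives the string from the description above.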

returnn.util.basic.hms_fraction(s, decimals=4)[source]¶

Parameters:

s (float) – seconds
decimals (int) – how many decimals to print

Returns:

e.g. “1:23:45.6789” (hs:ms:secs)

Return type:

str

returnn.util.basic.human_size(n, factor=1000, frac=0.8, prec=1)[source]¶

Parameters:

n (int|float)
factor (int) – for each of the units K, M, G, T
frac (float) – when to go over to the next bigger unit
prec (int) – how many decimals after the dot

Returns:

human readable size, using K, M, G, T

Return type:

str
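How factor, frac and prec might interact can be sketched as follows. This is an illustration under assumptions (in particular, that a value switches to the next unit once it exceeds frac times that unit); `human_size_sketch` is a hypothetical name.

```python
def human_size_sketch(n, factor=1000, frac=0.8, prec=1):
    postfixes = ["", "K", "M", "G", "T"]
    i = 0
    # Assumption: move to the next bigger unit once n exceeds frac * that unit.
    while i < len(postfixes) - 1 and n > (factor ** (i + 1)) * frac:
        i += 1
    if i == 0:
        return str(n)
    return ("%." + str(prec) + "f") % (float(n) / (factor**i)) + postfixes[i]
```

Under this reading, frac=0.8 means that 900 is already shown as a fraction of K rather than as a plain number.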

returnn.util.basic.human_bytes_size(n, factor=1024, frac=0.8, prec=1)[source]¶

Parameters:

n (int|float)
factor (int) – see human_size(). 1024 by default for bytes
frac (float) – see human_size()
prec (int) – how many decimals after the dot

Returns:

human readable byte size, using K, M, G, T, with “B” at the end

Return type:

str

returnn.util.basic.set_pretty_print_default_limit(limit)[source]¶

Parameters:: limit (int|float) – use float(“inf”) to disable

returnn.util.basic.set_pretty_print_as_bytes(as_bytes)[source]¶

Parameters:: as_bytes (bool)

returnn.util.basic.pretty_print(obj, limit=None)[source]¶

Parameters:

obj (object)
limit (int|float) – use float(“inf”) to disable. None will use the default, via set_pretty_print_default_limit

Returns:

repr(obj), or some shortened version of that, maybe with extra info

Return type:

str

returnn.util.basic.progress_bar(complete=1.0, prefix='', suffix='', file=None)[source]¶

Prints some progress bar.

Parameters:

complete (float) – from 0.0 to 1.0
prefix (str)
suffix (str)
file (io.TextIOWrapper|TextIO|None) – where to print. stdout by default

Returns:

nothing, will print on file

returnn.util.basic.progress_bar_with_time(complete=1.0, prefix='', **kwargs)[source]¶

progress_bar() with additional remaining time estimation.

Parameters:

complete (float)
prefix (str)
kwargs – passed to progress_bar()

Returns:

nothing

returnn.util.basic.available_physical_memory_in_bytes()[source]¶

Return type:: int

returnn.util.basic.default_cache_size_in_gbytes(factor=0.7)[source]¶

Parameters:: factor (float|int)
Return type:: int

returnn.util.basic.better_repr(o)[source]¶

The main difference to repr(): this one is deterministic. The original dict.__repr__ and set.__repr__ do not guarantee a defined order of the entries. For big dicts/sets/lists, it adds “,” at the end to make textual diffs nicer.

Parameters:: o (object)
Return type:: str

returnn.util.basic.simple_obj_repr(obj)[source]¶

Returns:: All self.__init__ args.
Return type:: str

class returnn.util.basic.ObjAsDict(obj)[source]¶

Wraps up any object as a dict, where the attributes become the keys. See also DictAsObj.

items()[source]¶

Returns:: vars(..).items()
Return type:: set[(str,object)]

class returnn.util.basic.DictAsObj(dikt: Dict[str, Any])[source]¶

Wraps up any dictionary as an object, where the keys become the attributes. See also ObjAsDict.

Parameters:: dikt

returnn.util.basic.dict_joined(*ds)[source]¶

Parameters:: ds (dict[T,V])
Returns:: all dicts joined together
Return type:: dict[T,V]

returnn.util.basic.obj_diff_str(self, other, **kwargs)[source]¶

Parameters:

self (object)
other (object)

Returns:

the difference described

Return type:

str

returnn.util.basic.obj_diff_list(self, other, **kwargs)[source]¶

Note that we recurse to a certain degree to the items, but not fully. Some differences might just be summarized.

Parameters:

self (object)
other (object)

Returns:

the difference described

Return type:

list[str]

returnn.util.basic.find_ranges(ls)[source]¶

We expect that the incoming list is sorted and strictly monotonically increasing.

Returns:: list of ranges (start,end) where end is exclusive such that the union of range(start,end) matches ls.
Return type:: list[(int,int)]
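Collapsing a sorted, strictly increasing list of integers into half-open ranges can be sketched like this (`find_ranges_sketch` is a hypothetical name):

```python
def find_ranges_sketch(ls):
    ranges = []
    for x in ls:
        if ranges and ranges[-1][1] == x:
            # x directly continues the last range: extend its exclusive end.
            ranges[-1] = (ranges[-1][0], x + 1)
        else:
            ranges.append((x, x + 1))
    return ranges
```

For the input [1, 2, 3, 7, 8, 10], the union of range(1, 4), range(7, 9) and range(10, 11) gives back the original list.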

returnn.util.basic.init_thread_join_hack()[source]¶: threading.Thread.join and threading.Condition.wait would block signals when run in the main thread. We never want to block signals. Here we patch away that behavior.

returnn.util.basic.start_daemon_thread(target, args=())[source]¶

Parameters:

target (()->None)
args (tuple)

Returns:

nothing

returnn.util.basic.is_quitting()[source]¶

Returns:: whether we are currently quitting (via rnn.finalize())
Return type:: bool

returnn.util.basic.interrupt_main()[source]¶

Sends KeyboardInterrupt to the main thread.

Returns:: nothing

class returnn.util.basic.AsyncThreadRun(name, func)[source]¶

Daemon thread, wrapping some function func via wrap_async_func().

Parameters:

name (str)
func (()->T)

main()[source]¶

Thread target function.

Returns:: nothing, will just set self.result

get()[source]¶

Returns:: joins the thread, and then returns the result
Return type:: T

returnn.util.basic.wrap_async_func(f)[source]¶

Calls f() and returns the result. Wrapped up with catching all exceptions, printing stack trace, and interrupt_main().

Parameters:: f (()->T)
Return type:: T

returnn.util.basic.try_run(func, args=(), *, kwargs=None, catch_exc=<class 'Exception'>, default=None)[source]¶

Parameters:

func ((()->T)|((X)->T))
args (tuple)
kwargs (dict|None)
catch_exc (type[Exception])
default (T2)

Returns:

either func() or default if there was some exception

Return type:

T|T2
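The call-with-fallback behavior follows directly from the signature. A sketch (`try_run_sketch` is a hypothetical name mirroring the parameters above):

```python
def try_run_sketch(func, args=(), *, kwargs=None, catch_exc=Exception, default=None):
    try:
        return func(*args, **(kwargs or {}))
    except catch_exc:
        # Any exception of the given type yields the default instead.
        return default
```

For instance, parsing an integer with `int` as func returns the parsed value for "3" and the default for "x".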

returnn.util.basic.validate_broadcast_all_sources(allow_broadcast_all_sources, inputs, common)[source]¶

Call this when all inputs to some operation (layer) must be broadcast. It checks whether broadcasting to all sources should be allowed. E.g. for input [B,T1,D1] + [B,T2,D2], when allowed, it would broadcast to [B,T1,T2,D1,D2]. When not allowed, there must be at least one source where no broadcasting will be done. Whether it is allowed depends on the behavior version.

https://github.com/rwth-i6/returnn/issues/691

Common usages are for get_common_shape() or Data.get_common_data().

Parameters:

allow_broadcast_all_sources (bool|NotSpecified)
inputs – anything convertible to iterable of str, used for reporting
common – anything convertible to str, used for reporting

returnn.util.basic.class_idx_seq_to_1_of_k(seq, num_classes)[source]¶

Basically one_hot.

Parameters:

seq (list[int]|np.ndarray)
num_classes (int)

Return type:

np.ndarray

returnn.util.basic.uniq(seq)[source]¶

Like Unix tool uniq. Removes repeated entries. See uniq_generic() for a generic (non-Numpy) version.

Parameters:: seq (numpy.ndarray)
Returns:: seq
Return type:: numpy.ndarray

returnn.util.basic.uniq_generic(seq)[source]¶

Like Unix tool uniq. Removes repeated entries. See uniq() for an efficient Numpy implementation. See returnn.tf.util.basic.uniq() for an efficient TF implementation.

Parameters:: seq (list[T]|tuple[T])
Returns:: seq
Return type:: list[T]
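Like the Unix tool, this kind of uniq only removes adjacent repetitions. A generic sketch (`uniq_generic_sketch` is a hypothetical name):

```python
def uniq_generic_sketch(seq):
    out = []
    for x in seq:
        # Keep x only if it differs from the previously kept entry.
        if not out or out[-1] != x:
            out.append(x)
    return out
```

Note that [1, 1, 2, 2, 1] becomes [1, 2, 1], not [1, 2]: the last 1 is not adjacent to the first ones.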

returnn.util.basic.slice_pad_zeros(x: ndarray, begin: int, end: int, axis: int = 0) → ndarray[source]¶

Parameters:

x – of shape (…, time, …)
begin
end
axis

Returns:

basically x[begin:end] (with axis==0) but if begin < 0 or end > x.shape[0], it will not discard these frames but pad zeros, such that the resulting shape[0] == end - begin.
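The padding behavior can be illustrated with a one-dimensional, list-based analogue. This is not the NumPy implementation, only a sketch of the indexing rule; `slice_pad_zeros_1d_sketch` is a hypothetical name.

```python
def slice_pad_zeros_1d_sketch(x, begin, end):
    # Positions outside of x are filled with 0 instead of being dropped,
    # so the result always has length end - begin.
    return [x[i] if 0 <= i < len(x) else 0 for i in range(begin, end)]
```

Compare with plain slicing, where out-of-range parts are silently cut off (and a negative begin would count from the end instead).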

returnn.util.basic.random_orthogonal(shape, gain=1.0, seed=None)[source]¶

Returns a random orthogonal matrix of the given shape.

Code borrowed and adapted from Keras: https://github.com/fchollet/keras/blob/master/keras/initializers.py

Reference: Saxe et al., https://arxiv.org/abs/1312.6120

Related: Unitary Evolution Recurrent Neural Networks, https://arxiv.org/abs/1511.06464

Parameters:

shape (tuple[int])
gain (float)
seed (int) – for Numpy random generator

Returns:

random orthogonal matrix

Return type:

numpy.ndarray

returnn.util.basic.inplace_increment(x: ndarray, idx: ndarray, y: ndarray | float | int) → ndarray[source]¶

This basically does x[idx] += y. The difference to the plain Numpy version is how repeated indices are handled: with plain Numpy x[idx] += y, an index which occurs multiple times is only incremented once (and it is not specified by which value), whereas here every occurrence is meant to be accumulated. See also theano.tensor.subtensor.AdvancedIncSubtensor documentation.

Parameters:

x
idx
y

returnn.util.basic.prod(ls: Iterable[T] | ndarray) → int | T | float[source]¶

Parameters:: ls
Returns:: ls[0] * ls[1] * …

returnn.util.basic.parse_orthography_into_symbols(orthography, upper_case_special=True, word_based=False, square_brackets_for_specials=True)[source]¶

For Speech. Example:

orthography = "hello [HESITATION] there "

with word_based == False: returns list("hello ") + ["[HESITATION]"] + list(" there ").
with word_based == True: returns ["hello", "[HESITATION]", "there"]

No pre/post-processing is done:

Spaces are kept as-is.
No stripping at begin/end. (E.g. trailing spaces are not removed.)
No tolower/toupper.
Doesn’t add [BEGIN]/[END] symbols or so.

Any such operations should be done explicitly in an additional function. Anything in []-brackets is meant as a special symbol. Also see parse_orthography() which includes some preprocessing.

Parameters:

orthography (str) – example: “hello [HESITATION] there “
upper_case_special (bool) – whether the special symbols are always made upper case
word_based (bool) – whether we split on space and return full words
square_brackets_for_specials (bool) – handle “[…]”

Return type:

list[str]
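The two modes from the example above can be reproduced with a small regex-based sketch. It is a simplified illustration, not the RETURNN implementation: `parse_into_symbols_sketch` is a hypothetical name and the square_brackets_for_specials option is not modeled.

```python
import re


def parse_into_symbols_sketch(orthography, upper_case_special=True, word_based=False):
    symbols = []
    # Split while keeping "[...]" groups as separate pieces.
    for piece in re.split(r"(\[[^\]]*\])", orthography):
        if not piece:
            continue
        if piece.startswith("[") and piece.endswith("]"):
            symbols.append(piece.upper() if upper_case_special else piece)
        elif word_based:
            symbols.extend(piece.split())  # full words, spaces dropped
        else:
            symbols.extend(piece)  # one symbol per character, spaces included
    return symbols


chars = parse_into_symbols_sketch("hello [HESITATION] there ")
words = parse_into_symbols_sketch("hello [HESITATION] there ", word_based=True)
```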

returnn.util.basic.parse_orthography(orthography, prefix=(), postfix=('[END]',), remove_chars='(){}', collapse_spaces=True, final_strip=True, **kwargs)[source]¶

For Speech. Full processing. Example:

orthography = "hello [HESITATION] there "

with word_based == False: returns list("hello ") + ["[HESITATION]"] + list(" there") + ["[END]"]
with word_based == True: returns ["hello", "[HESITATION]", "there", "[END]"]

Does some preprocessing on orthography and then passes it on to parse_orthography_into_symbols().

Parameters:

orthography (str) – e.g. “hello [HESITATION] there “
prefix (list[str]) – will add this prefix
postfix (list[str]) – will add this postfix
remove_chars (str) – those chars will just be removed at the beginning
collapse_spaces (bool) – whether multiple spaces and tabs are collapsed into a single space
final_strip (bool) – whether we strip left and right
kwargs – passed on to parse_orthography_into_symbols()

Return type:

list[str]

returnn.util.basic.json_remove_comments(string, strip_space=True)[source]¶

Parameters:: strip_space (bool)
Return type:: str

via https://github.com/getify/JSON.minify/blob/master/minify_json.py, by Gerald Storer, Pradyun S. Gedam, modified by us.

returnn.util.basic.load_json(filename: str | None = None, content: str | None = None) → Dict[str, Any][source]¶

Parameters:

filename
content

class returnn.util.basic.NumbersDict(auto_convert=None, numbers_dict=None, broadcast_value=None)[source]¶

It’s mostly like dict[str,float|int] & some optional broadcast default value. It implements the standard math bin ops in a straight-forward way.

Parameters:

auto_convert (dict|NumbersDict|T) – first argument, so that we can automatically convert/copy
numbers_dict (dict)
broadcast_value (T)

copy() → NumbersDict[source]¶

Returns:: copy

classmethod constant_like(const_number, numbers_dict)[source]¶

Parameters:

const_number (int|float|object)
numbers_dict (NumbersDict)

Returns:

NumbersDict with same keys as numbers_dict

Return type:

NumbersDict

copy_like(numbers_dict: NumbersDict) → NumbersDict[source]¶

Parameters:: numbers_dict
Returns:: copy of self with same keys as numbers_dict as far as we have them

property keys_set: Set[str][source]¶

Also see keys_union() if you want to have a deterministic order.

Returns:: set of keys

keys_union() → List[str][source]¶

Returns:: union of keys over self and other. The order will be deterministic (unlike keys_set())

get(key: str, default=None)[source]¶

Parameters:

key
default (T)

Return type:

object|T

pop(key: str, *args)[source]¶

Parameters:

key
args (T) – default, or not

Return type:

object|T

keys() → Iterable[str][source]¶

Return type:: set[str]

values() → List[Any][source]¶

Returns:: values: dict values + self.value

items() → Iterable[Tuple[str, Any]][source]¶

Returns:: dict items. this excludes self.value

has_value_for(key: str) → bool[source]¶

Returns:: If self.value is set, always True, otherwise if key is in self.dict

has_values() → bool[source]¶

Returns:: any values in self.dict or self.value

unary_op(op)[source]¶

Parameters:: op ((T)->T2)
Returns:: new NumbersDict, where op is applied on all values
Return type:: NumbersDict

classmethod bin_op_scalar_optional(self, other, zero, op)[source]¶

Parameters:

self (T)
other (T)
zero (T)
op ((T,T)->T)

Return type:

classmethod bin_op(self, other, op, zero, result=None)[source]¶

Parameters:

self (NumbersDict|int|float|T)
other (NumbersDict|int|float|T)
op ((T,T)->T)
zero (T)
result (NumbersDict|None)

Return type:

NumbersDict

elem_eq(other, result_with_default: bool = True) → NumbersDict[source]¶

Element-wise equality check with other. Note about broadcast default value: Consider some key which is neither in self nor in other.

This means that self[key] == self.default, other[key] == other.default. Thus, in case that self.default != other.default, we get res.default == False. Then, all(res.values()) == False, even when all other values are True. This is sometimes not what we want. You can control the behavior via result_with_default.

Parameters:

other (NumbersDict|T)
result_with_default

Returns:

new NumbersDict with bool values

any_compare(other, cmp) → bool[source]¶

Parameters:

other (NumbersDict)
cmp (((object,object)->True))

classmethod max(items) → NumbersDict[source]¶

Element-wise maximum for item in items.

Parameters:: items (list[NumbersDict|int|float])

classmethod min(items) → NumbersDict[source]¶

Element-wise minimum for item in items.

Parameters:: items (list[NumbersDict|int|float])

max_value()[source]¶: Maximum of our values.

min_value()[source]¶: Minimum of our values.
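The idea of a dict with a broadcast default value and element-wise binary operations can be sketched as follows. This is a strongly reduced model, not the NumbersDict class itself: `NumbersDictSketch` is a hypothetical name, only addition is shown, and both operands are assumed to have a broadcast value.

```python
class NumbersDictSketch:
    """Hypothetical reduced model: dict of numbers plus a broadcast default."""

    def __init__(self, numbers_dict=None, broadcast_value=0):
        self.dict = dict(numbers_dict or {})
        self.value = broadcast_value

    def __getitem__(self, key):
        # Keys without an explicit entry fall back to the broadcast value.
        return self.dict.get(key, self.value)

    def __add__(self, other):
        keys = sorted(set(self.dict) | set(other.dict))
        return NumbersDictSketch(
            {key: self[key] + other[key] for key in keys},
            broadcast_value=self.value + other.value,
        )


a = NumbersDictSketch({"data": 10}, broadcast_value=1)
b = NumbersDictSketch({"classes": 5}, broadcast_value=2)
c = a + b
```

Here "data" combines the explicit 10 with the broadcast 2 of the other operand, and a key present in neither operand combines both broadcast values.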

returnn.util.basic.collect_class_init_kwargs(cls, only_with_default=False)[source]¶

Parameters:

cls (type) – class, where it assumes that kwargs are passed on to base classes
only_with_default (bool) – if given will only return the kwargs with default values

Returns:

the kwarg names if not only_with_default, otherwise the dict mapping them to the default values

Return type:

list[str] | dict[str]

returnn.util.basic.getargspec(func)[source]¶

inspect.getfullargspec() or inspect.getargspec (Python 2)

Parameters:: func
Returns:: FullArgSpec

returnn.util.basic.collect_mandatory_class_init_kwargs(cls)[source]¶

Parameters:: cls (type)
Returns:: list of kwargs which have no default, i.e. which must be provided
Return type:: list[str]

returnn.util.basic.help_on_type_error_wrong_args(cls, kwargs)[source]¶

Parameters:

cls (type)
kwargs (list[str])

returnn.util.basic.type_attrib_mro_chain(cls: type, attr_name: str) → list[source]¶

Returns:: list of all attributes with the given name in the MRO chain
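Collecting an attribute along the MRO can be sketched by looking into each class's own `__dict__`. The function names below are hypothetical; the second one illustrates how a "next in chain" lookup could build on the first.

```python
def type_attrib_mro_chain_sketch(cls, attr_name):
    # Only classes which define the attribute themselves contribute an entry.
    return [c.__dict__[attr_name] for c in cls.__mro__ if attr_name in c.__dict__]


def next_type_attrib_in_mro_chain_sketch(cls, attr_name, attr):
    chain = type_attrib_mro_chain_sketch(cls, attr_name)
    return chain[chain.index(attr) + 1]


class Base:
    def describe(self):
        return "base"


class Child(Base):
    def describe(self):
        return "child"


class GrandChild(Child):
    pass  # does not define describe itself


chain = type_attrib_mro_chain_sketch(GrandChild, "describe")
```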

returnn.util.basic.next_type_attrib_in_mro_chain(cls: type, attr_name: str, attr)[source]¶

Parameters:

cls
attr_name
attr – must be in the attrib MRO chain

Returns:

next attribute in the MRO chain

returnn.util.basic.custom_exec(source: str, source_filename: str, user_ns: Dict[str, Any], user_global_ns: Dict[str, Any])[source]¶

Parameters:

source
source_filename
user_ns
user_global_ns

Returns:

nothing

class returnn.util.basic.FrozenDict[source]¶: Frozen dict.

returnn.util.basic.make_hashable(obj)[source]¶

Theano needs hashable objects in some cases, e.g. the properties of Ops. This converts all objects as such, i.e. into immutable frozen types.

Parameters:: obj (T|dict|list|tuple)
Return type:: T|FrozenDict|tuple
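The conversion into immutable types can be sketched recursively. Since FrozenDict is not part of the standard library, this sketch swaps in a frozenset of (key, value) pairs for dicts; that substitution and the name `make_hashable_sketch` are assumptions.

```python
def make_hashable_sketch(obj):
    if isinstance(obj, dict):
        # Stand-in for a frozen dict: an order-independent set of item pairs.
        return frozenset((key, make_hashable_sketch(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(make_hashable_sketch(item) for item in obj)
    return obj


frozen_a = make_hashable_sketch({"a": [1, 2], "b": {"c": (3, [4])}})
frozen_b = make_hashable_sketch({"b": {"c": (3, [4])}, "a": [1, 2]})
```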

class returnn.util.basic.RefIdEq(obj: T)[source]¶

Reference to some object (e.g. tf.Tensor), but this object is always hashable, and uses the id of the referenced object for the hash and equality.

(In case of tf.Tensor, this is for compatibility: because tf.Tensor.ref() was not available in earlier TF versions. However, we also need this for DictRefKeys.)

(This was TensorRef in earlier RETURNN versions.)

Parameters:: obj – for example tf.Tensor
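A wrapper with identity-based hash and equality can be sketched in a few lines. `RefIdEqSketch` is a hypothetical name; plain lists are used here because they are unhashable and compare by value, which makes the difference visible.

```python
class RefIdEqSketch:
    """Hypothetical wrapper: hash and equality via the identity of the wrapped object."""

    def __init__(self, obj):
        self.obj = obj

    def __hash__(self):
        return id(self.obj)

    def __eq__(self, other):
        return isinstance(other, RefIdEqSketch) and self.obj is other.obj


list_a = [1]
list_b = [1]  # equal by value, but a different object
table = {RefIdEqSketch(list_a): "a"}
```

This is also the basic idea behind a dict keyed by references: lookups succeed only for the very same object.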

class returnn.util.basic.DictRefKeys(items: None | Iterable[Tuple[K, V]] | Dict[K, V] = None, /, **kwargs)[source]¶

Like dict, but hash and equality of the keys are based on the key object references (see RefIdEq).

items() → Iterable[Tuple[K, V]][source]¶

keys() → Iterable[K][source]¶

values() → Iterable[V][source]¶

update(other: Dict[K, V] | Iterable[Tuple[K, V]], /)[source]¶

Parameters:: other – dict or iterable of (key, value) tuples

returnn.util.basic.make_dll_name(basename)[source]¶

Parameters:: basename (str)
Returns:: e.g. “lib%s.so” % basename, depending on sys.platform
Return type:: str

returnn.util.basic.escape_c_str(s)[source]¶

Parameters:: s (str)
Returns:: C-escaped str
Return type:: str

returnn.util.basic.attr_chain(base, attribs)[source]¶

Parameters:

base (object)
attribs (list[str]|tuple[str]|str)

Returns:

getattr(getattr(object, attribs[0]), attribs[1]) …

Return type:

object
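Chained attribute access maps naturally onto functools.reduce. A sketch (`attr_chain_sketch` is a hypothetical name; treating a plain string as a single attribute name is an assumption):

```python
import functools
import os


def attr_chain_sketch(base, attribs):
    if isinstance(attribs, str):
        attribs = [attribs]  # assumption: a str means one attribute name
    # getattr(getattr(base, attribs[0]), attribs[1]) ...
    return functools.reduce(getattr, attribs, base)


separator = attr_chain_sketch(os, ["path", "sep"])
```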

returnn.util.basic.to_bool(v)[source]¶

Parameters:: v (int|float|str) – if it is a string, it should represent some integer, or alternatively “true” or “false”
Return type:: bool
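The described conversion rules can be sketched like this. Case-insensitive matching and whitespace stripping are assumptions of the sketch, and `to_bool_sketch` is a hypothetical name.

```python
def to_bool_sketch(v):
    if isinstance(v, str):
        text = v.strip().lower()  # assumption: case and outer spaces are ignored
        if text == "true":
            return True
        if text == "false":
            return False
        return bool(int(text))  # otherwise the string should represent an integer
    return bool(v)
```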

returnn.util.basic.as_str(s)[source]¶

Parameters:: s (str|unicode|bytes)
Return type:: str|unicode

returnn.util.basic.unicode_to_str(s)[source]¶

The behavior is different depending on Python 2 or Python 3. In all cases, the returned type is a str object.

Python 2:: We return the utf8 encoded str (which is like Python 3 bytes, or for ASCII, there is no difference).

Python 3:: We return a str object.

Note that this function probably does not make much sense. It might be used when there is other code which expects a str object, no matter if Python 2 or Python 3. In Python 2, a str object often holds UTF8 text, so the behavior of this function is fine then. Also see as_str().

Parameters:: s (str|unicode|bytes)
Return type:: str

returnn.util.basic.deepcopy(x, stop_types=None)[source]¶

Simpler variant of copy.deepcopy(). Should handle some edge cases as well, like copying module references.

Parameters:

x (T) – an arbitrary object
stop_types (list[type]|None) – objects of these types will not be deep-copied, only the reference is passed

Return type:

returnn.util.basic.read_bytes_to_new_buffer(p: BinaryIO, size: int) → BytesIO[source]¶: Read bytes from stream p into a BytesIO buffer. Raises EOFError if not enough bytes are available. Then read it via read_pickled_object().

returnn.util.basic.read_pickled_object(p: BinaryIO) → Any[source]¶

Read pickled object from stream p, after it was written via write_pickled_object().

Parameters:: p

returnn.util.basic.write_pickled_object(p: BinaryIO, obj: Any)[source]¶: Writes pickled object to stream p.

returnn.util.basic.serialize_object(obj: Any) → bytes[source]¶: Uses write_pickled_object().

returnn.util.basic.deserialize_object(data: bytes) → Any[source]¶: Uses read_pickled_object().

returnn.util.basic.load_txt_vector(filename)[source]¶

Expect line-based text encoding in file. We also support Sprint XML format, which has some additional xml header and footer, which we will just strip away.

Parameters:: filename (str)
Return type:: list[float]

class returnn.util.basic.CollectionReadCheckCovered(collection: Dict[str, Any], truth_value: bool | None = None)[source]¶

Wraps around a dict. It keeps track of all the keys which were read from the dict. Via assert_all_read(), you can check that there are no keys in the dict which were not read. The usage is for config dict options, where the user has specified a range of options, and where in the code there is usually a default for every non-specified option, to check whether all the user-specified options are also used (maybe the user made a typo).

Parameters:

collection
truth_value – note: check explicitly for self.truth_value, bool(self) is not the same!

classmethod from_bool_or_dict(value: bool | Dict[str, Any]) → CollectionReadCheckCovered[source]¶

Parameters:: value

get(item, default=None)[source]¶

Parameters:

item (str)
default (T)

Return type:

T|Any|None

assert_all_read()[source]¶: Asserts that all items have been read.
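The typo-detection use case described for this class can be sketched with a small wrapper that records which keys were read. `ReadCheckSketch` and the option names are hypothetical; the truth_value handling is not modeled.

```python
class ReadCheckSketch:
    """Hypothetical reduced model of a dict wrapper which tracks read keys."""

    def __init__(self, collection):
        self.collection = collection
        self.got_items = set()

    def get(self, item, default=None):
        self.got_items.add(item)
        return self.collection.get(item, default)

    def assert_all_read(self):
        remaining = set(self.collection) - self.got_items
        if remaining:
            raise AssertionError("options not read: %r" % sorted(remaining))


# The user misspelled "momentum", so the code silently uses its default ...
options = ReadCheckSketch({"learning_rate": 0.1, "momentun": 0.9})
learning_rate = options.get("learning_rate", 1.0)
momentum = options.get("momentum", 0.0)

# ... but the final check reveals that "momentun" was never read.
try:
    options.assert_all_read()
    typo_detected = False
except AssertionError:
    typo_detected = True
```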

returnn.util.basic.which(program: str) → str | None[source]¶

Finds program in some of the dirs of the PATH env var.

Parameters:: program – e.g. “python”
Returns:: full path, e.g. “/usr/bin/python”, or None

returnn.util.basic.which_pip()[source]¶

Return type:: str
Returns:: path to pip for the current Python env

returnn.util.basic.pip_install(*pkg_names)[source]¶

Install packages via pip for the current Python env.

Parameters:: pkg_names (str)

returnn.util.basic.pip_check_is_installed(pkg_name)[source]¶

Parameters:: pkg_name (str) – without version, e.g. just “tensorflow”, or with version, e.g. “tensorflow==1.2.3”
Return type:: bool

returnn.util.basic.overwrite_os_exec(prefix_args)[source]¶

Parameters:: prefix_args (list[str])

returnn.util.basic.get_lsb_release()[source]¶

Returns:: /etc/lsb-release parsed as a dict
Return type:: dict[str,str]

returnn.util.basic.get_ubuntu_major_version()[source]¶

Return type:: int|None

returnn.util.basic.auto_prefix_os_exec_prefix_ubuntu(prefix_args, ubuntu_min_version=16)[source]¶

Parameters:

prefix_args (list[str])
ubuntu_min_version (int)

Example usage:: auto_prefix_os_exec_prefix_ubuntu([“/u/zeyer/tools/glibc217/ld-linux-x86-64.so.2”])

returnn.util.basic.cleanup_env_var_path(env_var, path_prefix)[source]¶

Parameters:

env_var (str) – e.g. “LD_LIBRARY_PATH”
path_prefix (str)

Will remove all paths in os.environ[env_var] which are prefixed with path_prefix.
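Filtering a path-list environment variable by prefix can be sketched as follows. Using os.pathsep as the separator and the name `cleanup_env_var_path_sketch` are assumptions; the variable name and paths are made up for the demonstration.

```python
import os


def cleanup_env_var_path_sketch(env_var, path_prefix):
    if env_var not in os.environ:
        return
    paths = os.environ[env_var].split(os.pathsep)
    # Keep only the entries which do not start with the given prefix.
    paths = [p for p in paths if not p.startswith(path_prefix)]
    os.environ[env_var] = os.pathsep.join(paths)


os.environ["SKETCH_LIB_PATH"] = os.pathsep.join(["/opt/old/lib", "/usr/lib", "/opt/old/lib64"])
cleanup_env_var_path_sketch("SKETCH_LIB_PATH", "/opt/old")
cleaned = os.environ["SKETCH_LIB_PATH"]
```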

returnn.util.basic.get_login_username()[source]¶

Return type:: str
Returns:: the username of the current user.

Use this as a replacement for os.getlogin().

returnn.util.basic.get_temp_dir(*, with_username: bool = True) → str[source]¶

Similar to tempfile.gettempdir() but prefers /var/tmp over /tmp.

Parameters:: with_username – whether to append the username to the path
Returns:: e.g. “/var/tmp/$USERNAME”

returnn.util.basic.get_cache_dir()[source]¶

Returns:: directory used to cache non-critical things. By default this is get_temp_dir(), unless you define the env var RETURNN_CACHE_DIR
Return type:: str

class returnn.util.basic.LockFile(directory, name='lock_file', lock_timeout=3600)[source]¶

Simple lock file.

Parameters:

directory (str)
lock_timeout (int|float) – in seconds

is_old_lockfile()[source]¶

Returns:: Whether there is an existing lock file and the existing lock file is old.
Return type:: bool

maybe_remove_old_lockfile()[source]¶: Removes an existing old lockfile, if there is one.

is_locked()[source]¶

Returns:: whether there is an active (not old) lockfile
Return type:: bool

lock()[source]¶: Acquires the lock.

try_lock() → bool[source]¶

Tries to acquire the lock.

Returns:: whether the lock was acquired

unlock()[source]¶: Releases the lock.
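A simple lock-file protocol with a timeout for stale locks can be sketched as below. This is an illustration under assumptions (exclusive file creation for acquiring, file modification time for staleness); it is not the RETURNN implementation, and `LockFileSketch` is a hypothetical name.

```python
import os
import tempfile
import time


class LockFileSketch:
    """Hypothetical minimal lock file, mirroring the constructor arguments above."""

    def __init__(self, directory, name="lock_file", lock_timeout=3600):
        self.lockfile = os.path.join(directory, name)
        self.lock_timeout = lock_timeout

    def is_old_lockfile(self):
        try:
            mtime = os.path.getmtime(self.lockfile)
        except OSError:
            return False  # no lock file at all
        return time.time() - mtime > self.lock_timeout

    def try_lock(self):
        if self.is_old_lockfile():
            try:
                os.remove(self.lockfile)  # stale lock, e.g. from a crashed process
            except OSError:
                pass
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(self.lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        return True

    def unlock(self):
        os.remove(self.lockfile)


with tempfile.TemporaryDirectory() as tmp_dir:
    lock_a = LockFileSketch(tmp_dir)
    lock_b = LockFileSketch(tmp_dir)
    first_attempt = lock_a.try_lock()   # acquires the lock
    second_attempt = lock_b.try_lock()  # fails while lock_a holds it
    lock_a.unlock()
    third_attempt = lock_b.try_lock()   # succeeds after the release
    lock_b.unlock()
```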

returnn.util.basic.touch_file(filename: str, *, mode: int = 438)[source]¶

If file does not exist, creates it, otherwise updates its mtime.

Parameters:

filename
mode – if it does not exist, use given file permission mode (the default 438 is 0o666 in octal)

returnn.util.basic.str_is_number(s)[source]¶

Parameters:: s (str) – e.g. “1”, “.3” or “x”
Returns:: whether s can be cast to float or int
Return type:: bool
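Such a check is usually done by simply attempting the conversion. A sketch (`str_is_number_sketch` is a hypothetical name; note that float() also accepts strings like "inf" or "nan", which this sketch therefore treats as numbers):

```python
def str_is_number_sketch(s):
    try:
        float(s)  # anything int() accepts in base 10 is accepted here as well
        return True
    except ValueError:
        return False
```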

returnn.util.basic.sorted_values_from_dict(d)[source]¶

Parameters:: d (dict[T,V])
Return type:: list[V]

returnn.util.basic.dict_zip(keys, values)[source]¶

Parameters:

keys (list[T])
values (list[V])

Return type:

dict[T,V]

returnn.util.basic.parse_ld_conf_file(fn)[source]¶

Via https://github.com/albertz/system-tools/blob/master/bin/find-lib-in-path.py.

Parameters:: fn (str) – e.g. “/etc/ld.so.conf”
Returns:: list of paths for libs
Return type:: list[str]

returnn.util.basic.get_ld_paths()[source]¶

To be very correct, see man-page of ld.so. And here: https://unix.stackexchange.com/questions/354295/what-is-the-default-value-of-ld-library-path/354296

Short version, not specific to an executable, in this order:

- LD_LIBRARY_PATH
- /etc/ld.so.cache (instead we will parse /etc/ld.so.conf)
- /lib, /usr/lib (or maybe /lib64, /usr/lib64)

Via https://github.com/albertz/system-tools/blob/master/bin/find-lib-in-path.py.

Return type:: list[str]
Returns:: list of paths to search for libs (*.so files)

returnn.util.basic.find_lib(lib_name)[source]¶

Parameters:: lib_name (str) – without postfix/prefix, e.g. “cudart” or “blas”
Returns:: returns full path to lib or None
Return type:: str|None

returnn.util.basic.read_sge_num_procs(job_id=None)[source]¶

From the Sun Grid Engine (SGE), reads the num_proc setting for a particular job. If job_id is not provided and the JOB_ID env is set, it will use that instead (i.e. it uses the current job). This calls qstat to figure out this setting. There are multiple ways this can go wrong, so better catch any exception.

Parameters:: job_id (int|None)
Returns:: num_proc
Return type:: int|None

returnn.util.basic.get_number_available_cpus()[source]¶

Returns:: number of available CPUs, if we can figure it out
Return type:: int|None

returnn.util.basic.guess_requested_max_num_threads(log_file=None, fallback_num_cpus=True)[source]¶

Parameters:

log_file (io.File)
fallback_num_cpus (bool)

Return type:

int|None

returnn.util.basic.get_cpu_model_name() → str[source]¶

Returns:: e.g. “Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz” via /proc/cpuinfo. falls back to platform.processor().

returnn.util.basic.get_gpu_names()[source]¶

Return type:: list[str]

returnn.util.basic.get_num_gpu_devices()[source]¶

Returns:: (cpu count, gpu count)
Return type:: (int, int)

returnn.util.basic.have_gpu()[source]¶

Return type:: bool

returnn.util.basic.try_and_ignore_exception(f)[source]¶

Calls f, and ignores any exception.

Parameters:: f (()->T)
Returns:: whatever f returns, or None
Return type:: T|None
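
The behaviour amounts to a try/except wrapper. A minimal sketch with the hypothetical name `call_ignoring_errors`, assuming that only `Exception` subclasses are swallowed:

```python
def call_ignoring_errors(f):
    """Call f() and return its result, or None if it raised (illustrative)."""
    try:
        return f()
    except Exception:
        return None


ok = call_ignoring_errors(lambda: 1 + 1)
failed = call_ignoring_errors(lambda: 1 / 0)  # ZeroDivisionError is ignored
```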

returnn.util.basic.try_get_stack_frame(depth=1)[source]¶

Parameters:: depth (int)
Return type:: types.FrameType|None
Returns:: the stack frame at the given depth, or None if it cannot be retrieved. This is just for debugging.

returnn.util.basic.try_get_caller_name(depth=1, fallback=None)[source]¶

Parameters:

depth (int)
fallback (str|None) – this is returned if we fail for some reason

Return type:

str|None

Returns:

caller function name. this is just for debugging
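
One way to look up a caller name is via `sys._getframe`, as sketched below. The function name `caller_name` is hypothetical. In this sketch, depth=1 refers to the function that calls `caller_name`; whether RETURNN counts the depth in the same way is not stated above.

```python
import sys


def caller_name(depth=1, fallback=None):
    """Name of the function `depth` frames up the stack (illustrative)."""
    try:
        # depth=0 would be caller_name itself; depth=1 is its caller.
        frame = sys._getframe(depth)
    except ValueError:  # the call stack is not deep enough
        return fallback
    return frame.f_code.co_name


def outer():
    return caller_name()


name = outer()
too_deep = caller_name(depth=10_000, fallback="unknown")
```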

returnn.util.basic.traceback_clear_frames(tb)[source]¶

Clear traceback frame locals.

Just like traceback.clear_frames(), but has an additional fix (https://github.com/python/cpython/issues/113939).

Parameters:: tb (types.TracebackType)

exception returnn.util.basic.InfiniteRecursionDetected[source]¶: Raised when an infinite recursion is detected, by guard_infinite_recursion.

returnn.util.basic.guard_infinite_recursion(*args)[source]¶

Registers args (could be func + args) in some cache. If those args are already in the cache, it will raise an exception.

It will use the id of the args as key and not use any hashing to allow that guard_infinite_recursion can be used to guard custom __hash__ implementations as well.

returnn.util.basic.camel_case_to_snake_case(name)[source]¶

Parameters:: name (str) – e.g. “CamelCase”
Returns:: e.g. “camel_case”
Return type:: str
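
Such a conversion is commonly done with two regular-expression substitutions, as in this sketch. The name `to_snake_case` is hypothetical, and RETURNN may treat edge cases such as acronyms or digits differently.

```python
import re


def to_snake_case(name):
    """Convert CamelCase to snake_case (illustrative)."""
    # Put "_" before a capitalized word that follows any character.
    s1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Put "_" between a lowercase letter or digit and an uppercase letter.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s1).lower()


snake = to_snake_case("CamelCase")
```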

returnn.util.basic.get_hostname()[source]¶

Returns:: e.g. “cluster-cn-211”
Return type:: str

returnn.util.basic.is_running_on_cluster()[source]¶

Returns:: i6 / Slurm specific. Whether we run on some of the cluster nodes.
Return type:: bool

returnn.util.basic.get_utc_start_time_filename_part()[source]¶

Returns:: string which can be used as part of a filename, which represents the start time of RETURNN in UTC
Return type:: str

returnn.util.basic.maybe_make_dirs(dirname)[source]¶

Creates the directory if it does not yet exist.

Parameters:: dirname (str) – The path of the directory

returnn.util.basic.log_runtime_info_to_dir(path, config)[source]¶

Writes several pieces of logging information into path. It creates returnn.*.log with some meta information and also copies the config file that was used.

Parameters:

path (str) – directory path
config (returnn.config.Config)

returnn.util.basic.should_write_to_disk(config)[source]¶

Parameters:: config (returnn.config.Config)
Return type:: bool

returnn.util.basic.get_global_inf_value() → float[source]¶

Returns:: float(“inf”) by default, but tries to read inf_value from the global config

returnn.util.basic.is_onnx_export_global() → bool[source]¶

Returns:: False by default. If ‘onnx_export’ is set in the config, that value is used.

returnn.util.basic.get_patch_atfork_lib()[source]¶

Returns:: path to our patch_atfork lib. see maybe_restart_returnn_with_atfork_patch()
Return type:: str

returnn.util.basic.restart_returnn()[source]¶: Restarts RETURNN.

returnn.util.basic.maybe_restart_returnn_with_atfork_patch()[source]¶

What we want: subprocess.Popen to always work. Problem: It uses fork+exec internally in subprocess_fork_exec, via _posixsubprocess.fork_exec. That is a problem because fork can trigger any atfork handlers registered via pthread_atfork, and those can crash/deadlock in some cases.

https://github.com/tensorflow/tensorflow/issues/13802
https://github.com/xianyi/OpenBLAS/issues/240
https://trac.sagemath.org/ticket/22021
https://bugs.python.org/issue31814
https://stackoverflow.com/questions/46845496/ld-preload-and-linkage
https://stackoverflow.com/questions/46810597/forkexec-without-atfork-handlers

The solution here: Just override pthread_atfork, via LD_PRELOAD. Note that in some cases, this is not enough (see the SO discussion), so we also overwrite fork itself. See also tests/test_fork_exec.py for a demo.

returnn.util.basic.close_all_fds_except(except_fds)[source]¶

Calls os.closerange except for the given fds. Code adapted and extended from multiprocessing.util.close_all_fds_except.

Parameters:: except_fds (Collection[int]) – usually at least {0,1,2}

returnn.util.basic.is_valid_fd(fd: int) → bool[source]¶

Returns:: whether the file descriptor (fd) is still open and valid
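
A common way to test a file descriptor is to call `os.fstat` on it and treat an `OSError` as "not valid". The sketch below uses the hypothetical name `fd_is_open`; it is an assumption that RETURNN does something similar.

```python
import os


def fd_is_open(fd):
    """Whether fd refers to an open file descriptor (illustrative)."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


r, w = os.pipe()
open_before = fd_is_open(r)
os.close(r)
os.close(w)
open_after = fd_is_open(r)  # r was closed above
```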

class returnn.util.basic.Stats(*, format_str=None)[source]¶

Collects mean and variance, running average.

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

Parameters:: format_str (None|((float|numpy.ndarray)->str)) – used for __str__ and logging. str() by default. Could be e.g. human_bytes_size() for bytes.

collect(data)[source]¶

Parameters:: data (numpy.ndarray|list[int]|list[float]) – shape (time, dim) or (time,)

get_mean()[source]¶

Returns:: mean, shape (dim,)
Return type:: numpy.ndarray

get_std_dev()[source]¶

Returns:: std dev, shape (dim,)
Return type:: numpy.ndarray

dump(output_file_prefix=None, stream=None, stream_prefix='')[source]¶

Parameters:

output_file_prefix (str|None) – if given, will numpy.savetxt mean|std_dev to disk
stream_prefix (str)
stream (io.TextIOBase) – sys.stdout by default
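
The linked Wikipedia article describes Welford's online algorithm for the running mean and variance. The following scalar sketch shows the idea; the class name `RunningStats` is hypothetical, it works on plain floats instead of NumPy arrays, and it assumes the population (not sample) standard deviation, which the text above does not specify.

```python
import math


class RunningStats:
    """Welford-style running mean and variance over scalars (illustrative)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def collect(self, data):
        for x in data:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    def get_mean(self):
        return self.mean

    def get_std_dev(self):
        # Population standard deviation (an assumption, see above).
        return math.sqrt(self.m2 / self.n)


stats = RunningStats()
stats.collect([1.0, 2.0])  # data can arrive in several chunks
stats.collect([3.0, 4.0])
mean = stats.get_mean()
std_dev = stats.get_std_dev()
```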

returnn.util.basic.is_namedtuple(cls)[source]¶

Parameters:: cls (T) – tuple, list or namedtuple type
Returns:: whether cls is a namedtuple type
Return type:: bool

returnn.util.basic.make_seq_of_type(cls, seq)[source]¶

Parameters:

cls (type[T]) – e.g. tuple, list or namedtuple
seq (list|tuple|T)

Returns:

cls(seq) or cls(*seq)

Return type:

T|list|tuple
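
The distinction between cls(seq) and cls(*seq) matters because namedtuple types take their fields as separate arguments. A sketch with hypothetical helper names, assuming that a namedtuple type is recognized by being a tuple subclass with a `_fields` attribute:

```python
from collections import namedtuple


def is_namedtuple_type(cls):
    """Whether cls looks like a namedtuple type (illustrative heuristic)."""
    return isinstance(cls, type) and issubclass(cls, tuple) and hasattr(cls, "_fields")


def make_seq(cls, seq):
    """Build cls from seq: cls(*seq) for namedtuples, else cls(seq)."""
    if is_namedtuple_type(cls):
        return cls(*seq)
    return cls(seq)


Point = namedtuple("Point", ["x", "y"])
p = make_seq(Point, [1, 2])
t = make_seq(tuple, [1, 2])
```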

returnn.util.basic.ensure_list_of_type(ls, type_)[source]¶

Parameters:

ls (list)
type_ ((()->T)|type[T]) – type of the instances in ls. Note that the strange type here in the docstring is due to some PyCharm type inference problems (https://youtrack.jetbrains.com/issue/PY-50828).

Return type:

list[T]

returnn.util.basic.compute_bleu(reference_corpus, translation_corpus, max_order=4, use_bp=True)[source]¶

Computes BLEU score of translated segments against one or more references. Code adapted from Google Tensor2Tensor.

Parameters:

reference_corpus (list[list[int]|list[str]]) – list of references for each translation. Each reference should be tokenized into a list of tokens.
translation_corpus (list[list[int]|list[str]]) – list of translations to score. Each translation should be tokenized into a list of tokens.
max_order (int) – maximum n-gram order to use when computing the BLEU score
use_bp (bool) – whether to apply the brevity penalty

Returns:

BLEU score.
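
For orientation, a plain corpus-level BLEU computation can be sketched as below. This is not the RETURNN or Tensor2Tensor code: the names `ngram_counts` and `corpus_bleu` are hypothetical, the sketch uses exactly one reference per translation, and it applies no smoothing, so any n-gram order without a match yields a score of 0.

```python
import collections
import math


def ngram_counts(tokens, max_order):
    """Count all n-grams up to max_order in a token list."""
    counts = collections.Counter()
    for order in range(1, max_order + 1):
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


def corpus_bleu(references, translations, max_order=4, use_bp=True):
    """Unsmoothed corpus-level BLEU (illustrative)."""
    matches = [0] * max_order
    possible = [0] * max_order
    ref_len = trans_len = 0
    for ref, trans in zip(references, translations):
        ref_len += len(ref)
        trans_len += len(trans)
        ref_counts = ngram_counts(ref, max_order)
        trans_counts = ngram_counts(trans, max_order)
        # Counter intersection gives the clipped n-gram matches.
        for ngram, c in (ref_counts & trans_counts).items():
            matches[len(ngram) - 1] += c
        for ngram, c in trans_counts.items():
            possible[len(ngram) - 1] += c
    precisions = [m / p if p > 0 else 0.0 for m, p in zip(matches, possible)]
    if min(precisions) <= 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    bp = 1.0
    if use_bp and trans_len < ref_len:
        bp = math.exp(1 - ref_len / trans_len)  # brevity penalty
    return geo_mean * bp


refs = [["the", "cat", "sat", "on", "the", "mat"]]
perfect = corpus_bleu(refs, refs)
disjoint = corpus_bleu(refs, [["a", "b", "c", "d", "e", "f"]])
```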

returnn.util.basic.monkeyfix_glib()[source]¶: Works around a GLib bug that prevents SIGINT from working. GLib is used by audioread, and indirectly by librosa for loading audio. https://stackoverflow.com/questions/16410852/ See also monkeypatch_audioread().

returnn.util.basic.monkeypatch_audioread()[source]¶

audioread does not behave optimally in some cases. E.g. each call to _ca_available() takes quite long because of the ctypes.util.find_library usage. We will patch this.

However, the recommendation would be to not use audioread (librosa.load). audioread uses Gstreamer as a backend by default currently (on Linux). Gstreamer has multiple issues. See also monkeyfix_glib(), and here for discussion: https://github.com/beetbox/audioread/issues/62 https://github.com/beetbox/audioread/issues/63

Instead, use PySoundFile, which is also faster. See here for discussions: https://github.com/beetbox/audioread/issues/64 https://github.com/librosa/librosa/issues/681

returnn.util.basic.cf(filename)[source]¶

Cache manager. i6 specific.

Returns:: filename
Return type:: str

returnn.util.basic.binary_search_any(cmp, low, high)[source]¶

Binary search for a custom compare function.

Parameters:

cmp ((int)->int) – e.g. cmp(idx) == compare(array[idx], key)
low (int) – inclusive
high (int) – exclusive

Return type:

int|None
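
A binary search driven by a compare callback can be sketched as follows. The function name `binary_search` is hypothetical. The sketch assumes that a negative cmp result means the element at idx is smaller than the key, and that the function returns an index where cmp is 0, or None if there is none.

```python
def binary_search(cmp, low, high):
    """Find idx in [low, high) with cmp(idx) == 0, else None (illustrative)."""
    while low < high:
        mid = (low + high) // 2
        r = cmp(mid)
        if r < 0:  # element at mid is smaller than the key: search right half
            low = mid + 1
        elif r > 0:  # element at mid is larger than the key: search left half
            high = mid
        else:
            return mid
    return None


array = [1, 3, 5, 7, 9]


def make_cmp(key):
    # Classic three-way compare of array[idx] against key: -1, 0 or 1.
    return lambda idx: (array[idx] > key) - (array[idx] < key)


found = binary_search(make_cmp(7), 0, len(array))
missing = binary_search(make_cmp(4), 0, len(array))
```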

returnn.util.basic.generic_import_module(filename)[source]¶

Parameters:: filename (str) – We try to be clever about filename. If it looks like a module name, just do importlib.import_module. If it looks like a filename, search for a base path (which does not have __init__.py), add that path to sys.path if needed, and import the remaining where “/” is replaced by “.” and the file extension is removed.
Returns:: the module
Return type:: types.ModuleType

returnn.util.basic.softmax(x, axis=None)[source]¶

Parameters:

x (numpy.ndarray)
axis (int|None)

Return type:

numpy.ndarray

returnn.util.basic.collect_proc_maps_exec_files()[source]¶

Currently only works on Linux…

Returns:: list of mapped executables (libs)
Return type:: list[str]

returnn.util.basic.find_sym_in_exec(fn, sym)[source]¶

Uses objdump to list available symbols, and filters them by the given sym.

Parameters:

fn (str) – path
sym (str)

Returns:

the matching output, or None

Return type:

str|None

returnn.util.basic.dummy_numpy_gemm_call()[source]¶: Just performs some GEMM call via Numpy. This makes sure that the BLAS library is loaded.

returnn.util.basic.find_sgemm_libs_from_runtime()[source]¶

Looks through all libs via collect_proc_maps_exec_files(), and searches for all which have the sgemm symbol. Currently only works on Linux (because of collect_proc_maps_exec_files()).

Returns:: list of libs (their path)
Return type:: list[str]

returnn.util.basic.find_libcudart_from_runtime()[source]¶

Looks through all libs via collect_proc_maps_exec_files(), and searches for libcudart. Currently only works on Linux (because of collect_proc_maps_exec_files()).

Returns:: path to libcudart, or None if it was not found
Return type:: str|None

returnn.util.basic.override_env_var(var_name: str, value: str)[source]¶

Context manager for temporarily overriding the value of an environment variable.

Parameters:

var_name – the name of the environment variable to override
value – the value to set while the context mgr is active

returnn.util.basic.get_fwd_compat_kwargs() → Dict[str, Any][source]¶: Get randomly named kwargs for ensuring forwards compatibility in user code.

returnn.util.basic.slurm_time_left_sec() → int | None[source]¶

Query the remaining wallclock budget of the current SLURM job allocation.

Returns:: remaining seconds, or None if not running under SLURM (SLURM_JOB_ID env var missing) or if the squeue query fails for any reason.
