Util

Various generic utilities, which are shared across different backend engines.

class Util.NotSpecified[source]

This is just a placeholder, to be used as default argument to mark that it is not specified.

classmethod resolve(value, default)[source]
Parameters:
  • value (T|NotSpecified|type[NotSpecified]) –
  • default (U) –
Return type:

T|U
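The placeholder pattern can be illustrated with a minimal sketch. This is not the actual Util code: the class name NotSpecifiedSketch and the connect() function are hypothetical, and the resolve() logic is only an assumption derived from the documented parameter types.

```python
class NotSpecifiedSketch:
    """Hypothetical stand-in for Util.NotSpecified."""

    @classmethod
    def resolve(cls, value, default):
        # Assumption: both the class itself and an instance of it
        # count as "not specified".
        if value is cls or isinstance(value, cls):
            return default
        return value


def connect(timeout=NotSpecifiedSketch):
    # None stays a legitimate user value, distinct from "not passed at all".
    return NotSpecifiedSketch.resolve(timeout, default=30)
```

The point of such a placeholder is that None can still be passed explicitly and is kept as-is.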

exception Util.OptionalNotImplementedError[source]

This can optionally be implemented, but it is not required by the API.

Util.is_64bit_platform()[source]
Returns:True if we run on 64bit, False for 32bit
Return type:bool

http://stackoverflow.com/questions/1405913/how-do-i-determine-if-my-python-shell-is-executing-in-32bit-or-64bit-mode-on-os

class Util.BackendEngine[source]

Stores which backend engine we use in RETURNN. E.g. Theano or TensorFlow.

Theano = 0[source]
TensorFlow = 1[source]
selectedEngine = None[source]
classmethod select_engine(engine=None, config=None)[source]
Parameters:
  • engine (int) – see the global class attribs for possible values
  • config (Config.Config) –
classmethod get_selected_engine()[source]
Return type:int
classmethod is_theano_selected()[source]
Return type:bool
classmethod is_tensorflow_selected()[source]
Return type:bool
Util.get_model_filename_postfix()[source]
Returns:one possible postfix of a file which will be present when the model is saved
Return type:str
Util.cmd(s)[source]
Return type:list[str]

Returns:all stdout, split by newline. Does not cover stderr. Raises CalledProcessError on error.

Util.sysexec_out(*args, **kwargs)[source]
Parameters:
  • args – for subprocess.Popen
  • kwargs – for subprocess.Popen
Returns:

stdout as str (assumes utf8)

Return type:

str

Util.sysexec_ret_code(*args, **kwargs)[source]
Parameters:
  • args (str) – for subprocess.call
  • kwargs – for subprocess.call
Returns:

return code

Return type:

int

Util.git_commit_rev(commit='HEAD', gitdir='.')[source]
Parameters:
  • commit (str) –
  • gitdir (str) –
Return type:

str

Util.git_is_dirty(gitdir='.')[source]
Parameters:gitdir (str) –
Return type:bool
Util.git_commit_date(commit='HEAD', gitdir='.')[source]
Parameters:
  • commit (str) –
  • gitdir (str) –
Return type:

str

Util.git_describe_head_version(gitdir='.')[source]
Parameters:gitdir (str) –
Return type:str
Util.describe_returnn_version()[source]
Return type:str
Returns:string like “20171017.163840–git-ab2a1da”, via git_describe_head_version()
Util.describe_theano_version()[source]
Return type:str
Util.describe_tensorflow_version()[source]
Return type:str
Util.get_tensorflow_version_tuple()[source]
Returns:tuple of ints, first entry is the major version
Return type:tuple[int]
Util.eval_shell_env(token)[source]
Parameters:token (str) –
Returns:if token is of the form “$var”, looks up var in os.environ; otherwise returns token as is
Return type:str
Util.eval_shell_str(s)[source]
Return type:list[str]

Parses s as shell-like arguments (via shlex.split) and evaluates shell environment variables (eval_shell_env()). s or its elements can also be callable. In those cases, they will be called and the returned value is used.
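A rough sketch of how the two functions could fit together. The *_sketch names are hypothetical, and returning an empty string for an unset variable is an assumption; the real functions may behave differently in that case.

```python
import os
import shlex


def eval_shell_env_sketch(token):
    # "$var" is looked up in os.environ; anything else is returned as is.
    if token.startswith("$"):
        # Assumption: unset variables become "".
        return os.environ.get(token[1:], "")
    return token


def eval_shell_str_sketch(s):
    # s itself may be a callable which returns the actual value.
    if callable(s):
        s = s()
    tokens = shlex.split(s) if isinstance(s, str) else list(s)
    result = []
    for token in tokens:
        # Single elements may be callables as well.
        if callable(token):
            token = token()
        result.append(eval_shell_env_sketch(token))
    return result
```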

Util.hdf5_dimension(filename, dimension)[source]
Parameters:
  • filename (str) –
  • dimension (str) –
Return type:

numpy.ndarray|int

Util.hdf5_group(filename, dimension)[source]
Parameters:
  • filename (str) –
  • dimension (str) –
Return type:

dict[str]

Util.hdf5_shape(filename, dimension)[source]
Parameters:
  • filename (str) –
  • dimension (str) –
Return type:

tuple[int]

Util.hdf5_strings(handle, name, data)[source]
Parameters:
  • handle (h5py.File) –
  • name (str) –
  • data (numpy.ndarray) –
Util.model_epoch_from_filename(filename)[source]
Parameters:filename (str) –
Returns:epoch number
Return type:int
Util.deep_update_dict_values(d, key, new_value)[source]

Visits all items in d. If the value is a dict, it will recursively visit it.

Parameters:
  • d (dict[str,T|object|None|dict]) – will update inplace
  • key (str) –
  • new_value (T) –
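The recursive visit described above can be sketched as follows. The function name is hypothetical, and the assumption is that every occurrence of key, at any nesting depth, gets replaced by new_value.

```python
def deep_update_dict_values_sketch(d, key, new_value):
    # Recurse into nested dicts first.
    for value in d.values():
        if isinstance(value, dict):
            deep_update_dict_values_sketch(value, key, new_value)
    # Then update this level in place (assumption: only if the key exists).
    if key in d:
        d[key] = new_value
```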
Util.terminal_size(file=sys.stdout)[source]

Returns the terminal size. This will probably work on Linux only.

Parameters:file (io.File) –
Returns:(columns, lines), or (-1,-1)
Return type:(int,int)
Util.is_tty(file=sys.stdout)[source]
Parameters:file (io.File) –
Return type:bool
Util.confirm(txt, exit_on_false=False)[source]
Parameters:
  • txt (str) – e.g. “Delete everything?”
  • exit_on_false (bool) – if True, will call sys.exit(1) if not confirmed
Return type:

bool

Util.hms(s)[source]
Parameters:s (float|int) – seconds
Returns:e.g. “1:23:45” (hours:minutes:seconds). See hms_fraction() if you want fractional seconds
Return type:str
Util.hms_fraction(s, decimals=4)[source]
Parameters:
  • s (float) – seconds
  • decimals (int) – how many decimals to print
Returns:

e.g. “1:23:45.6789” (hours:minutes:seconds)

Return type:

str
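Both formats can be produced with divmod. The following is a sketch with hypothetical function names, not the actual implementation; rounding at the edge of a full minute is not handled specially.

```python
def hms_sketch(s):
    # Whole seconds -> "h:mm:ss".
    m, sec = divmod(int(s), 60)
    h, m = divmod(m, 60)
    return "%d:%02d:%02d" % (h, m, sec)


def hms_fraction_sketch(s, decimals=4):
    # Keep the fractional part in the seconds field.
    m, sec = divmod(s, 60)
    h, m = divmod(int(m), 60)
    # Width is 2 digits + dot + decimals, zero-padded.
    return "%d:%02d:%0*.*f" % (h, m, decimals + 3, decimals, sec)
```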

Util.human_size(n, factor=1000, frac=0.8, prec=1)[source]
Parameters:
  • n (int|float) –
  • factor (int) – for each of the units K, M, G, T
  • frac (float) – when to go over to the next bigger unit
  • prec (int) – how many decimals after the dot
Returns:

human readable size, using K, M, G, T

Return type:

str

Util.human_bytes_size(n, factor=1024, frac=0.8, prec=1)[source]
Parameters:
  • n (int|float) –
  • factor (int) – see human_size(). 1024 by default for bytes
  • frac (float) – see human_size()
  • prec (int) – how many decimals after the dot
Returns:

human readable byte size, using K, M, G, T, with “B” at the end

Return type:

str
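One plausible reading of the factor and frac parameters of human_size() and human_bytes_size() is sketched below. The function names are hypothetical, and the exact threshold rule (switch to the next unit once n exceeds frac times that unit) is an assumption.

```python
def human_size_sketch(n, factor=1000, frac=0.8, prec=1):
    postfixes = ["", "K", "M", "G", "T"]
    i = 0
    # Go to the next bigger unit once n exceeds frac * that unit.
    while i < len(postfixes) - 1 and n > (factor ** (i + 1)) * frac:
        i += 1
    if i == 0:
        return str(n)
    return "%.*f%s" % (prec, float(n) / (factor ** i), postfixes[i])


def human_bytes_size_sketch(n, factor=1024, frac=0.8, prec=1):
    # Same as above, with 1024 as the factor and "B" appended.
    return human_size_sketch(n, factor=factor, frac=frac, prec=prec) + "B"
```

With frac=0.8, a value such as 900 is already shown as “0.9K” under this reading.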

Util.set_pretty_print_default_limit(limit)[source]
Parameters:limit (int|float) – use float(“inf”) to disable
Util.set_pretty_print_as_bytes(as_bytes)[source]
Parameters:as_bytes (bool) –
Util.pretty_print(obj, limit=None)[source]
Parameters:
  • obj (object) –
  • limit (int|float) – use float(“inf”) to disable. None will use the default, via set_pretty_print_default_limit
Returns:

repr(obj), or some shortened version of that, maybe with extra info

Return type:

str

Util.progress_bar(complete=1.0, prefix='', suffix='', file=None)[source]

Prints some progress bar.

Parameters:
  • complete (float) – from 0.0 to 1.0
  • prefix (str) –
  • suffix (str) –
  • file (io.TextIOWrapper|None) – where to print. stdout by default
Returns:

nothing, will print on file

Util.progress_bar_with_time(complete=1.0, prefix='', **kwargs)[source]

progress_bar() with additional remaining time estimation.

Parameters:
  • complete (float) –
  • prefix (str) –
  • kwargs – passed to progress_bar()
Returns:

nothing

Util.available_physical_memory_in_bytes()[source]
Return type:int
Util.default_cache_size_in_gbytes(factor=0.7)[source]
Parameters:factor (float|int) –
Return type:int
Util.better_repr(o)[source]

The main difference to repr(): this one is deterministic. The original dict.__repr__ leaves the order undefined for dict or set. For big dicts/sets/lists, a “,” is added at the end to make textual diffs nicer.

Parameters:o (object) –
Return type:str
Util.simple_obj_repr(obj)[source]
Returns:All self.__init__ args.
Return type:str
class Util.ObjAsDict(obj)[source]

Wraps up any object as a dict, where the attributes become the keys. See also DictAsObj.

items(self)[source]
Returns:vars(..).items()
Return type:set[(str,object)]
class Util.DictAsObj(dikt)[source]

Wraps up any dictionary as an object, where the keys become the attributes. See also ObjAsDict.

Parameters:dikt (dict[str]) –
Util.dict_joined(*ds)[source]
Parameters:ds (dict[T,V]) –
Returns:all dicts joined together
Return type:dict[T,V]
Util.obj_diff_str(self, other)[source]
Parameters:
  • self (object) –
  • other (object) –
Returns:

the difference described

Return type:

str

Util.dict_diff_str(self, other)[source]
Parameters:
  • self (dict) –
  • other (dict) –
Returns:

the difference described

Return type:

str

Util.find_ranges(l)[source]

We expect that the incoming list is sorted and strictly monotonically increasing.

Returns:list of ranges (start,end) where end is exclusive, such that the union of range(start,end) matches l
Return type:list[(int,int)]
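A minimal sketch of this range compression, assuming l contains integers. The function name is hypothetical and this is not necessarily how Util implements it.

```python
def find_ranges_sketch(l):
    ranges = []
    for x in l:
        if ranges and ranges[-1][1] == x:
            # x directly continues the last range; extend its exclusive end.
            ranges[-1] = (ranges[-1][0], x + 1)
        else:
            # Start a new range [x, x + 1).
            ranges.append((x, x + 1))
    return ranges
```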

Util.init_thread_join_hack()[source]

threading.Thread.join and threading.Condition.wait would block signals when run in the main thread. We never want to block signals. Here we patch away that behavior.

Util.start_daemon_thread(target, args=())[source]
Parameters:
  • target (()->None) –
  • args (tuple) –
Returns:

nothing

Util.is_quitting()[source]
Returns:whether we are currently quitting (via rnn.finalize())
Return type:bool
Util.interrupt_main()[source]

Sends KeyboardInterrupt to the main thread.

Returns:nothing
class Util.AsyncThreadRun(name, func)[source]

Daemon thread, wrapping some function func via wrap_async_func().

Parameters:
  • name (str) –
  • func (()->T) –
main(self)[source]

Thread target function.

Returns:nothing, will just set self.result
get(self)[source]
Returns:joins the thread, and then returns the result
Return type:T
Util.wrap_async_func(f)[source]

Calls f() and returns the result. Wrapped up with catching all exceptions, printing stack trace, and interrupt_main().

Parameters:f (()->T) –
Return type:T
Util.try_run(func, args=(), catch_exc=<class 'Exception'>, default=None)[source]
Parameters:
  • func (((X)->T)) –
  • args (tuple) –
  • catch_exc (type[Exception]) –
  • default (T2) –
Returns:

either func() or default if there was some exception

Return type:

T|T2
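The documented behavior amounts to a small try/except wrapper. A sketch, with a hypothetical function name:

```python
def try_run_sketch(func, args=(), catch_exc=Exception, default=None):
    try:
        return func(*args)
    except catch_exc:
        # Only exceptions matching catch_exc are swallowed.
        return default
```

For example, try_run_sketch(int, ("x",), default=-1) gives -1 because int("x") raises ValueError.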

Util.class_idx_seq_to_1_of_k(seq, num_classes)[source]

Basically one_hot.

Parameters:
  • seq (list[int]|np.ndarray) –
  • num_classes (int) –
Return type:

np.ndarray

Util.uniq(seq)[source]

Like Unix tool uniq. Removes repeated entries.

Parameters:seq (numpy.ndarray) –
Returns:seq
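The Unix uniq tool only collapses adjacent repeats. For plain Python sequences, that behavior can be sketched with itertools.groupby. The function name is hypothetical, and the real function operates on Numpy arrays instead of lists.

```python
from itertools import groupby


def uniq_sketch(seq):
    # groupby() groups consecutive equal entries; keep one key per group.
    return [key for key, _group in groupby(seq)]
```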
Util.slice_pad_zeros(x, begin, end, axis=0)[source]
Parameters:
  • x (numpy.ndarray) – of shape (…, time, …)
  • begin (int) –
  • end (int) –
  • axis (int) –
Returns:

basically x[begin:end] (with axis==0) but if begin < 0 or end > x.shape[0], it will not discard these frames but pad zeros, such that the resulting shape[0] == end - begin.

Return type:

numpy.ndarray

Util.random_orthogonal(shape, gain=1.0, seed=None)[source]

Returns a random orthogonal matrix of the given shape. Code borrowed and adapted from Keras: https://github.com/fchollet/keras/blob/master/keras/initializers.py

Reference: Saxe et al., http://arxiv.org/abs/1312.6120

Related: Unitary Evolution Recurrent Neural Networks, https://arxiv.org/abs/1511.06464

Parameters:
  • shape (tuple[int]) –
  • gain (float) –
  • seed (int) – for Numpy random generator
Returns:

random orthogonal matrix

Return type:

numpy.ndarray

Util.inplace_increment(x, idx, y)[source]

This basically does x[idx] += y. The difference to the Numpy version is that in case some index is there multiple times, it will only be incremented once (and it is not specified which one). See also theano.tensor.subtensor.AdvancedIncSubtensor documentation.

Util.prod(ls)[source]
Parameters:ls (list[T]|tuple[T]|numpy.ndarray) –
Return type:T|int|float
Util.parse_orthography_into_symbols(orthography, upper_case_special=True, word_based=False, square_brackets_for_specials=True)[source]

For Speech. Example:

orthography = “hello [HESITATION] there ”
with word_based == False: returns list(“hello “) + [“[HESITATION]”] + list(” there “).
with word_based == True: returns [“hello”, “[HESITATION]”, “there”]

No pre/post-processing is done:

  • Spaces are kept as-is.
  • No stripping at begin/end. (E.g. trailing spaces are not removed.)
  • No tolower/toupper.
  • Doesn’t add [BEGIN]/[END] symbols or so.

Any such operations should be done explicitly in an additional function. Anything in []-brackets is meant as a special symbol. Also see parse_orthography(), which includes some preprocessing.

Parameters:
  • orthography (str) – example: “hello [HESITATION] there “
  • upper_case_special (bool) – whether the special symbols are always made upper case
  • word_based (bool) – whether we split on space and return full words
  • square_brackets_for_specials (bool) – handle “[…]”
Return type:

list[str]
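The splitting behavior from the example above can be approximated with a regular expression. This is a simplified sketch with a hypothetical function name; it ignores square_brackets_for_specials and makes no claim about how the real function treats malformed brackets.

```python
import re


def parse_orthography_into_symbols_sketch(
        orthography, upper_case_special=True, word_based=False):
    symbols = []
    # With a capturing group, re.split() keeps the "[...]" parts
    # at the odd positions of the result.
    parts = re.split(r"(\[[^\]]*\])", orthography)
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Special symbol, kept as one unit.
            symbols.append(part.upper() if upper_case_special else part)
        elif word_based:
            symbols.extend(part.split())
        else:
            # Character-based: every char, including spaces, is a symbol.
            symbols.extend(part)
    return symbols
```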

Util.parse_orthography(orthography, prefix=(), postfix=('[END]', ), remove_chars='(){}', collapse_spaces=True, final_strip=True, **kwargs)[source]

For Speech. Full processing. Example:

orthography = “hello [HESITATION] there ”
with word_based == False: returns list(“hello “) + [“[HESITATION]”] + list(” there”) + [“[END]”]
with word_based == True: returns [“hello”, “[HESITATION]”, “there”, “[END]”]

Does some preprocessing on orthography and then passes it on to parse_orthography_into_symbols().

Parameters:
  • orthography (str) – e.g. “hello [HESITATION] there “
  • prefix (list[str]) – will add this prefix
  • postfix (list[str]) – will add this postfix
  • remove_chars (str) – those chars will just be removed at the beginning
  • collapse_spaces (bool) – whether multiple spaces and tabs are collapsed into a single space
  • final_strip (bool) – whether we strip left and right
  • kwargs – passed on to parse_orthography_into_symbols()
Return type:

list[str]

Util.json_remove_comments(string, strip_space=True)[source]
Parameters:strip_space (bool) –
Return type:str

via https://github.com/getify/JSON.minify/blob/master/minify_json.py, by Gerald Storer, Pradyun S. Gedam, modified by us.

Util.load_json(filename=None, content=None)[source]
Parameters:
  • filename (str|None) –
  • content (str|None) –
Return type:

dict[str]

class Util.NumbersDict(auto_convert=None, numbers_dict=None, broadcast_value=None)[source]

It is mostly like dict[str,float|int], plus an optional broadcast default value. It implements the standard binary math operations in a straightforward way.

Parameters:
  • auto_convert (dict|NumbersDict|T) – first argument, so that we can automatically convert/copy
  • numbers_dict (dict) –
  • broadcast_value (T) –
copy(self)[source]
Return type:NumbersDict
classmethod constant_like(const_number, numbers_dict)[source]
Parameters:
  • const_number (int|float|object) –
  • numbers_dict (NumbersDict) –
Returns:

NumbersDict with same keys as numbers_dict

Return type:

NumbersDict

copy_like(self, numbers_dict)[source]
Parameters:numbers_dict (NumbersDict) –
Returns:copy of self with same keys as numbers_dict as far as we have them
Return type:NumbersDict
keys_set[source]
Return type:set[str]
get(self, key, default=None)[source]
Parameters:
  • key (str) –
  • default (T) –
Return type:

object|T

pop(self, key, *args)[source]
Parameters:
  • key (str) –
  • args (T) – default, or not
Return type:

object|T

keys(self)[source]
Return type:set[str]
values(self)[source]
Return type:list[object]
items(self)[source]
Returns:dict items. this excludes self.value
Return type:set[(str,object)]
has_values(self)[source]
Return type:bool
unary_op(self, op)[source]
Parameters:op ((T)->T2) –
Returns:new NumbersDict, where op is applied on all values
Return type:NumbersDict
classmethod bin_op_scalar_optional(self, other, zero, op)[source]
Parameters:
  • self (T) –
  • other (T) –
  • zero (T) –
  • op ((T,T)->T) –
Return type:

T

classmethod bin_op(self, other, op, zero, result=None)[source]
Parameters:
  • self (NumbersDict|int|float|T) –
  • other (NumbersDict|int|float|T) –
  • op ((T,T)->T) –
  • zero (T) –
  • result (NumbersDict|None) –
Return type:

NumbersDict

elem_eq(self, other, result_with_default=True)[source]

Element-wise equality check with other. Note about broadcast default value: Consider some key which is neither in self nor in other.

This means that self[key] == self.default, other[key] == other.default. Thus, in case that self.default != other.default, we get res.default == False. Then, all(res.values()) == False, even when all other values are True. This is sometimes not what we want. You can control the behavior via result_with_default.
Parameters:
  • other (NumbersDict|T) –
  • result_with_default (bool) –
Return type:

NumbersDict

any_compare(self, other, cmp)[source]
Parameters:
  • other (NumbersDict) –
  • cmp (((object,object)->bool)) –
Return type:

bool

classmethod max(items)[source]

Element-wise maximum for item in items.

Parameters:items (list[NumbersDict|int|float]) –
Return type:NumbersDict

classmethod min(items)[source]

Element-wise minimum for item in items.

Parameters:items (list[NumbersDict|int|float]) –
Return type:NumbersDict

max_value(self)[source]

Maximum of our values.

min_value(self)[source]

Minimum of our values.
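To illustrate the broadcast idea behind NumbersDict, here is a heavily reduced sketch. The class name is hypothetical, only two operators are covered, and the broadcast default of 0 is an assumption made to keep the example short. Only the attribute name value follows the items() description above.

```python
import operator


class NumbersDictSketch:
    def __init__(self, numbers_dict=None, broadcast_value=0):
        self.dict = dict(numbers_dict or {})
        self.value = broadcast_value  # used for every key not in self.dict

    def __getitem__(self, key):
        return self.dict.get(key, self.value)

    def _bin_op(self, other, op):
        # A plain number is treated as a pure broadcast value.
        if not isinstance(other, NumbersDictSketch):
            other = NumbersDictSketch(broadcast_value=other)
        keys = set(self.dict) | set(other.dict)
        return NumbersDictSketch(
            {key: op(self[key], other[key]) for key in keys},
            broadcast_value=op(self.value, other.value))

    def __add__(self, other):
        return self._bin_op(other, operator.add)

    def __mul__(self, other):
        return self._bin_op(other, operator.mul)

    def max_value(self):
        return max(self.dict.values())
```

Under these assumptions, adding 1 to a sketch object with keys “classes” and “data” increments both entries and the broadcast value.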

Util.collect_class_init_kwargs(cls, only_with_default=False)[source]
Parameters:
  • cls (type) – class, where it assumes that kwargs are passed on to base classes
  • only_with_default (bool) – if given will only return the kwargs with default values
Returns:

the kwarg names if not only_with_default, otherwise a dict mapping them to their default values

Return type:

list[str] | dict[str]

Util.getargspec(func)[source]

inspect.getfullargspec() or inspect.getargspec (Python 2)

Parameters:func
Returns:FullArgSpec
Util.collect_mandatory_class_init_kwargs(cls)[source]
Parameters:cls (type) –
Returns:list of kwargs which have no default, i.e. which must be provided
Return type:list[str]
Util.help_on_type_error_wrong_args(cls, kwargs)[source]
Parameters:
  • cls (type) –
  • kwargs (list[str]) –
Util.custom_exec(source, source_filename, user_ns, user_global_ns)[source]
Parameters:
  • source (str) –
  • source_filename (str) –
  • user_ns (dict[str]) –
  • user_global_ns (dict[str]) –
Returns:

nothing

class Util.FrozenDict[source]

Frozen dict.

Util.make_hashable(obj)[source]

Theano needs hashable objects in some cases, e.g. the properties of Ops. This converts all objects as such, i.e. into immutable frozen types.

Parameters:obj (T|dict|list|tuple) –
Return type:T|FrozenDict|tuple
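The conversion into immutable types can be sketched recursively. Both names below are hypothetical, and the frozen dict here only blocks item assignment; it is not a complete immutable mapping.

```python
class FrozenDictSketch(dict):
    """Hypothetical stand-in for Util.FrozenDict."""

    def __setitem__(self, key, value):
        raise TypeError("FrozenDictSketch is read-only")

    def __hash__(self):
        # Order-independent hash; requires hashable values.
        return hash(frozenset(self.items()))


def make_hashable_sketch(obj):
    if isinstance(obj, dict):
        return FrozenDictSketch(
            (make_hashable_sketch(k), make_hashable_sketch(v))
            for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(make_hashable_sketch(v) for v in obj)
    # Assumption: everything else is already hashable.
    return obj
```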
Util.make_dll_name(basename)[source]
Parameters:basename (str) –
Returns:e.g. “lib%s.so” % basename, depending on sys.platform
Return type:str
Util.escape_c_str(s)[source]
Parameters:s (str) –
Returns:C-escaped str
Return type:str
Util.attr_chain(base, attribs)[source]
Parameters:
  • base (object) –
  • attribs (list[str]|tuple[str]|str) –
Returns:

getattr(getattr(base, attribs[0]), attribs[1]) …

Return type:

object
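The nested getattr chain can be written as a loop. A sketch with a hypothetical function name; treating a single str as a one-element chain is an assumption based on the documented parameter type.

```python
def attr_chain_sketch(base, attribs):
    if isinstance(attribs, str):
        attribs = [attribs]
    obj = base
    for name in attribs:
        # Follow one attribute at a time.
        obj = getattr(obj, name)
    return obj
```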

Util.to_bool(v)[source]
Parameters:v (int|float|str) – if it is a string, it should represent some integer, or alternatively “true” or “false”
Return type:bool
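A sketch of the documented conversion rules. The function name is hypothetical, and accepting “true”/“false” case-insensitively is an assumption.

```python
def to_bool_sketch(v):
    if isinstance(v, str):
        lowered = v.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
        # Otherwise the string should represent some integer.
        return bool(int(lowered))
    return bool(v)
```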
Util.as_str(s)[source]
Parameters:s (str|unicode|bytes) –
Return type:str
Util.py2_utf8_str_to_unicode(s)[source]
Parameters:s (str) – e.g. the string literal “äöü” in Python 3 is correct, but in Python 2 it should have been u”äöü”, but just using “äöü” will actually be the raw utf8 byte sequence. This can happen when you eval() some string. We assume that you are using Python 2, and got the string (not unicode object) “äöü”, or maybe “abc”.
Returns:if it is indeed unicode, it will return the unicode object, otherwise it keeps the string
Return type:str|unicode
Util.deepcopy(x)[source]

Simpler variant of copy.deepcopy(). Should handle some edge cases as well, like copying module references.

Parameters:x (T) – an arbitrary object
Return type:T
Util.load_txt_vector(filename)[source]

Expect line-based text encoding in file. We also support Sprint XML format, which has some additional xml header and footer, which we will just strip away.

Parameters:filename (str) –
Return type:list[float]
class Util.CollectionReadCheckCovered(collection, truth_value=None)[source]

Wraps around a dict. It keeps track of all the keys which were read from the dict. Via assert_all_read(), you can check that there are no keys in the dict which were not read. The usage is for config dict options, where the user has specified a range of options, and where in the code there is usually a default for every non-specified option, to check whether all the user-specified options are also used (maybe the user made a typo).

Parameters:
  • collection (dict[str]) –
  • truth_value (None|bool) – note: check explicitly for self.truth_value, bool(self) is not the same!
classmethod from_bool_or_dict(value)[source]
Parameters:value (bool|dict[str]) –
Return type:CollectionReadCheckCovered
get(self, item, default=None)[source]
Parameters:
  • item (str) –
  • default (T) –
Return type:

T|object|None

assert_all_read(self)[source]

Asserts that all items have been read.
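The typo-detection use case can be sketched like this. The class and attribute names are hypothetical, and the truth_value handling of the real class is left out.

```python
class ReadCheckSketch:
    def __init__(self, collection):
        self.collection = collection
        self.got_items = set()  # keys which were requested so far

    def get(self, item, default=None):
        self.got_items.add(item)
        return self.collection.get(item, default)

    def assert_all_read(self):
        remaining = set(self.collection) - self.got_items
        if remaining:
            raise AssertionError("unused options: %r" % sorted(remaining))


# The user misspelled "momentum"; the code silently falls back to a default.
opts = ReadCheckSketch({"lr": 0.1, "momentun": 0.9})
lr = opts.get("lr", 1.0)
momentum = opts.get("momentum", 0.0)
# opts.assert_all_read() would now raise, pointing at "momentun".
```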

Util.which(program)[source]

Finds program in some of the dirs of the PATH env var.

Parameters:program (str) – e.g. “python”
Returns:full path, e.g. “/usr/bin/python”, or None
Return type:str|None
Util.overwrite_os_exec(prefix_args)[source]
Parameters:prefix_args (list[str]) –
Util.get_lsb_release()[source]
Returns:/etc/lsb-release parsed as a dict
Return type:dict[str,str]
Util.get_ubuntu_major_version()[source]
Return type:int|None
Util.auto_prefix_os_exec_prefix_ubuntu(prefix_args, ubuntu_min_version=16)[source]
Parameters:
  • prefix_args (list[str]) –
  • ubuntu_min_version (int) –
Example usage:
auto_prefix_os_exec_prefix_ubuntu([“/u/zeyer/tools/glibc217/ld-linux-x86-64.so.2”])
Util.cleanup_env_var_path(env_var, path_prefix)[source]
Parameters:
  • env_var (str) – e.g. “LD_LIBRARY_PATH”
  • path_prefix (str) –

Will remove all paths in os.environ[env_var] which are prefixed with path_prefix.
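A sketch of this filtering, with a hypothetical function name. Using os.pathsep as the separator and leaving an unset variable untouched are assumptions.

```python
import os


def cleanup_env_var_path_sketch(env_var, path_prefix):
    if env_var not in os.environ:
        return
    paths = os.environ[env_var].split(os.pathsep)
    # Keep only the entries which do not start with the prefix.
    kept = [p for p in paths if not p.startswith(path_prefix)]
    os.environ[env_var] = os.pathsep.join(kept)
```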

Util.get_login_username()[source]
Return type:str
Returns:the username of the current user.

Use this as a replacement for os.getlogin().

Util.get_temp_dir()[source]
Return type:str
Returns:e.g. “/tmp/$USERNAME”
class Util.LockFile(directory, name='lock_file', lock_timeout=3600)[source]

Simple lock file.

Parameters:
  • directory (str) –
  • lock_timeout (int|float) – in seconds
is_old_lockfile(self)[source]
Returns:Whether there is an existing lock file and the existing lock file is old.
Return type:bool
maybe_remove_old_lockfile(self)[source]

Removes an existing old lockfile, if there is one.

is_locked(self)[source]
Returns:whether there is an active (not old) lockfile
Return type:bool
lock(self)[source]

Acquires the lock.

unlock(self)[source]

Releases the lock.

Util.str_is_number(s)[source]
Parameters:s (str) – e.g. “1”, “.3” or “x”
Returns:whether s can be cast to float or int
Return type:bool
Util.sorted_values_from_dict(d)[source]
Parameters:d (dict[T,V]) –
Return type:list[V]
Util.dict_zip(keys, values)[source]
Parameters:
  • keys (list[T]) –
  • values (list[V]) –
Return type:

dict[T,V]

Util.parse_ld_conf_file(fn)[source]

Via https://github.com/albertz/system-tools/blob/master/bin/find-lib-in-path.py.

Parameters:fn (str) – e.g. “/etc/ld.so.conf”
Returns:list of paths for libs
Return type:list[str]
Util.get_ld_paths()[source]

To be very correct, see the man page of ld.so. And here: http://unix.stackexchange.com/questions/354295/what-is-the-default-value-of-ld-library-path/354296

Short version, not specific to an executable, in this order:

  • LD_LIBRARY_PATH
  • /etc/ld.so.cache (instead we will parse /etc/ld.so.conf)
  • /lib, /usr/lib (or maybe /lib64, /usr/lib64)

Via https://github.com/albertz/system-tools/blob/master/bin/find-lib-in-path.py.

Return type:list[str]
Returns:list of paths to search for libs (*.so files)
Util.find_lib(lib_name)[source]
Parameters:lib_name (str) – without postfix/prefix, e.g. “cudart” or “blas”
Returns:returns full path to lib or None
Return type:str|None
Util.read_sge_num_procs(job_id=None)[source]

From the Sun Grid Engine (SGE), reads the num_proc setting for a particular job. If job_id is not provided and the JOB_ID env is set, it will use that instead (i.e. it uses the current job). This calls qstat to figure out this setting. There are multiple ways this can go wrong, so better catch any exception.

Parameters:job_id (int|None) –
Returns:num_proc
Return type:int|None
Util.get_number_available_cpus()[source]
Returns:number of available CPUs, if we can figure it out
Return type:int|None
Util.guess_requested_max_num_threads(log_file=None, fallback_num_cpus=True)[source]
Parameters:
  • log_file (io.File) –
  • fallback_num_cpus (bool) –
Return type:

int|None

Util.get_gpu_names()[source]
Return type:list[str]
Util.get_num_gpu_devices()[source]
Returns:(cpu count, gpu count)
Return type:(int, int)
Util.have_gpu()[source]
Return type:bool
Util.try_and_ignore_exception(f)[source]

Calls f, and ignores any exception.

Parameters:f (()->T) –
Returns:whatever f returns, or None
Return type:T|None
Util.try_get_caller_name(depth=1, fallback=None)[source]
Parameters:
  • depth (int) –
  • fallback (str|None) – this is returned if we fail for some reason
Return type:

str|None

Returns:

caller function name. this is just for debugging

Util.camel_case_to_snake_case(name)[source]
Parameters:name (str) – e.g. “CamelCase”
Returns:e.g. “camel_case”
Return type:str
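A common regex-based way to do this conversion is shown below. The function name is hypothetical, and the real function may treat edge cases such as runs of upper-case letters differently.

```python
import re


def camel_case_to_snake_case_sketch(name):
    # Insert "_" before an upper-case letter which starts a lower-case run.
    s1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Insert "_" between a lower-case letter or digit and an upper-case letter.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s1).lower()
```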
Util.get_hostname()[source]
Returns:e.g. “cluster-cn-211”
Return type:str
Util.is_running_on_cluster()[source]
Returns:i6 specific. Whether we run on some of the cluster nodes.
Return type:bool
Util.get_utc_start_time_filename_part()[source]
Returns:string which can be used as part of a filename, which represents the start time of RETURNN in UTC
Return type:str
Util.log_runtime_info_to_dir(path, config)[source]

This will write various logging information into path. It will create returnn.*.log with some meta information, as well as copy the used config file.

Parameters:
  • path –
  • config –
class Util.NativeCodeCompiler(base_name, code_version, code, is_cpp=True, c_macro_defines=None, ld_flags=None, include_paths=(), include_deps=None, static_version_name=None, should_cleanup_old_all=True, should_cleanup_old_mydir=False, use_cxx11_abi=False, verbose=False)[source]

Helper class to compile native C/C++ code on-the-fly.

Parameters:
  • base_name (str) – base name for the module, e.g. “zero_out”
  • code_version (int|tuple[int]) – check for the cache whether to reuse
  • code (str) – the source code itself
  • is_cpp (bool) – if False, C is assumed
  • c_macro_defines (dict[str,str|int]|None) – e.g. {“TENSORFLOW”: 1}
  • ld_flags (list[str]|None) – e.g. [“-lblas”]
  • include_paths (list[str]|tuple[str]) –
  • include_deps (list[str]|None) – if provided and an existing lib file, we will check if any dependency is newer and we need to recompile. we could also do it automatically via -MD but that seems overkill and too slow.
  • static_version_name (str|None) – normally, we use …/base_name/hash as the dir but this would use …/base_name/static_version_name.
  • should_cleanup_old_all (bool) – whether we should look in the cache dir and check all ops if we can delete some old ones which are older than some limit (self._cleanup_time_limit_days)
  • should_cleanup_old_mydir (bool) – whether we should delete our op dir before we compile there.
  • verbose (bool) – be slightly more verbose
CacheDirName = 'returnn_native'[source]
CollectedCompilers = None[source]
load_lib_ctypes(self)[source]
Return type:ctypes.CDLL
get_lib_filename(self)[source]
Return type:str
Util.get_patch_atfork_lib()[source]
Returns:path to our patch_atfork lib. see maybe_restart_returnn_with_atfork_patch()
Return type:str
Util.maybe_restart_returnn_with_atfork_patch()[source]

What we want: subprocess.Popen to always work. Problem: It uses fork+exec internally in subprocess_fork_exec, via _posixsubprocess.fork_exec. That is a problem because fork can trigger any atfork handlers registered via pthread_atfork, and those can crash/deadlock in some cases.

https://github.com/tensorflow/tensorflow/issues/13802
https://github.com/xianyi/OpenBLAS/issues/240
https://trac.sagemath.org/ticket/22021
https://bugs.python.org/issue31814
https://stackoverflow.com/questions/46845496/ld-preload-and-linkage
https://stackoverflow.com/questions/46810597/forkexec-without-atfork-handlers

The solution here: Just override pthread_atfork, via LD_PRELOAD. Note that in some cases, this is not enough (see the SO discussion), so we also overwrite fork itself. See also tests/test_fork_exec.py for a demo.

class Util.Stats(format_str=None)[source]

Collects mean and variance, running average.

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

Parameters:format_str (None|((float|numpy.ndarray)->str)) –
collect(self, data)[source]
Parameters:data (numpy.ndarray) – shape (time, dim) or (time,)
get_mean(self)[source]
Returns:mean, shape (dim,)
Return type:numpy.ndarray
get_std_dev(self)[source]
Returns:std dev, shape (dim,)
Return type:numpy.ndarray
dump(self, output_file_prefix=None, stream=None, stream_prefix='')[source]
Parameters:
  • output_file_prefix (str|None) – if given, will numpy.savetxt mean|std_dev to disk
  • stream_prefix (str) –
  • stream (io.TextIOBase) – sys.stdout by default
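The running mean/variance update from the linked Wikipedia article (Welford's algorithm) can be sketched for scalar samples. The class name is hypothetical, the real class works on Numpy arrays of shape (time, dim), and using the population variance (dividing by n) is an assumption.

```python
import math


class StatsSketch:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared differences from the current mean

    def collect(self, data):
        for x in data:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            # Uses the old and the updated mean (Welford update).
            self.m2 += delta * (x - self.mean)

    def get_mean(self):
        return self.mean

    def get_std_dev(self):
        return math.sqrt(self.m2 / self.n)
```

The data can be fed in several collect() calls without keeping earlier samples in memory.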
Util.is_namedtuple(cls)[source]
Parameters:cls (T) – tuple, list or namedtuple type
Returns:whether cls is a namedtuple type
Return type:bool
Util.make_seq_of_type(cls, seq)[source]
Parameters:
  • cls (T) – e.g. tuple, list or namedtuple
  • seq (list|tuple|T) –
Returns:

cls(seq) or cls(*seq)

Return type:

T|list|tuple
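The distinction between cls(seq) and cls(*seq) matters because namedtuple types take their fields as separate arguments. A sketch with hypothetical function names; checking for the _fields attribute is a common heuristic, not necessarily what Util does.

```python
from collections import namedtuple


def is_namedtuple_sketch(cls):
    # namedtuple types are tuple subclasses with a _fields attribute.
    return issubclass(cls, tuple) and hasattr(cls, "_fields")


def make_seq_of_type_sketch(cls, seq):
    if is_namedtuple_sketch(cls):
        return cls(*seq)
    return cls(seq)


Point = namedtuple("Point", ["x", "y"])  # example type, not part of Util
```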

Util.dummy_noop_ctx()[source]

Provides a no-op context manager.

Util.compute_bleu(reference_corpus, translation_corpus, max_order=4, use_bp=True)[source]

Computes BLEU score of translated segments against one or more references. Code adapted from Google Tensor2Tensor.

Parameters:
  • reference_corpus (list[list[int]|list[str]]) – list of references for each translation. Each reference should be tokenized into a list of tokens.
  • translation_corpus (list[list[int]|list[str]]) – list of translations to score. Each translation should be tokenized into a list of tokens.
  • max_order (int) – maximum n-gram order to use when computing BLEU score
  • use_bp (bool) – whether to apply brevity penalty
Returns:

BLEU score.
Util.monkeyfix_glib()[source]

Fixes some bugs which prevent SIGINT from working. This is used by audioread, and indirectly by librosa for loading audio. https://stackoverflow.com/questions/16410852/ See also monkeypatch_audioread().

Util.monkeypatch_audioread()[source]

audioread does not behave optimally in some cases. E.g. each call to _ca_available() takes quite long because of the ctypes.util.find_library usage. We will patch this.

However, the recommendation would be to not use audioread (librosa.load). audioread uses Gstreamer as a backend by default currently (on Linux). Gstreamer has multiple issues. See also monkeyfix_glib(), and here for discussion: https://github.com/beetbox/audioread/issues/62 https://github.com/beetbox/audioread/issues/63

Instead, use PySoundFile, which is also faster. See here for discussions: https://github.com/beetbox/audioread/issues/64 https://github.com/librosa/librosa/issues/681

Util.cf(filename)[source]

Cache manager. i6 specific.

Returns:filename
Return type:str
Util.binary_search_any(cmp, low, high)[source]

Binary search for a custom compare function.

Parameters:
  • cmp ((int)->int) – e.g. cmp(idx) == compare(array[idx], key)
  • low (int) – inclusive
  • high (int) – exclusive
Return type:

int|None
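A sketch of a binary search driven only by a compare callback. The function name is hypothetical, and the sign convention (negative means the entry at idx is smaller than the key) is an assumption based on the cmp description above.

```python
def binary_search_any_sketch(cmp, low, high):
    # Search in [low, high); high is exclusive.
    while low < high:
        mid = (low + high) // 2
        r = cmp(mid)
        if r < 0:
            low = mid + 1  # entry at mid is too small, go right
        elif r > 0:
            high = mid  # entry at mid is too big, go left
        else:
            return mid
    return None


array = [1, 3, 5, 7, 9]
key = 7
idx = binary_search_any_sketch(
    lambda i: (array[i] > key) - (array[i] < key), 0, len(array))
```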

Util.generic_import_module(filename)[source]
Parameters:filename (str) – We try to be clever about filename. If it looks like a module name, just do importlib.import_module. If it looks like a filename, search for a base path (which does not have __init__.py), add that path to sys.path if needed, and import the remaining where “/” is replaced by “.” and the file extension is removed.
Returns:the module
Return type:types.ModuleType
Util.softmax(x, axis=None)[source]
Parameters:
  • x (numpy.ndarray) –
  • axis (int|None) –
Return type:

numpy.ndarray
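For reference, the softmax computation itself, sketched in pure Python for a flat list of floats. The function name is hypothetical, the real function works on Numpy arrays along axis, and subtracting the maximum is a standard stability trick which the real function may or may not use.

```python
import math


def softmax_sketch(x):
    # Shift by the maximum so that exp() cannot overflow.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]
```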

Util.collect_proc_maps_exec_files()[source]

Currently only works on Linux…

Returns:list of mapped executables (libs)
Return type:list[str]
Util.find_sym_in_exec(fn, sym)[source]

Uses objdump to list available symbols, and filters them by the given sym.

Parameters:
  • fn (str) – path
  • sym (str) –
Returns:

matched output, or None

Return type:

str|None

Util.dummy_numpy_gemm_call()[source]

Just performs some GEMM call via Numpy. This makes sure that the BLAS library is loaded.

Util.find_sgemm_libs_from_runtime()[source]

Looks through all libs via collect_proc_maps_exec_files(), and searches for all which have the sgemm symbol. Currently only works on Linux (because of collect_proc_maps_exec_files()).

Returns:list of libs (their path)
Return type:list[str]
Util.find_libcudart_from_runtime()[source]

Looks through all libs via collect_proc_maps_exec_files(), and searches for libcudart. Currently only works on Linux (because of collect_proc_maps_exec_files()).

Returns:path to the libcudart lib, or None
Return type:str|None