Util

class Util.AsyncThreadRun(name, func)
Parameters:
  • name (str) –
  • func (()->T) –
get()
main()
class Util.BackendEngine
Default = 0
TensorFlow = 1
Theano = 0
classmethod get_selected_engine()
classmethod is_tensorflow_selected()
classmethod is_theano_selected()
classmethod select_engine(engine=None, config=None)
Parameters:
selectedEngine = None
class Util.CollectionReadCheckCovered(collection, truth_value=None)

Wraps around a dict and keeps track of all keys which were read from it. Via assert_all_read(), you can check that the dict contains no keys which were never read. This is intended for config dict options: the user specifies some options, the code usually has a default for every non-specified option, and the check verifies that all user-specified options are actually used (e.g. to catch a typo in an option name).

Parameters:
  • collection (dict[str]) –
  • truth_value (None|bool) –
assert_all_read()
classmethod from_bool_or_dict(value)
Parameters:value (bool|dict[str]) –
Return type:CollectionReadCheckCovered
get(item, default=None)
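
The read-tracking idea can be sketched in a few lines of Python. ReadTrackingDict is a hypothetical name, and this minimal illustration assumes that all reads go through get(); it is not the RETURNN implementation (which additionally handles truth_value and from_bool_or_dict).

```python
class ReadTrackingDict:
    """Hypothetical minimal stand-in for CollectionReadCheckCovered (not the RETURNN code)."""

    def __init__(self, collection):
        self.collection = dict(collection)
        self.read_keys = set()  # every key that was requested via get()

    def get(self, item, default=None):
        self.read_keys.add(item)
        return self.collection.get(item, default)

    def assert_all_read(self):
        # A key the user provided but the code never asked for is suspicious,
        # e.g. a typo such as "lerning_rate" instead of "learning_rate".
        unread = sorted(set(self.collection) - self.read_keys)
        assert not unread, "unused options: %r" % unread


opts = ReadTrackingDict({"learning_rate": 0.1, "momentum": 0.9})
opts.get("learning_rate", 0.01)
opts.get("dropout", 0.0)  # not user-specified, so the default is used
try:
    opts.assert_all_read()
except AssertionError as exc:
    print(exc)  # unused options: ['momentum']
```
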
class Util.DictAsObj(dikt)
class Util.FrozenDict
class Util.LockFile(directory, name='lock_file', lock_timeout=3600)
Parameters:
  • directory (str) –
  • lock_timeout (int|float) – in seconds
is_locked()
is_old_lockfile()
lock()
maybe_remove_old_lockfile()
unlock()
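
A minimal sketch of a lock file with a staleness timeout, assuming the lock is a plain file whose modification time marks when it was taken. SimpleLockFile and try_lock are hypothetical names; the RETURNN class may work differently.

```python
import os
import time


class SimpleLockFile:
    """Hypothetical lock file with a staleness timeout; not RETURNN's LockFile."""

    def __init__(self, directory, name="lock_file", lock_timeout=3600):
        self.path = os.path.join(directory, name)
        self.lock_timeout = lock_timeout  # seconds after which an existing lock counts as stale

    def is_locked(self):
        return os.path.exists(self.path)

    def is_old_lockfile(self):
        try:
            mtime = os.path.getmtime(self.path)
        except FileNotFoundError:
            return False
        return time.time() - mtime > self.lock_timeout

    def maybe_remove_old_lockfile(self):
        if self.is_old_lockfile():
            try:
                os.remove(self.path)
            except FileNotFoundError:
                pass  # another process removed it first

    def try_lock(self):
        """Returns True if we acquired the lock, False if someone else holds it."""
        self.maybe_remove_old_lockfile()
        try:
            # O_CREAT | O_EXCL fails atomically if the file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        return True

    def lock(self, poll_interval=1.0):
        while not self.try_lock():
            time.sleep(poll_interval)

    def unlock(self):
        os.remove(self.path)
```

Design note: removing a stale lock and then creating a new one is not a single atomic step, so two processes that both see the same stale lock can race; O_EXCL only makes the creation itself atomic.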
class Util.NativeCodeCompiler(base_name, code_version, code, is_cpp=True, c_macro_defines=None, ld_flags=None, include_paths=(), include_deps=None, static_version_name=None, should_cleanup_old_all=True, should_cleanup_old_mydir=False, verbose=False)

Helper class to compile native C/C++ code on-the-fly.

Parameters:
  • base_name (str) – base name for the module, e.g. “zero_out”
  • code_version (int|tuple[int]) – used to check the cache whether an existing build can be reused
  • code (str) – the source code itself
  • is_cpp (bool) – if False, C is assumed
  • c_macro_defines (dict[str,str|int]|None) – e.g. {“TENSORFLOW”: 1}
  • ld_flags (list[str]|None) – e.g. [“-lblas”]
  • include_paths (list[str]|tuple[str]) –
  • include_deps (list[str]|None) – if provided and the lib file already exists, we check whether any dependency is newer, in which case we recompile. We could also do this automatically via -MD, but that seems overkill and too slow.
  • static_version_name (str|None) – normally, we use .../base_name/hash as the dir, but if this is given, .../base_name/static_version_name is used instead.
  • should_cleanup_old_all (bool) – whether we should scan the cache dir and delete old ops which are older than some limit (self._cleanup_time_limit_days)
  • should_cleanup_old_mydir (bool) – whether we should delete our op dir before we compile there.
  • verbose (bool) – be slightly more verbose
CacheDirName = 'returnn_native'
get_lib_filename()
load_lib_ctypes()
class Util.NotSpecified

This is just a placeholder, to be used as a default argument to mark that it is not specified.

class Util.NumbersDict(auto_convert=None, numbers_dict=None, broadcast_value=None)

It is mostly like a dict[str,float|int] plus an optional broadcast default value. It implements the standard math binary ops in a straightforward way.

classmethod bin_op(self, other, op, zero, result=None)
classmethod bin_op_scalar_optional(self, other, zero, op)
constant_like(number)
copy()
elem_eq(other, result_with_default=True)

Element-wise equality check with other. Note about the broadcast default value: consider some key which is in neither self nor other.

This means that self[key] == self.default and other[key] == other.default. Thus, if self.default != other.default, we get res.default == False. Then all(res.values()) == False, even when all other values are True. This is sometimes not what we want. You can control this behavior via result_with_default.
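
To make the broadcast-default subtlety concrete, here is a hypothetical TinyNumbersDict (not RETURNN's NumbersDict). For illustration only, it assumes that values() includes the broadcast default when one is set, and that result_with_default=False simply leaves the result without a default.

```python
class TinyNumbersDict:
    """Hypothetical dict[str, number] with an optional broadcast default (not RETURNN's class)."""

    def __init__(self, values=None, default=None):
        self.dict = dict(values or {})
        self.default = default  # used for every key that is not stored explicitly

    def __getitem__(self, key):
        return self.dict.get(key, self.default)

    def values(self):
        # Assumption for this sketch: the broadcast default counts as one more value.
        vals = list(self.dict.values())
        if self.default is not None:
            vals.append(self.default)
        return vals

    def elem_eq(self, other, result_with_default=True):
        keys = set(self.dict) | set(other.dict)
        res = TinyNumbersDict({k: self[k] == other[k] for k in keys})
        if result_with_default:
            # A key stored in neither dict compares self.default with other.default.
            res.default = self.default == other.default
        return res


a = TinyNumbersDict({"data": 5, "classes": 3}, default=0)
b = TinyNumbersDict({"data": 5, "classes": 3}, default=1)
```

Here all(a.elem_eq(b).values()) is False only because the defaults 0 and 1 differ, while all(a.elem_eq(b, result_with_default=False).values()) is True.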
get(key, default=None)
has_values()
keys()
keys_set
classmethod max(items)

Element-wise maximum for item in items.

Parameters:items (list[NumbersDict|int|float]) –
Return type:NumbersDict
max_value()

Maximum of our values.

classmethod min(items)

Element-wise minimum for item in items.

Parameters:items (list[NumbersDict|int|float]) –
Return type:NumbersDict
pop(key, *args)
unary_op(op)
values()
class Util.ObjAsDict(obj)
items()
class Util.Stats

Collects mean and variance.

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

collect(data)
Parameters:data (numpy.ndarray) – shape (time, dim)
dump(output_file_prefix=None, stream=None)
Parameters:
  • output_file_prefix (str|None) – if given, mean and std_dev are saved to disk via numpy.savetxt, using this prefix
  • stream (io.TextIOBase) – sys.stdout by default
get_mean()
Returns:mean, shape (dim,)
Return type:numpy.ndarray
get_std_dev()
Returns:std dev, shape (dim,)
Return type:numpy.ndarray
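
The linked page describes, among others, the parallel (batched) update of Chan et al., which merges per-batch means and sums of squared deviations. The sketch below applies it in pure Python to lists of frames of shape (time, dim). RunningStats is a hypothetical name; whether RETURNN uses this exact update, and whether it reports the population std dev, are assumptions.

```python
import math


class RunningStats:
    """Hypothetical per-dimension mean/std collector using Chan et al.'s parallel update."""

    def __init__(self):
        self.n = 0
        self.mean = None  # per-dim means
        self.m2 = None  # per-dim sums of squared deviations from the mean

    def collect(self, data):
        """data: list of frames, each a list of `dim` floats, i.e. shape (time, dim)."""
        n_b = len(data)
        if n_b == 0:
            return
        dim = len(data[0])
        mean_b = [sum(frame[d] for frame in data) / n_b for d in range(dim)]
        m2_b = [sum((frame[d] - mean_b[d]) ** 2 for frame in data) for d in range(dim)]
        if self.n == 0:
            self.n, self.mean, self.m2 = n_b, mean_b, m2_b
            return
        n = self.n + n_b
        for d in range(dim):
            delta = mean_b[d] - self.mean[d]
            self.mean[d] += delta * n_b / n
            self.m2[d] += m2_b[d] + delta ** 2 * self.n * n_b / n
        self.n = n

    def get_mean(self):
        return list(self.mean)

    def get_std_dev(self):
        # Population std dev (divides by n).
        return [math.sqrt(m2 / self.n) for m2 in self.m2]
```

Merging per batch keeps the update numerically stable without storing all frames.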
Util.as_str(s)
Util.attr_chain(base, attribs)
Util.auto_prefix_os_exec_prefix_ubuntu(prefix_args, ubuntu_min_version=16)
Parameters:
  • prefix_args (list[str]) –
  • ubuntu_min_version (int) –
Example usage:
auto_prefix_os_exec_prefix_ubuntu(["/u/zeyer/tools/glibc217/ld-linux-x86-64.so.2"])
Util.availablePhysicalMemoryInBytes()
Util.betterRepr(o)

Compared to repr(), the main difference is that this one is deterministic: the original dict.__repr__ leaves the order undefined for dicts and sets. For big dicts/sets/lists, it adds “,” at the end to make textual diffs nicer.

Util.camel_case_to_snake_case(name)
Parameters:name (str) – e.g. “CamelCase”
Returns:e.g. “camel_case”
Return type:str
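
One common way to implement this conversion is a regex that inserts "_" before every uppercase letter except the first, then lowercases. camel_to_snake is a hypothetical helper, not necessarily how RETURNN does it.

```python
import re


def camel_to_snake(name):
    """Hypothetical sketch: insert "_" before each non-leading uppercase letter, then lowercase."""
    # (?<!^) skips position 0; (?=[A-Z]) matches the empty position before a capital.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
```

With this approach, runs of capitals are split letter by letter, e.g. "HTTPServer" becomes "h_t_t_p_server".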
Util.class_idx_seq_to_1_of_k(seq, num_classes)
Util.cleanup_env_var_path(env_var, path_prefix)
Parameters:
  • env_var (str) – e.g. “LD_LIBRARY_PATH”
  • path_prefix (str) –

Removes all paths in os.environ[env_var] which are prefixed with path_prefix.
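
A sketch of the filtering, written against any mutable mapping so it can be tried without touching the real environment. remove_prefixed_paths is a hypothetical name; pass os.environ to act on the process environment.

```python
import os


def remove_prefixed_paths(env, env_var, path_prefix):
    """Hypothetical sketch: drop every entry of env[env_var] that starts with path_prefix."""
    if env_var not in env:
        return
    # Entries are separated by os.pathsep (":" on POSIX).
    kept = [p for p in env[env_var].split(os.pathsep) if not p.startswith(path_prefix)]
    env[env_var] = os.pathsep.join(kept)
```
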

Util.cmd(s)
Returns:all stdout, split by newline. Does not cover stderr.
Return type:list[str]

Raises CalledProcessError on error.

Util.collect_class_init_kwargs(cls, only_with_default=False)
Parameters:
  • cls (type) – class, where it assumes that kwargs are passed on to base classes
  • only_with_default (bool) – if True, only returns the kwargs with default values
Returns:the list of kwarg names if not only_with_default, otherwise a dict mapping kwarg names to their default values
Return type:list[str] | dict[str]
Util.collect_mandatory_class_init_kwargs(cls)
Parameters:cls (type) –
Returns:list of kwargs which have no default, i.e. which must be provided
Return type:list[str]
Util.custom_exec(source, source_filename, user_ns, user_global_ns)
Util.defaultCacheSizeInGBytes(factor=0.7)
Util.describe_crnn_version()
Return type:str
Returns:string like “20171017.163840--git-ab2a1da”, via git_describeHeadVersion()
Util.describe_tensorflow_version()
Util.describe_theano_version()
Util.dict_diff_str(self, other)
Util.dict_joined(*ds)
Util.dict_zip(keys, values)
Util.escape_c_str(s)
Util.eval_shell_env(token)
Util.eval_shell_str(s)
Return type:list[str]

Parses s as shell-like arguments (via shlex.split) and evaluates shell environment variables (eval_shell_env). s or its elements can also be callable; in that case, they are called and the returned value is used.
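
A minimal sketch of that behavior, assuming os.path.expandvars as a stand-in for eval_shell_env (the real helper may expand variables differently). eval_shell_str_sketch is a hypothetical name.

```python
import os
import shlex


def eval_shell_str_sketch(s):
    """Hypothetical sketch: shell-split s, then expand $VAR / ${VAR} in each token."""
    if callable(s):
        s = s()
    if isinstance(s, str):
        s = shlex.split(s)
    tokens = []
    for token in s:
        if callable(token):
            token = token()
        tokens.append(os.path.expandvars(token))  # stand-in for eval_shell_env
    return tokens
```

Because shlex.split removes quotes before expansion, variables inside single quotes are still expanded here, unlike in a real shell.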

Util.find_lib(lib_name)[source]
Parameters:lib_name (str) – without postfix/prefix, e.g. “cudart” or “blas”
Returns:returns full path to lib or None
Return type:str|None
Util.find_ranges(l)[source]

:returns list of ranges (start,end) where end is exclusive such that the union of range(start,end) matches l. :rtype: list[(int,int)] We expect that the incoming list is sorted and strongly monotonic increasing.
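
A linear-scan sketch of find_ranges under the stated precondition (sorted, strictly increasing input). find_ranges_sketch is a hypothetical name.

```python
def find_ranges_sketch(l):
    """Hypothetical sketch: group a sorted, strictly increasing int list into half-open ranges."""
    ranges = []
    for x in l:
        if ranges and ranges[-1][1] == x:
            ranges[-1][1] = x + 1  # x extends the current run
        else:
            ranges.append([x, x + 1])  # x starts a new run
    return [tuple(r) for r in ranges]
```

For example, find_ranges_sketch([1, 2, 3, 7, 8, 10]) gives [(1, 4), (7, 9), (10, 11)].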

Util.get_ld_paths()

To be fully correct, see the ld.so man page, as well as http://unix.stackexchange.com/questions/354295/what-is-the-default-value-of-ld-library-path/354296
Short version, not specific to an executable, in this order:
  • LD_LIBRARY_PATH
  • /etc/ld.so.cache (instead, we parse /etc/ld.so.conf)
  • /lib, /usr/lib (or maybe /lib64, /usr/lib64)

Via https://github.com/albertz/system-tools/blob/master/bin/find-lib-in-path.py

Return type:list[str]
Returns:list of paths to search for libs (*.so files)
Util.get_login_username()
Return type:str
Returns:the username of the current user.

Use this as a replacement for os.getlogin().

Util.get_lsb_release()
Util.get_patch_atfork_lib()
Util.get_temp_dir()
Return type:str
Returns:e.g. “/tmp/$USERNAME”
Util.get_tensorflow_version_tuple()
Returns:tuple of ints, first entry is the major version
Return type:tuple[int]
Util.get_ubuntu_major_version()
Return type:int|None
Util.git_commitDate(commit='HEAD', gitdir='.')
Util.git_commitRev(commit='HEAD', gitdir='.')
Util.git_describeHeadVersion(gitdir='.')
Util.git_isDirty(gitdir='.')
Util.hdf5_dimension(filename, dimension)
Util.hdf5_group(filename, dimension)
Util.hdf5_shape(filename, dimension)
Util.hdf5_strings(handle, name, data)
Util.help_on_type_error_wrong_args(cls, kwargs)
Parameters:
  • cls (type) –
  • kwargs (list[str]) –
Util.hms(s)
Parameters:s (float|int) – seconds
Returns:e.g. “1:23:45” (hours:minutes:seconds). See hms_fraction if you want fractional seconds.
Return type:str
Util.hms_fraction(s, decimals=4)
Parameters:
  • s (float) – seconds
  • decimals (int) – how many decimals to print
Returns:e.g. “1:23:45.6789” (hours:minutes:seconds)
Return type:str
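
A sketch of both formatters using divmod. hms_sketch and hms_fraction_sketch are hypothetical names; rounding before splitting (so that 59.99999 s does not print as "0:00:60.0000") is a choice made in this sketch, not something the original states.

```python
def hms_sketch(s):
    """Hypothetical sketch: format seconds as "h:mm:ss" (the fraction is truncated)."""
    m, s = divmod(int(s), 60)
    h, m = divmod(m, 60)
    return "%i:%02i:%02i" % (h, m, s)


def hms_fraction_sketch(s, decimals=4):
    """Hypothetical sketch: like hms_sketch, but keeps `decimals` fractional digits."""
    s = round(s, decimals)  # round first so the seconds field never shows 60
    m, s = divmod(s, 60)
    h, m = divmod(int(m), 60)
    # Seconds field: 2 integer digits, plus "." and the decimals if any, zero-padded.
    width = 2 if decimals == 0 else 3 + decimals
    return "%i:%02i:%0*.*f" % (h, m, width, decimals, s)
```
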

Util.human_bytes_size(n, factor=1024, frac=0.8, prec=1)
Util.human_size(n, factor=1000, frac=0.8, prec=1)
Util.initThreadJoinHack()
Util.inplace_increment(x, idx, y)

This basically does x[idx] += y. The difference from the NumPy expression is that if some index occurs multiple times, NumPy increments it only once (and it is not specified which increment is applied), whereas this accumulates all increments, like numpy.add.at. See also the theano.tensor.subtensor.AdvancedIncSubtensor documentation.
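
For reference, NumPy itself exposes both behaviors for repeated indices: the buffered expression x[idx] += y applies only one increment per distinct index, while numpy.add.at is unbuffered and accumulates every occurrence.

```python
import numpy as np

idx = np.array([0, 0, 2])
y = np.array([1.0, 2.0, 5.0])

buffered = np.zeros(3)
buffered[idx] += y  # index 0 appears twice, but only one of its increments survives

accumulated = np.zeros(3)
np.add.at(accumulated, idx, y)  # unbuffered: both increments to index 0 are applied
```

Afterwards, accumulated is [3.0, 0.0, 5.0], while buffered[0] holds just one of the two increments.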

Util.interrupt_main()
Util.is_64bit_platform()
Returns:True if we run on 64bit, False for 32bit
Return type:bool

http://stackoverflow.com/questions/1405913/how-do-i-determine-if-my-python-shell-is-executing-in-32bit-or-64bit-mode-on-os

Util.is_quitting()
Util.json_remove_comments(string, strip_space=True)
Return type:str

Via https://github.com/getify/JSON.minify/blob/master/minify_json.py, by Gerald Storer, Pradyun S. Gedam, modified by us.

Util.load_json(filename=None, content=None)
Util.load_txt_vector(filename)

Expects a line-based text encoding in the file. We also support the Sprint XML format, which has some additional XML header and footer, which we just strip away.

Parameters:filename (str) –
Return type:list[float]
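
A sketch of the parsing, assuming whitespace-separated numbers and that every XML header/footer line starts with "<". parse_txt_vector is a hypothetical name and takes the file content instead of a filename; the actual Sprint XML layout is not described here, so this is only an approximation.

```python
def parse_txt_vector(text):
    """Hypothetical sketch: numbers separated by whitespace/newlines, XML-like lines skipped."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: header/footer lines such as '<?xml ...?>' or '</vector>' start with "<".
        if not line or line.startswith("<"):
            continue
        values.extend(float(tok) for tok in line.split())
    return values
```

To read from disk, pass the file content, e.g. obtained via `with open(filename) as f: text = f.read()`.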
Util.log_runtime_info_to_dir(path, config)

This writes various logging information into path: it creates returnn.*.log with some meta information, and copies the used config file.

Parameters:
Util.make_dll_name(basename)
Util.make_hashable(obj)

Theano needs hashable objects in some cases, e.g. the properties of Ops. This converts all objects accordingly, i.e. into immutable, frozen types.

Util.maybe_restart_returnn_with_atfork_patch()

What we want: subprocess.Popen to always work. Problem: it uses fork+exec internally in subprocess_fork_exec, via _posixsubprocess.fork_exec. That is a problem because fork can trigger any atfork handlers registered via pthread_atfork, and those can crash/deadlock in some cases.

See:
  • https://github.com/tensorflow/tensorflow/issues/13802
  • https://github.com/xianyi/OpenBLAS/issues/240
  • https://trac.sagemath.org/ticket/22021
  • https://bugs.python.org/issue31814
  • https://stackoverflow.com/questions/46845496/ld-preload-and-linkage
  • https://stackoverflow.com/questions/46810597/forkexec-without-atfork-handlers

The solution here: just override pthread_atfork via LD_PRELOAD. Note that in some cases this is not enough (see the SO discussion), so we also override fork itself. See also tests/test_fork_exec.py for a demo.

Util.model_epoch_from_filename(filename)
Util.obj_diff_str(self, other)
Util.overwrite_os_exec(prefix_args)
Parameters:prefix_args (list[str]) –
Util.parse_ld_conf_file(fn)

Via https://github.com/albertz/system-tools/blob/master/bin/find-lib-in-path.py

Parameters:fn (str) – e.g. “/etc/ld.so.conf”
Returns:list of paths for libs
Return type:list[str]
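
A sketch of parsing an ld.so.conf-style file: "#" starts a comment, "include" lines are expanded with glob, and every other non-empty line is taken as a directory. parse_ld_conf_sketch is a hypothetical name; resolving relative include patterns against the directory of fn, and not handling other directives (e.g. hwcap), are assumptions of this sketch.

```python
import glob
import os


def parse_ld_conf_sketch(fn):
    """Hypothetical sketch: collect library dirs from an ld.so.conf-style file."""
    paths = []
    with open(fn) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
            if not line:
                continue
            if line.startswith("include "):
                pattern = line[len("include "):].strip()
                if not os.path.isabs(pattern):
                    pattern = os.path.join(os.path.dirname(fn), pattern)
                for sub_fn in sorted(glob.glob(pattern)):
                    paths.extend(parse_ld_conf_sketch(sub_fn))  # recurse into included files
            else:
                paths.append(line)
    return paths
```
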

Util.parse_orthography(orthography, prefix=(), postfix=('[END]', ), remove_chars='(){}', collapse_spaces=True, final_strip=True, **kwargs)

For speech. Full processing. Example:

orthography = "hello [HESITATION] there " with word_based == False: returns list("hello ") + ["[HESITATION]"] + list(" there") + ["[END]"]; with word_based == True: returns ["hello", "[HESITATION]", "there", "[END]"]

Does some preprocessing on orthography and then passes it on to parse_orthography_into_symbols().

Parameters:
  • orthography (str) – e.g. “hello [HESITATION] there ”
  • prefix (list[str]) – will add this prefix
  • postfix (list[str]) – will add this postfix
  • remove_chars (str) – those chars will just be removed at the beginning
  • collapse_spaces (bool) – whether multiple spaces and tabs are collapsed into a single space
  • final_strip (bool) – whether we strip left and right
  • **kwargs – passed on to parse_orthography_into_symbols()
Return type:list[str]
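
The preprocessing steps listed above can be sketched as follows, with a simple char-based split (the word_based == False case) in place of the call to parse_orthography_into_symbols(). parse_orthography_sketch is a hypothetical name, and the exact order of the steps is an assumption.

```python
import re


def parse_orthography_sketch(orthography, prefix=(), postfix=("[END]",), remove_chars="(){}",
                             collapse_spaces=True, final_strip=True):
    """Hypothetical sketch of the preprocessing, followed by a char-based split."""
    for c in remove_chars:
        orthography = orthography.replace(c, "")
    if collapse_spaces:
        orthography = re.sub(r"[ \t]+", " ", orthography)  # runs of spaces/tabs -> one space
    if final_strip:
        orthography = orthography.strip()
    # Each "[...]" stays one symbol; every other character becomes its own symbol.
    symbols = re.findall(r"\[[^\]]*\]|.", orthography, re.DOTALL)
    return list(prefix) + symbols + list(postfix)
```
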

Util.parse_orthography_into_symbols(orthography, upper_case_special=True, word_based=False)

For speech. Example:

orthography = "hello [HESITATION] there " with word_based == False: returns list("hello ") + ["[HESITATION]"] + list(" there "); with word_based == True: returns ["hello", "[HESITATION]", "there"]

No pre/post-processing is done:
  • Spaces are kept as-is.
  • No stripping at begin/end (e.g. trailing spaces are not removed).
  • No tolower/toupper.
  • No [BEGIN]/[END] symbols or similar are added.

Any such operations should be done explicitly in an additional function. Anything in []-brackets is treated as a special symbol. Also see parse_orthography(), which includes some preprocessing.

Parameters:
  • orthography (str) – example: “hello [HESITATION] there ”
  • upper_case_special (bool) – whether the special symbols are always made upper case
  • word_based (bool) – whether we split on space and return full words
Return type:list[str]
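
A regex-based sketch of the splitting rules: each "[...]" group is one symbol, every other character is its own symbol, and word mode joins plain characters between spaces. parse_symbols_sketch is a hypothetical name; always making a special symbol its own word, even when it touches other characters, is an assumption of this sketch.

```python
import re

# "[...]" is one special symbol; any other single character is a symbol of its own.
_SYMBOL_RE = re.compile(r"\[[^\]]*\]|.", re.DOTALL)


def parse_symbols_sketch(orthography, upper_case_special=True, word_based=False):
    """Hypothetical sketch of char- or word-based splitting with "[...]" special symbols."""
    symbols = _SYMBOL_RE.findall(orthography)
    if upper_case_special:
        symbols = [s.upper() if s.startswith("[") and len(s) > 1 else s for s in symbols]
    if not word_based:
        return symbols
    words, current = [], ""
    for s in symbols:
        if s == " " or len(s) > 1:  # a space ends a word; a special symbol is its own word
            if current:
                words.append(current)
            current = ""
            if s != " ":
                words.append(s)
        else:
            current += s
    if current:
        words.append(current)
    return words
```
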

Util.progress_bar(complete=1.0, prefix='', suffix='')
Util.progress_bar_with_time(complete=1.0, prefix='', **kwargs)
Util.random_orthogonal(shape, gain=1.0, seed=None)

Returns a random orthogonal matrix of the given shape.

Code borrowed and adapted from Keras: https://github.com/fchollet/keras/blob/master/keras/initializers.py
Reference: Saxe et al., http://arxiv.org/abs/1312.6120
Related: Unitary Evolution Recurrent Neural Networks, https://arxiv.org/abs/1511.06464

Parameters:
  • shape (tuple[int]) –
  • gain (float) –
  • seed (int) – for the NumPy random generator
Returns:random orthogonal matrix
Return type:numpy.ndarray
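
A sketch of the SVD-based construction along the lines of the cited Keras initializer: draw a Gaussian matrix, take its SVD, and keep the factor whose shape matches. random_orthogonal_sketch is a hypothetical name, and details such as seeding via numpy.random.RandomState are assumptions.

```python
import numpy as np


def random_orthogonal_sketch(shape, gain=1.0, seed=None):
    """Hypothetical sketch of an SVD-based orthogonal initializer."""
    rng = np.random.RandomState(seed)
    flat_shape = (shape[0], int(np.prod(shape[1:])))
    a = rng.normal(0.0, 1.0, flat_shape)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    # u is (rows, k) with orthonormal columns, v is (k, cols) with orthonormal rows,
    # where k = min(rows, cols); keep whichever has exactly flat_shape.
    q = u if u.shape == flat_shape else v
    return gain * q.reshape(shape)
```

For a wide matrix the rows come out orthonormal, for a tall one the columns, and for a square one both.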

Util.read_sge_num_procs(job_id=None)

From the Sun Grid Engine (SGE), reads the num_proc setting for a particular job. If job_id is not provided and the JOB_ID env variable is set, it uses that instead (i.e. the current job). This calls qstat to determine the setting. There are multiple ways this can go wrong, so callers should catch any exception.

Parameters:job_id (int|None) –
Returns:num_proc
Return type:int|None
Util.simpleObjRepr(obj)

Returns a repr of obj which lists all of its self.__init__ args.

Util.slice_pad_zeros(x, begin, end, axis=0)
Parameters:
  • x (numpy.ndarray) – of shape (..., time, ...)
  • begin (int) –
  • end (int) –
  • axis (int) –
Returns:basically x[begin:end] (with axis==0), but if begin < 0 or end > x.shape[0], these frames are not discarded; instead, zeros are padded, such that the resulting shape[0] == end - begin.
Return type:numpy.ndarray
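
A sketch of the zero-padded slice: allocate zeros of length end - begin along axis and copy in the part that overlaps x. slice_pad_zeros_sketch is a hypothetical name; it applies the axis==0 description above to any axis.

```python
import numpy as np


def slice_pad_zeros_sketch(x, begin, end, axis=0):
    """Hypothetical sketch: x[begin:end] along `axis`, zero-padded where out of bounds."""
    if end < begin:
        raise ValueError("end must be >= begin")
    out_shape = list(x.shape)
    out_shape[axis] = end - begin
    out = np.zeros(out_shape, dtype=x.dtype)
    src_begin, src_end = max(begin, 0), min(end, x.shape[axis])
    if src_end > src_begin:  # copy only the overlapping frames
        src = [slice(None)] * x.ndim
        dst = [slice(None)] * x.ndim
        src[axis] = slice(src_begin, src_end)
        dst[axis] = slice(src_begin - begin, src_end - begin)
        out[tuple(dst)] = x[tuple(src)]
    return out
```

For x = [1, 2, 3], begin=-2 and end=2 yield [0, 0, 1, 2].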

Util.sorted_values_from_dict(d)
Util.start_daemon_thread(target, args=())
Util.str_is_number(s)
Parameters:s (str) – e.g. “1”, “.3” or “x”
Returns:whether s can be cast to float or int
Return type:bool
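
A minimal sketch, assuming that "castable to float or int" can be tested with float() alone, since base-10 integer strings are also accepted by float(). str_is_number_sketch is a hypothetical name.

```python
def str_is_number_sketch(s):
    """Hypothetical sketch: s counts as a number if float() accepts it."""
    try:
        float(s)
    except ValueError:
        return False
    return True
```

Note that float() also accepts strings such as "nan", "inf" and "1e3".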
Util.sysexecOut(*args, **kwargs)
Util.sysexecRetCode(*args, **kwargs)
Util.terminal_size()
Util.to_bool(v)
Util.try_and_ignore_exception(f)
Util.try_get_caller_name(depth=1, fallback=None)
Parameters:
  • depth (int) –
  • fallback (str|None) – this is returned if we fail for some reason
Return type:str|None
Returns:caller function name. This is just for debugging.
Util.try_run(func, args=(), catch_exc=<type 'exceptions.Exception'>, default=None)
Util.unicode_to_str_recursive(s)
Util.uniq(seq)

Like the Unix tool uniq: removes consecutive repeated entries.

Parameters:seq – numpy.array
Returns:seq, with consecutive repeats removed
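
A NumPy sketch of uniq for 1-D input: keep each element that differs from its predecessor. uniq_sketch is a hypothetical name.

```python
import numpy as np


def uniq_sketch(seq):
    """Hypothetical sketch: keep the first element of every run of equal consecutive entries."""
    seq = np.asarray(seq)
    if seq.size == 0:
        return seq
    keep = np.ones(len(seq), dtype=bool)
    keep[1:] = seq[1:] != seq[:-1]  # True where a value differs from its predecessor
    return seq[keep]
```

As with the Unix tool, non-adjacent duplicates survive: [1, 1, 2, 2, 2, 1, 3, 3] becomes [1, 2, 1, 3].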

Util.which(program)
Util.wrap_async_func(f)