returnn.util.debug¶
Some generic debugging utilities.
- returnn.util.debug.auto_exclude_all_new_threads(func)[source]¶
- Parameters:
func (T)
- Returns:
func wrapped
- Return type:
T
- returnn.util.debug.dump_all_thread_tracebacks(*, exclude_thread_ids: Collection[int] | None = None, exclude_self: bool = False, file: TextIO | None = None)[source]¶
- Parameters:
exclude_thread_ids
exclude_self
file
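The following is a rough, stdlib-only sketch of what dumping all thread tracebacks can look like. It is not RETURNN's implementation: the name `dump_thread_tracebacks_sketch`, the output format, and the fallback to `sys.stdout` when `file` is None are assumptions made for illustration.

```python
import io
import sys
import threading
import traceback
from typing import Collection, Optional, TextIO


def dump_thread_tracebacks_sketch(
    *,
    exclude_thread_ids: Optional[Collection[int]] = None,
    exclude_self: bool = False,
    file: Optional[TextIO] = None,
) -> None:
    """Hypothetical stand-in, not the RETURNN function."""
    if file is None:
        file = sys.stdout  # assumption: the real default is not documented here
    excluded = set(exclude_thread_ids or ())
    if exclude_self:
        excluded.add(threading.get_ident())
    names = {t.ident: t.name for t in threading.enumerate()}
    # sys._current_frames() maps each thread id to that thread's topmost frame.
    for thread_id, frame in sys._current_frames().items():
        if thread_id in excluded:
            continue
        print(f"Thread {names.get(thread_id, '<unknown>')} (id {thread_id}):", file=file)
        traceback.print_stack(frame, file=file)
        print(file=file)


# Demo: capture the dump in memory, once with and once without the calling thread.
buf_all = io.StringIO()
dump_thread_tracebacks_sketch(file=buf_all)
buf_others = io.StringIO()
dump_thread_tracebacks_sketch(exclude_self=True, file=buf_others)
```

Passing an explicit `file` keeps the dump out of the regular log stream, which helps when several threads write to stdout at the same time.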
- returnn.util.debug.setup_warn_with_traceback()[source]¶
Installs a hook for warnings.showwarning.
- returnn.util.debug.init_better_exchook()[source]¶
Installs our own sys.excepthook, which uses better_exchook, but adds some special handling for the main thread.
- returnn.util.debug.format_signum(signum)[source]¶
- Parameters:
signum (int)
- Returns:
string “signum (signame)”
- Return type:
str
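A minimal sketch of how a "signum (signame)" string can be built with the standard library. The name `format_signum_sketch` and the "unknown" fallback for unrecognized numbers are assumptions, not taken from RETURNN.

```python
import signal


def format_signum_sketch(signum: int) -> str:
    """Hypothetical stand-in producing "signum (signame)"."""
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Assumption: how the real function treats unknown numbers is not documented here.
        name = "unknown"
    return f"{int(signum)} ({name})"
```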
- returnn.util.debug.signal_handler(signum, frame)[source]¶
Prints a message on stdout and dumps all thread stacks.
- Parameters:
signum (int) – e.g. signal.SIGUSR1
frame – ignored, will dump all threads
- returnn.util.debug.install_signal_handler_if_default(signum, exceptions_are_fatal=False)[source]¶
- Parameters:
signum (int) – e.g. signal.SIGUSR1
exceptions_are_fatal (bool) – if True, will reraise any exceptions. if False, will just print a message
- Returns:
True iff no exception occurred, False otherwise. True does not necessarily mean that we registered our own handler.
- Return type:
bool
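The return-value semantics described above can be sketched as follows. This is an illustration only: `install_if_default_sketch` and its explicit `handler` argument are hypothetical (the documented function takes no handler argument), and the printed message is made up.

```python
import signal
import sys


def install_if_default_sketch(signum, handler, exceptions_are_fatal=False):
    """Hypothetical stand-in: install `handler` only if `signum` still has the default action."""
    try:
        if signal.getsignal(signum) == signal.SIG_DFL:
            # Note: signal.signal() only works in the main thread.
            signal.signal(signum, handler)
        # Reached also when some other handler was already set,
        # so True does not imply that our handler was registered.
        return True
    except Exception as exc:
        if exceptions_are_fatal:
            raise
        print(f"Cannot install handler for signal {signum}: {exc}", file=sys.stderr)
        return False
```

For example, CPython normally installs its own SIGINT handler at startup, so calling the sketch with `signal.SIGINT` returns True without replacing anything.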
- returnn.util.debug.install_native_signal_handler(*, reraise_exceptions: bool = False)[source]¶
Installs our own custom C signal handler.
- returnn.util.debug.init_faulthandler(sigusr1_chain=False)[source]¶
Maybe installs signal handlers for SIGUSR1, SIGUSR2 and others. If no signal handlers are installed yet for SIGUSR1/2, we try to install our own Python handler. This also tries to install the handler from the faulthandler module, esp. for SIGSEGV and others.
- Parameters:
sigusr1_chain (bool) – whether the default SIGUSR1 handler should also be called.
- returnn.util.debug.install_subproc_faulthandler()[source]¶
Install faulthandler in a spawned subprocess, dumping to a per-pid file.
Spawned children start with a fresh interpreter and do not inherit the main proc’s faulthandler setup (forked children would), so each spawn target must call this itself to be diagnosable. Both handlers point at <cwd>/faulthandler_dump.<pid>.log – a per-pid file, not stderr, so dumps from many procs don’t interleave into unreadable output:
faulthandler.enable(): dump on a fatal fault (SIGSEGV / SIGABRT / SIGFPE / SIGBUS).
faulthandler.register(SIGUSR1): on-demand all-thread dump, e.g. to inspect a hang. chain=False so we don’t fall through to SIG_DFL (= terminate) when no prior handler is set up, which would kill the very process we want to inspect.
No dump_traceback_later: an earlier iteration’s timer thread broke SyncManager bootstrap.
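The setup described above can be sketched with the standard `faulthandler` module roughly as follows. This is not the RETURNN source: `install_subproc_faulthandler_sketch` and its `directory` parameter are hypothetical and exist only to make the sketch easy to try outside the current working directory.

```python
import faulthandler
import os
import signal


def install_subproc_faulthandler_sketch(directory=None):
    """Hypothetical stand-in: route faulthandler dumps to a per-pid file."""
    if directory is None:
        directory = os.getcwd()
    path = os.path.join(directory, f"faulthandler_dump.{os.getpid()}.log")
    # The file must stay open for the lifetime of the process,
    # because faulthandler keeps using its file descriptor.
    dump_file = open(path, "w")
    # Dump on fatal faults (SIGSEGV, SIGABRT, SIGFPE, SIGBUS, ...).
    faulthandler.enable(file=dump_file, all_threads=True)
    # On-demand dump of all threads. chain=False: do not call the previous handler.
    # faulthandler.register and SIGUSR1 are not available on Windows.
    if hasattr(signal, "SIGUSR1") and hasattr(faulthandler, "register"):
        faulthandler.register(signal.SIGUSR1, file=dump_file, all_threads=True, chain=False)
    return dump_file
```

A spawn target would call this once at the start of its entry function. Sending SIGUSR1 to that pid (e.g. `kill -USR1 <pid>`) then appends an all-thread dump to its log file.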
- returnn.util.debug.init_cuda_not_in_main_proc_check()[source]¶
Installs some hook to Theano which checks that CUDA is only used in the main proc.
- returnn.util.debug.debug_shell(user_ns: Dict[str, Any] | None = None, user_global_ns: Dict[str, Any] | None = None, exit_afterwards: bool = True)[source]¶
Provides some interactive Python shell. Uses IPython if possible. Wraps to better_exchook.debug_shell.
- Parameters:
user_ns
user_global_ns
exit_afterwards – will do sys.exit(1) at the end
- class returnn.util.debug.PyTracer(funcs_to_trace_list: Sequence[LambdaType | Callable], capture_type: type | Tuple[type, ...])[source]¶
Trace Python function execution to get intermediate outputs from the local variables.
E.g. for PyTorch code, when comparing results, it can be useful to see the intermediate tensors.
Example:
    with PyTracer([my_func], torch.Tensor) as trace_my_impl:
        ...

    with PyTracer([reference_func], torch.Tensor) as trace_ref_impl:
        ...
Or another example:
    from returnn.tensor import Tensor

    with PyTracer([my_func], Tensor) as trace_my_impl:
        ...

    with PyTracer([reference_func], torch.Tensor) as trace_ref_impl:
        ...

    check_py_traces_rf_to_pt_equal(trace_my_impl.captured_locals, trace_ref_impl.captured_locals, [...])
See also check_py_traces_rf_to_pt_equal() to compare the traces.
This class uses the Python sys.settrace() mechanism to trace the locals. It accesses frame.f_locals to get the local variables. Note that this behavior is slightly buggy in versions of CPython <3.13, see for example:
https://github.com/python/cpython/issues/113939
https://github.com/python/cpython/issues/74929
And thus the behavior might be different depending on the Python version. In Python >=3.13, you likely get a few more locals than before.
- Parameters:
funcs_to_trace_list – list of functions to trace the locals. only those functions will be traced.
capture_type – only capture variables of this type, e.g. torch.Tensor.
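To make the sys.settrace() mechanism more concrete, here is a heavily simplified sketch of a locals tracer. It is not the PyTracer implementation: `LocalsTracerSketch` and `my_func` are hypothetical, the sketch captures plain floats instead of tensors, and it deduplicates captured values by object identity, which the real class may handle differently.

```python
import sys
from typing import Any, Callable, Dict, List, Sequence, Tuple, Union


class LocalsTracerSketch:
    """Hypothetical, simplified stand-in for the idea behind PyTracer."""

    def __init__(self, funcs: Sequence[Callable], capture_type: Union[type, Tuple[type, ...]]):
        self._funcs_by_code = {f.__code__: f for f in funcs}
        self.capture_type = capture_type
        # func -> list of calls -> {variable name -> list of captured values}
        self.captured_locals: Dict[Callable, List[Dict[str, List[Any]]]] = {}
        self._prev_trace = None

    def __enter__(self):
        self._prev_trace = sys.gettrace()
        sys.settrace(self._global_trace)
        return self

    def __exit__(self, exc_type, exc, tb):
        sys.settrace(self._prev_trace)

    def _global_trace(self, frame, event, arg):
        # The global trace function is invoked for every new frame ("call" event).
        func = self._funcs_by_code.get(frame.f_code)
        if func is None or event != "call":
            return None  # do not trace inside other functions
        call_record: Dict[str, List[Any]] = {}
        self.captured_locals.setdefault(func, []).append(call_record)

        def _local_trace(frame, event, arg):
            # Invoked for "line", "return" and "exception" events of this frame.
            for name, value in frame.f_locals.items():
                if isinstance(value, self.capture_type):
                    values = call_record.setdefault(name, [])
                    if not values or values[-1] is not value:
                        values.append(value)
            return _local_trace

        return _local_trace


def my_func(x):
    y = x * 2.0
    y = y + 1.0
    return y


with LocalsTracerSketch([my_func], float) as tracer:
    my_func(3.0)
```

After the `with` block, `tracer.captured_locals[my_func][0]` holds one entry per captured variable name of the first call, each with the sequence of values that variable was observed to hold.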
- returnn.util.debug.check_py_traces_rf_to_pt_equal(trace_rf: Dict[Callable, List[Dict[str, List[Tensor]]]], trace_pt: Dict[Callable, List[Dict[str, List[torch.Tensor]]]], checks: List[Tuple[Tuple[Callable, int, str, int], Tuple[Callable, int, str, int], Tuple[Dim | str, ...] | Callable[[torch.Tensor], Tensor]]])[source]¶
Compares traces from some RETURNN-frontend (RF) based implementation with some pure PyTorch (PT) based implementation.
- Parameters:
trace_rf – RETURNN-frontend trace, from PyTracer
trace_pt – pure PyTorch trace, from PyTracer
checks – list of checks to perform. Each check is a tuple of:
  - RF trace entry, e.g. (func, i, name, j)
  - PT trace entry, e.g. (func, i, name, j)
  - PT dims, e.g. (batch_dim, other_dim, …).
Instead of Dim, you can also use a string, which will be resolved from the RF trace (then you also need Dim in capture_type of the PyTracer). If callable, it gets the PyTorch tensor and should return the RETURNN tensor. Sometimes you might want to perform some reshaping, slicing, or similar, and then use rf.convert_to_tensor.
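Given the trace type Dict[Callable, List[Dict[str, List[...]]]], a trace entry (func, i, name, j) can be read as nested indexing into the trace. The sketch below assumes that i selects the i-th call of func and j the j-th captured value of the variable name; `resolve_trace_entry` and `my_func` are hypothetical helpers using plain floats, not part of RETURNN.

```python
from typing import Any, Callable, Dict, List, Tuple

Trace = Dict[Callable, List[Dict[str, List[Any]]]]
TraceEntry = Tuple[Callable, int, str, int]


def resolve_trace_entry(trace: Trace, entry: TraceEntry) -> Any:
    """Hypothetical helper: look up one captured value by (func, i, name, j)."""
    func, call_idx, name, value_idx = entry
    return trace[func][call_idx][name][value_idx]


def my_func(x):
    return x


# A hand-written trace: one call of my_func, where "y" held two values over time.
trace = {my_func: [{"y": [6.0, 7.0]}]}
first_y = resolve_trace_entry(trace, (my_func, 0, "y", 0))
last_y = resolve_trace_entry(trace, (my_func, 0, "y", -1))
```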