returnn.torch.util.diagnose_gpu

Diagnostic functions for GPU information, failures, memory usage, etc.

returnn.torch.util.diagnose_gpu.print_available_devices(*, file: TextIO | None = None)[source]

Print the available devices, e.g. GPUs (CUDA or other backends), etc.

Parameters:

file – where to print to. stdout by default
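A minimal usage sketch (assuming RETURNN is importable; the string buffer below is only to demonstrate the file parameter):

    import io
    from returnn.torch.util.diagnose_gpu import print_available_devices

    # Default: print the device report to stdout.
    print_available_devices()

    # Or capture the report in a string buffer via the file parameter.
    buf = io.StringIO()
    print_available_devices(file=buf)
    report = buf.getvalue()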

returnn.torch.util.diagnose_gpu.print_using_cuda_device_report(dev: str | device, *, file: TextIO | None = None)[source]

Theano and TensorFlow print something like: "Using gpu device 2: GeForce GTX 980 (…)". Print in a similar format so that scripts which grep our stdout keep working just as before.
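For example (a sketch; the chosen device and the surrounding availability check are illustrative):

    import torch
    from returnn.torch.util.diagnose_gpu import print_using_cuda_device_report

    if torch.cuda.is_available():
        dev = torch.device("cuda", torch.cuda.current_device())
        # Prints a line in the familiar format, e.g.
        # "Using gpu device 0: <GPU name> (...)".
        print_using_cuda_device_report(dev)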

returnn.torch.util.diagnose_gpu.diagnose_no_gpu() → List[str][source]

Diagnose why we have no GPU. Print to stdout, but also return summary strings.

Returns:

summary strings
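A sketch of typical error-handling use; the surrounding logic here is illustrative, not from RETURNN:

    import torch
    from returnn.torch.util.diagnose_gpu import diagnose_no_gpu

    if not torch.cuda.is_available():
        # Prints diagnostic details to stdout and returns short
        # summary strings, e.g. to attach to an exception message.
        summary = diagnose_no_gpu()
        raise RuntimeError("no GPU available:\n" + "\n".join(summary))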

returnn.torch.util.diagnose_gpu.print_relevant_env_vars(*, file: TextIO | None = None)[source]

Print relevant environment variables which might affect GPU usage, or PyTorch usage in general, for example PYTORCH_CUDA_ALLOC_CONF, CUDA_LAUNCH_BLOCKING, etc. See: https://pytorch.org/docs/stable/torch_environment_variables.html

Parameters:

file – where to print to. stdout by default
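For example, to check that an allocator setting is picked up (a sketch; the env var value is just an example, and it must be set before CUDA is initialized):

    import os
    from returnn.torch.util.diagnose_gpu import print_relevant_env_vars

    # Example config for the CUDA caching allocator.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    # The report should now include PYTORCH_CUDA_ALLOC_CONF.
    print_relevant_env_vars()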

returnn.torch.util.diagnose_gpu.garbage_collect()[source]

Perform garbage collection, including any special logic for GPU.

Also see:
https://github.com/pytorch/pytorch/issues/18853
https://github.com/pytorch/pytorch/issues/27600
https://pytorch.org/docs/stable/notes/faq.html#my-out-of-memory-exception-handler-can-t-allocate-memory
https://github.com/Lightning-AI/pytorch-lightning/blob/7a4b0fc4331633cdf00b88776689e8a84ef96cb4/src/lightning/pytorch/utilities/memory.py#L83
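A minimal sketch of what such a helper typically does, following the linked issues and the Lightning code above (the actual RETURNN implementation may differ):

    import gc
    import torch

    def garbage_collect_sketch():
        # Collect Python-level garbage first, so that tensor
        # reference counts drop and their memory can be freed.
        gc.collect()
        if torch.cuda.is_available():
            # Release cached allocator blocks back to the driver,
            # e.g. so that a retry after an OOM can succeed.
            torch.cuda.empty_cache()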