returnn.torch.util.diagnose_gpu
Diagnostic functions for GPU information, failures, memory usage, etc.
- returnn.torch.util.diagnose_gpu.print_available_devices(*, file: TextIO | None = None)
Print the available devices, e.g. GPUs (CUDA or other), etc.
- Parameters:
file – where to print to; stdout by default
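A minimal usage sketch, based only on the signature above:

```python
import sys

from returnn.torch.util.diagnose_gpu import print_available_devices

# Print the device overview to stdout (the default) ...
print_available_devices()

# ... or redirect it, e.g. to stderr or an open log file.
print_available_devices(file=sys.stderr)
```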
- returnn.torch.util.diagnose_gpu.print_using_cuda_device_report(dev: str | device, *, file: TextIO | None = None)
Theano and TensorFlow print something like: Using gpu device 2: GeForce GTX 980 (…). Print in a similar format, so that scripts which grep our stdout keep working as before.
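A minimal usage sketch; the device selection around the call is illustrative, not part of this module:

```python
import torch

from returnn.torch.util.diagnose_gpu import print_using_cuda_device_report

# Pick whatever device the rest of the setup will use, then report it
# in the Theano/TF-style format, so log-grepping scripts keep working.
dev = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
print_using_cuda_device_report(dev)  # also accepts a plain string like "cuda:0"
```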
- returnn.torch.util.diagnose_gpu.diagnose_no_gpu() → List[str]
Diagnose why no GPU is available. Prints to stdout, but also prepares summary strings.
- Returns:
summary strings
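A usage sketch, assuming you want to fail with a short summary while the detailed report goes to stdout; the RuntimeError wrapping is illustrative:

```python
import torch

from returnn.torch.util.diagnose_gpu import diagnose_no_gpu

if not torch.cuda.is_available():
    # Prints the detailed diagnosis to stdout and returns short summary strings,
    # which can e.g. be attached to an error message or a crash report.
    summary = diagnose_no_gpu()
    raise RuntimeError("No GPU available: " + "; ".join(summary))
```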
- returnn.torch.util.diagnose_gpu.print_relevant_env_vars(*, file: TextIO | None = None)
Print relevant environment variables which might affect GPU usage, or PyTorch usage in general, for example PYTORCH_CUDA_ALLOC_CONF, CUDA_LAUNCH_BLOCKING, etc. See https://pytorch.org/docs/stable/torch_environment_variables.html
- Parameters:
file – where to print to; stdout by default
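A usage sketch; the PYTORCH_CUDA_ALLOC_CONF value is just an example setting, not a recommendation:

```python
import os

from returnn.torch.util.diagnose_gpu import print_relevant_env_vars

# When experimenting with allocator settings, log the relevant environment
# variables next to the run, so the configuration is visible in the output.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
print_relevant_env_vars()
```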
- returnn.torch.util.diagnose_gpu.garbage_collect()
Perform garbage collection, including any special logic for GPU.
Also see:
- https://github.com/pytorch/pytorch/issues/18853
- https://github.com/pytorch/pytorch/issues/27600
- https://pytorch.org/docs/stable/notes/faq.html#my-out-of-memory-exception-handler-can-t-allocate-memory
- https://github.com/Lightning-AI/pytorch-lightning/blob/7a4b0fc4331633cdf00b88776689e8a84ef96cb4/src/lightning/pytorch/utilities/memory.py#L83
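A typical use is in an out-of-memory handler, following the pattern from the PyTorch FAQ linked above. A hypothetical sketch (run_step_with_oom_retry is not part of RETURNN; torch.cuda.OutOfMemoryError requires a recent PyTorch version):

```python
import torch

from returnn.torch.util.diagnose_gpu import garbage_collect


def run_step_with_oom_retry(step_fn, *args):
    """Run step_fn once; on CUDA OOM, free GPU memory and retry once."""
    try:
        return step_fn(*args)
    except torch.cuda.OutOfMemoryError:
        # Drop unreferenced tensors and release cached GPU memory,
        # then retry the step once.
        garbage_collect()
        return step_fn(*args)
```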