returnn.torch.data.returnn_dataset_wrapper¶
Wrapper for RETURNN datasets.
We make use of torch.utils.data.IterDataPipe.
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetResetDefaultEpochCounterCallback(dataset: Dataset)[source]¶
Default for reset_callback. Has an internal counter for the epoch, starting at epoch 1 (RETURNN convention).
Can be used as reset_callback.
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetResetNoOpCallback[source]¶
Does nothing. Can be used as reset_callback when no reset logic is needed.
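A reset_callback is just a zero-argument callable (see the Callable[[], None] type in ReturnnDatasetIterDataPipe below), so a custom one can be used instead of the two classes above. A minimal sketch; the logging is purely illustrative, and the epoch counting merely mirrors the default callback's description:

    class LoggingResetCallback:
        """Illustrative custom reset_callback: counts epochs from 1 (RETURNN convention)."""

        def __init__(self):
            self.epoch = 0

        def __call__(self) -> None:
            # Called once per new DataLoader iterator, i.e. at the start of each epoch.
            self.epoch += 1
            print(f"dataset reset, starting epoch {self.epoch}")

An instance would then be passed as reset_callback=LoggingResetCallback() to ReturnnDatasetIterDataPipe below.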
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetIterDataPipe(returnn_dataset: Dataset, *, reset_callback: Callable[[], None] | None = None)[source]¶
Converts a RETURNN dataset into a PyTorch IterableDataset.
- Parameters:
returnn_dataset – dataset to be wrapped
reset_callback – callback function to be called when the dataset is reset, e.g. to init the epoch. ReturnnDatasetResetDefaultEpochCounterCallback(returnn_dataset) is the default.
- reset()[source]¶
This is called by the PyTorch DataLoader mechanism once we create a new iterator over the DataLoader. This happens at the beginning of each epoch.
(Note: The mechanism by which reset() is actually called is very obfuscated in PyTorch. As I understand it, there is an IterDataPipe metaclass (_IterDataPipeMeta) which automatically registers a hook on __iter__ via hook_iterator. Deep inside the complex logic of this hook, it calls _set_datapipe_valid_iterator_id, which then calls reset().)
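A hedged usage sketch, assuming a RETURNN dataset created via returnn.datasets.init_dataset (the HDF file name is a placeholder) and assuming each yielded item is a dict with one array per data key:

    from torch.utils.data import DataLoader

    from returnn.datasets import init_dataset
    from returnn.torch.data.returnn_dataset_wrapper import ReturnnDatasetIterDataPipe

    returnn_dataset = init_dataset({"class": "HDFDataset", "files": ["my_data.hdf"]})
    datapipe = ReturnnDatasetIterDataPipe(returnn_dataset)  # default reset_callback

    # Creating a new iterator over the DataLoader triggers reset(),
    # which calls the reset_callback (e.g. to init the next epoch).
    loader = DataLoader(datapipe, batch_size=None)  # no automatic batching
    for item in loader:
        ...  # one sequence per item (assumed: a dict keyed by data key)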
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetPerEpochMapDataPipe[source]¶
Converts a RETURNN dataset into a PyTorch map-style Dataset, covering one epoch at a time (using the RETURNN dataset's per-epoch sequence ordering).
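A minimal sketch of map-style access. The constructor arguments are not shown above, so taking the RETURNN dataset as the single argument is an assumption, as is the structure of the returned items:

    map_pipe = ReturnnDatasetPerEpochMapDataPipe(returnn_dataset)  # assumed signature

    num_seqs = len(map_pipe)  # number of sequences in the current epoch (assumption)
    item = map_pipe[0]        # assumed: dict of numpy arrays, one entry per data key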
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetFullMapDataPipe[source]¶
Converts a RETURNN dataset into a PyTorch map-style Dataset. This covers the full dataset, using the default ordering. RETURNN-dataset-side sorting/shuffling is not supported here; sorting/shuffling is intended to be done further down in the PyTorch data pipeline.
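Since sorting/shuffling is left to the PyTorch side here, the simplest variant is to let the DataLoader shuffle the map-style dataset. Again, the constructor signature and item structure are assumptions:

    from torch.utils.data import DataLoader

    full_pipe = ReturnnDatasetFullMapDataPipe(returnn_dataset)  # assumed signature

    # Shuffling happens in the PyTorch data pipeline, not in the RETURNN dataset.
    loader = DataLoader(full_pipe, batch_size=None, shuffle=True)
    for item in loader:
        ...  # sequences of the full dataset, in shuffled order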