returnn.torch.data.returnn_dataset_wrapper
Wrapper for RETURNN datasets.
We make use of torch.utils.data.IterDataPipe.
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetResetDefaultEpochCounterCallback(dataset: Dataset, *, epoch0: int = 0)
Default for reset_callback. Has an internal counter for the epoch, starting by default at epoch 1 (RETURNN convention).
- Parameters:
dataset – RETURNN dataset.
epoch0 – Epoch from which the dataset sequence ordering should start. The first epoch will actually be epoch0 + 1, since __call__() increments it. Defaults to 0, so the first __call__() increments it and we start at epoch 1.
Can be used as reset_callback.
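A minimal sketch of the counting behaviour described above, using a hypothetical stand-in dataset rather than a real RETURNN Dataset. The names FakeDataset and EpochCounterCallback are illustrative only, and the assumption that the callback forwards the epoch to an init_seq_order(epoch=...) method is not stated in this page; the actual class may do more.

```python
class FakeDataset:
    """Hypothetical stand-in for a RETURNN dataset; it only records the epochs it receives."""

    def __init__(self):
        self.epochs_seen = []

    def init_seq_order(self, epoch=None):
        # Assumption: the dataset derives its sequence ordering from the epoch number.
        self.epochs_seen.append(epoch)


class EpochCounterCallback:
    """Illustrative re-implementation of the counting logic, not the library class."""

    def __init__(self, dataset, *, epoch0=0):
        self.dataset = dataset
        self.epoch = epoch0

    def __call__(self):
        # Increment first, so epoch0=0 yields epoch 1 on the first call.
        self.epoch += 1
        self.dataset.init_seq_order(epoch=self.epoch)


# Default start: three resets give epochs 1, 2, 3.
dataset = FakeDataset()
callback = EpochCounterCallback(dataset)
for _ in range(3):
    callback()

# Resuming: epoch0=5 means the first reset uses epoch 6.
resumed = FakeDataset()
resume_callback = EpochCounterCallback(resumed, epoch0=5)
resume_callback()
```

Setting epoch0 to the last finished epoch is one plausible way to continue an interrupted run with the same ordering it would have had otherwise.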
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetResetNoOpCallback
Can be used as reset_callback.
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetIterDataPipe(returnn_dataset: Dataset, *, reset_callback: Callable[[], None] | None = None)
Converts a RETURNN dataset into a PyTorch IterableDataset.
- Parameters:
returnn_dataset – dataset to be wrapped
reset_callback – callback function to be called when the dataset is reset, e.g. to init the epoch. ReturnnDatasetResetDefaultEpochCounterCallback(returnn_dataset) is the default.
- reset()
This is called by the PyTorch DataLoader mechanism whenever we create a new iterator over the DataLoader, which happens at the beginning of each epoch.
(Note: The mechanism by which reset() is actually called is very obfuscated in PyTorch. As I understand it, there is an IterDataPipe metaclass (_IterDataPipeMeta) which automatically registers a hook on __iter__ via hook_iterator. Deep inside the complex logic of this hook, it calls _set_datapipe_valid_iterator_id, which then calls reset().)
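The effect of that hook can be imitated in plain Python. The sketch below is a hypothetical analogue, not PyTorch or RETURNN code: the class name ResettingIterable is made up, and reset() is called explicitly from __iter__ where PyTorch would inject the call through its metaclass hook. It defaults to a no-op callback for brevity, unlike the real wrapper.

```python
class ResettingIterable:
    """Hypothetical analogue: runs a reset callback each time a new iterator is created."""

    def __init__(self, items, *, reset_callback=None):
        self._items = list(items)
        # No-op default for this sketch only.
        self._reset_callback = reset_callback or (lambda: None)

    def reset(self):
        self._reset_callback()

    def __iter__(self):
        # In PyTorch this call is injected by a hook on __iter__; here it is explicit.
        self.reset()
        return iter(self._items)


reset_calls = []
pipe = ResettingIterable(
    [10, 20, 30],
    reset_callback=lambda: reset_calls.append(len(reset_calls) + 1),
)

# Each full pass creates a new iterator and therefore triggers one reset.
first_epoch = list(pipe)
second_epoch = list(pipe)
```

Two passes over the iterable give two reset calls, which mirrors one reset per epoch.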
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetPerEpochMapDataPipe
Converts a RETURNN dataset into a PyTorch map-style Dataset.
- class returnn.torch.data.returnn_dataset_wrapper.ReturnnDatasetFullMapDataPipe
Converts a RETURNN dataset into a PyTorch map-style Dataset. This is over the full dataset, using the default ordering. RETURNN-dataset-side sorting/shuffling is not supported here. Sorting/shuffling is intended to be done in the further PyTorch data pipeline.
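To illustrate the division of labour described here, the following sketch shows a map-style view with a fixed default order, with shuffling applied downstream by permuting indices. It is a pure-Python analogue under stated assumptions: FullMapSketch is a hypothetical name, and the index permutation merely stands in for whatever sampler or shuffling stage the PyTorch pipeline would provide.

```python
import random


class FullMapSketch:
    """Hypothetical map-style view: fixed default order, random access by index."""

    def __init__(self, seqs):
        self._seqs = list(seqs)

    def __len__(self):
        return len(self._seqs)

    def __getitem__(self, index):
        return self._seqs[index]


data = FullMapSketch(["seq-a", "seq-b", "seq-c", "seq-d"])

# The dataset itself never reorders; a downstream stage picks the index order.
rng = random.Random(0)
indices = list(range(len(data)))
rng.shuffle(indices)
shuffled = [data[i] for i in indices]
```

Because the view only exposes a length and index lookup, any ordering policy can be swapped in later without touching the dataset side.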