returnn.datasets.map
Provides MapDatasetBase, MapDatasetWrapper and FromListDataset.
- class returnn.datasets.map.MapDatasetBase(data_types=None)
This dataset can be used as a template to implement user-side datasets whose data can be accessed in arbitrary order. For global sorting, the sequence lengths need to be known beforehand; see get_seq_len.
- Parameters:
data_types (dict[str,dict]) – data_key -> constructor parameters of Data object, for all data streams the dataset provides (inputs and targets). E.g. {‘data’: {‘dim’: 1000, ‘sparse’: True, …}, ‘classes’: …}.
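A minimal sketch of a user-side dataset. It assumes, as in the RETURNN source, that subclasses implement `__len__` (number of sequences) and `__getitem__` (a dict data_key -> numpy array for one sequence); those methods are not listed in this reference. The class `RandomSeqDataset`, its random data and its length rule are hypothetical, and the fallback base class exists only so the sketch runs without RETURNN installed.

```python
import numpy

try:
    from returnn.datasets.map import MapDatasetBase
except ImportError:
    class MapDatasetBase:
        """Hypothetical stand-in mirroring the documented constructor (RETURNN not installed)."""

        def __init__(self, data_types=None):
            self.data_types = data_types or {}


class RandomSeqDataset(MapDatasetBase):
    """Hypothetical dataset: random float features plus sparse per-frame class labels."""

    def __init__(self, num_seqs=10, feat_dim=40, num_classes=1000):
        super().__init__(
            data_types={
                "data": {"dim": feat_dim},  # dense float features
                "classes": {"dim": num_classes, "sparse": True},  # class indices
            }
        )
        self._num_seqs = num_seqs
        self._feat_dim = feat_dim
        self._num_classes = num_classes

    def __len__(self):
        # Total number of sequences in the corpus.
        return self._num_seqs

    def __getitem__(self, seq_idx):
        # Seed by index so a sequence is identical regardless of access order.
        rng = numpy.random.RandomState(seq_idx)
        length = 5 + seq_idx  # hypothetical length rule
        return {
            "data": rng.randn(length, self._feat_dim).astype("float32"),
            "classes": rng.randint(0, self._num_classes, size=(length,)).astype("int32"),
        }
```

Seeding by index keeps the data independent of the order in which sequences are requested, which is the property a map-style dataset is meant to offer.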
- get_seq_len(seq_idx)
This optional method provides the sequence length used by the seq_ordering parameter. If it is not implemented, only a limited set of seq_ordering options is available.
- Parameters:
seq_idx (int) –
- Returns:
sequence length
- Return type:
int
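When lengths are known from metadata (for example a manifest file), `get_seq_len` can answer without loading any data. In this sketch, `KnownLengthDataset` and `length_sorted_order` are hypothetical; the helper only illustrates why global sorting needs lengths up front and is not RETURNN's own seq_ordering implementation. A stand-in base class is used if RETURNN is not installed, and `__len__`/`__getitem__` are assumed to be the required data methods.

```python
import numpy

try:
    from returnn.datasets.map import MapDatasetBase
except ImportError:
    class MapDatasetBase:
        """Hypothetical stand-in mirroring the documented constructor (RETURNN not installed)."""

        def __init__(self, data_types=None):
            self.data_types = data_types or {}


class KnownLengthDataset(MapDatasetBase):
    """Hypothetical dataset whose sequence lengths are known before any data is loaded."""

    def __init__(self, lengths):
        super().__init__(data_types={"data": {"dim": 1}})
        self._lengths = list(lengths)

    def __len__(self):
        return len(self._lengths)

    def __getitem__(self, seq_idx):
        # Placeholder data with the advertised number of frames.
        return {"data": numpy.zeros((self._lengths[seq_idx], 1), dtype="float32")}

    def get_seq_len(self, seq_idx):
        # Answered from metadata only; __getitem__ is never called here.
        return self._lengths[seq_idx]


def length_sorted_order(dataset):
    """Hypothetical helper: corpus indices sorted by get_seq_len, shortest first."""
    return sorted(range(len(dataset)), key=dataset.get_seq_len)
```

Because Python's `sorted` is stable, sequences of equal length keep their corpus order.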
- get_seq_tag(seq_idx)
- Parameters:
seq_idx (int) –
- Returns:
tag for the sequence of the given index, default is ‘seq-{seq_idx}’.
- Return type:
str
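Tags matter because MapDatasetWrapper.init_seq_order accepts `seq_list`, a list of sequence tags. A sketch that overrides `get_seq_tag` to derive stable, human-readable tags from file names; `FileListDataset` and the paths are hypothetical, no files are actually read, and a stand-in base class is used if RETURNN is not installed. Tags should stay unique across the corpus.

```python
import os

import numpy

try:
    from returnn.datasets.map import MapDatasetBase
except ImportError:
    class MapDatasetBase:
        """Hypothetical stand-in mirroring the documented constructor (RETURNN not installed)."""

        def __init__(self, data_types=None):
            self.data_types = data_types or {}


class FileListDataset(MapDatasetBase):
    """Hypothetical dataset with one sequence per file path."""

    def __init__(self, paths):
        super().__init__(data_types={"data": {"dim": 1}})
        self._paths = list(paths)

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, seq_idx):
        # Placeholder content; a real dataset would read self._paths[seq_idx].
        return {"data": numpy.zeros((1, 1), dtype="float32")}

    def get_seq_tag(self, seq_idx):
        # File name without directory and extension, e.g. "/corpus/a/utt1.wav" -> "utt1".
        return os.path.splitext(os.path.basename(self._paths[seq_idx]))[0]
```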
- get_seq_order(epoch=None)
Override to implement a dataset-specific sequence order for a given epoch. The returned order may contain fewer sequences than the dataset holds. When the dataset is wrapped by MapDatasetWrapper, this overrides the effects of partition_epoch and seq_ordering.
- Parameters:
epoch (int) –
- Returns:
sequence order (list of sequence indices)
- Return type:
list[int]
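A sketch of a per-epoch order that visits a different random subset in each epoch, relying on the fact that the returned order may be shorter than the dataset. `SubsampledDataset` and `seqs_per_epoch` are hypothetical; seeding the generator with the epoch keeps each epoch's order reproducible. A stand-in base class is used if RETURNN is not installed.

```python
import random

import numpy

try:
    from returnn.datasets.map import MapDatasetBase
except ImportError:
    class MapDatasetBase:
        """Hypothetical stand-in mirroring the documented constructor (RETURNN not installed)."""

        def __init__(self, data_types=None):
            self.data_types = data_types or {}


class SubsampledDataset(MapDatasetBase):
    """Hypothetical dataset that uses only `seqs_per_epoch` sequences per epoch."""

    def __init__(self, num_seqs, seqs_per_epoch):
        super().__init__(data_types={"data": {"dim": 1}})
        self._num_seqs = num_seqs
        self._seqs_per_epoch = seqs_per_epoch

    def __len__(self):
        return self._num_seqs

    def __getitem__(self, seq_idx):
        # Placeholder content: a single frame holding the corpus index.
        return {"data": numpy.full((1, 1), seq_idx, dtype="float32")}

    def get_seq_order(self, epoch=None):
        # Seed with the epoch: a different subset per epoch, identical across runs.
        rng = random.Random(0 if epoch is None else epoch)
        order = list(range(self._num_seqs))
        rng.shuffle(order)
        # Returning fewer indices than len(self) is allowed.
        return order[: self._seqs_per_epoch]
```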
- class returnn.datasets.map.MapDatasetWrapper(map_dataset, **kwargs)
Takes a MapDatasetBase instance and turns it into a returnn.datasets.Dataset by providing the methods the Dataset interface requires.
- Parameters:
map_dataset (MapDatasetBase|function) – the MapDataset to be wrapped
- property map_dataset: MapDatasetBase
- Returns:
the wrapped MapDataset
- init_seq_order(epoch=None, seq_list=None, seq_order=None)
- Parameters:
epoch (int|None) –
seq_list (list[str]|None) – List of sequence tags, to set a predefined order.
seq_order (list[int]|None) – List of corpus sequence indices, to set a predefined order.
- Returns:
whether the order changed (True is always safe to return)
- Return type:
bool
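Since `seq_list` holds sequence tags while `seq_order` holds corpus indices, a tag list can be translated into the equivalent index list by inverting `get_seq_tag`. A sketch of that translation; `TinyTaggedDataset` and `tags_to_seq_order` are hypothetical names, and this is not RETURNN's own code.

```python
class TinyTaggedDataset:
    """Hypothetical duck-typed dataset using the default tag scheme 'seq-{seq_idx}'."""

    def __init__(self, num_seqs):
        self._num_seqs = num_seqs

    def __len__(self):
        return self._num_seqs

    def get_seq_tag(self, seq_idx):
        return "seq-%i" % seq_idx


def tags_to_seq_order(dataset, seq_list):
    """Hypothetical helper: map sequence tags (a seq_list) to corpus indices (a seq_order).

    Assumes every tag is unique; a KeyError signals a tag the dataset does not contain.
    """
    tag_to_idx = {dataset.get_seq_tag(idx): idx for idx in range(len(dataset))}
    return [tag_to_idx[tag] for tag in seq_list]
```

Building the dictionary costs one `get_seq_tag` call per sequence, so for large corpora it is worth computing once rather than per lookup.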
- get_corpus_seq_idx(sorted_seq_idx)
- Parameters:
sorted_seq_idx (int) –
- Returns:
corpus_seq_idx
- Return type:
int
- have_corpus_seq_idx()
- Return type:
bool
- Returns:
whether you can call self.get_corpus_seq_idx()
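Here `sorted_seq_idx` is a position within the current epoch's order, while the corpus index points into the wrapped map dataset. Assuming the wrapper keeps the order chosen in init_seq_order as a plain list, the mapping is a list lookup, as in this hypothetical `OrderedView` class.

```python
class OrderedView:
    """Hypothetical illustration of sorted vs. corpus sequence indices."""

    def __init__(self, seq_order):
        # seq_order[i] is the corpus index of the i-th sequence in this epoch.
        self._seq_order = list(seq_order)

    def have_corpus_seq_idx(self):
        # The full order is stored, so the mapping is always available.
        return True

    def get_corpus_seq_idx(self, sorted_seq_idx):
        return self._seq_order[sorted_seq_idx]
```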
- get_data_dim(key)
- Parameters:
key (str) – e.g. “data” or “classes”
- Returns:
number of classes, no matter if sparse or not
- Return type:
int
- get_data_dtype(key)
- Parameters:
key (str) – e.g. “data” or “classes”
- Returns:
dtype as str, e.g. “int32” or “float32”
- Return type:
str
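Both values can be read off the `data_types` dict passed to MapDatasetBase. A sketch of that lookup, assuming an explicit "dtype" entry wins and otherwise sparse streams are int32 and dense streams float32; `data_dim` and `data_dtype` are hypothetical helpers, and the wrapper's actual fallbacks for missing entries may differ.

```python
def data_dim(data_types, key):
    """Hypothetical: the 'dim' entry, i.e. number of classes (sparse) or feature size (dense)."""
    return data_types[key]["dim"]


def data_dtype(data_types, key):
    """Hypothetical: explicit 'dtype' if given, else int32 for sparse and float32 for dense."""
    opts = data_types[key]
    if "dtype" in opts:
        return opts["dtype"]
    return "int32" if opts.get("sparse", False) else "float32"
```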
- class returnn.datasets.map.FromListDataset(data_list, sort_data_key=None, **kwargs)
Simple implementation of a MapDataset where all data is given in a list.
- Parameters:
data_list (list[dict[str,numpy.ndarray]]) – sequence data, one dict data_key -> data per sequence.
sort_data_key (str) – if given, the sequence length is determined from the data of this data_key.
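A usage sketch with three in-memory sequences. It assumes the length taken from `sort_data_key` is the size of the first (time) axis and that extra keyword arguments such as `data_types` are forwarded to MapDatasetBase; the data is hypothetical, and the stand-in class (used only without RETURNN) reproduces just the behavior described above.

```python
import numpy

try:
    from returnn.datasets.map import FromListDataset
except ImportError:
    class FromListDataset:
        """Hypothetical stand-in reproducing the documented behavior (RETURNN not installed)."""

        def __init__(self, data_list, sort_data_key=None, **kwargs):
            self._data_list = data_list
            self._sort_data_key = sort_data_key

        def __len__(self):
            return len(self._data_list)

        def __getitem__(self, seq_idx):
            return self._data_list[seq_idx]

        def get_seq_len(self, seq_idx):
            # Length = first-axis size of the array stored under sort_data_key.
            return len(self._data_list[seq_idx][self._sort_data_key])


# Hypothetical corpus: 3-dim float features and one class label per frame.
data_list = [
    {"data": numpy.zeros((n, 3), dtype="float32"), "classes": numpy.arange(n, dtype="int32")}
    for n in (6, 2, 4)
]
dataset = FromListDataset(
    data_list,
    sort_data_key="data",
    data_types={"data": {"dim": 3}, "classes": {"dim": 6, "sparse": True}},
)
lengths = [dataset.get_seq_len(i) for i in range(len(dataset))]
```

Keeping everything in one Python list is convenient for small or synthetic corpora; larger corpora are better served by a custom MapDatasetBase subclass that loads each sequence on demand.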