returnn.datasets.map

Provides MapDatasetBase

class returnn.datasets.map.MapDatasetBase(data_types=None)

This dataset can be used as a template to implement user-side datasets where the data can be accessed in arbitrary order. Global sorting requires the sequence lengths to be known beforehand; see get_seq_len.

Parameters:

data_types (dict[str,dict]) – data_key -> constructor parameters of Data object, for all data streams the dataset provides (inputs and targets). E.g. {"data": {"dim": 1000, "sparse": True, …}, "classes": …}.
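
As an illustration, a data_types argument for a dataset with dense input features and sparse class labels might look like the sketch below. All values are made up for this example. Only "dim" and "sparse" are named in the description above; "dtype" is assumed here to be an accepted Data constructor parameter and should be checked against your RETURNN version.

```python
# Hypothetical example values, not taken from any real setup.
data_types = {
    # dense input features: one 40-dimensional float vector per frame
    "data": {"dim": 40, "dtype": "float32"},
    # sparse targets: one class index out of 1000 per frame
    "classes": {"dim": 1000, "sparse": True, "dtype": "int32"},
}
```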

get_seq_len(seq_idx)

This optional function provides the sequence length used by the seq_ordering parameter. If it is not implemented, only a limited set of seq_ordering options is available.

Parameters:

seq_idx (int) –

Returns:

sequence length

Return type:

int
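
To see why the length information matters, here is a minimal plain-Python sketch of length-based ordering. ToyMapDataset and order_by_length are hypothetical stand-ins that do not import RETURNN; how RETURNN itself applies seq_ordering is not shown here.

```python
class ToyMapDataset:
    """Stand-in with the same get_seq_len interface (not a RETURNN class)."""

    def __init__(self, seqs):
        self._seqs = seqs  # list of sequences, each a list of frames

    def __len__(self):
        return len(self._seqs)

    def get_seq_len(self, seq_idx):
        """Return the number of frames of the sequence at seq_idx."""
        return len(self._seqs[seq_idx])


def order_by_length(dataset):
    """Return all sequence indices, shortest sequence first."""
    return sorted(range(len(dataset)), key=dataset.get_seq_len)


toy = ToyMapDataset([[1, 2, 3], [4], [5, 6]])
length_order = order_by_length(toy)  # indices ordered by lengths 1, 2, 3
```

Without get_seq_len, such an ordering could only be computed by loading every sequence first.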

get_seq_tag(seq_idx)
Parameters:

seq_idx (int) –

Returns:

tag for the sequence of the given index, default is ‘seq-{seq_idx}’.

Return type:

str

get_seq_order(epoch=None)

Override to implement a dataset-specific sequence order for a given epoch. The returned order may contain fewer sequences than the dataset holds in total. When the dataset is used through MapDatasetWrapper, this overrides the effects of partition_epoch and seq_ordering.

Parameters:

epoch (int) –

Returns:

sequence order (list of sequence indices)

Return type:

list[int]
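
A custom order could, for example, draw a different random subset in each epoch. The sketch below is a plain-Python stand-in that does not subclass the real MapDatasetBase; the class name SubsetOrderDataset and the seqs_per_epoch parameter are hypothetical.

```python
import random


class SubsetOrderDataset:
    """Stand-in showing an epoch-dependent get_seq_order (not a RETURNN class)."""

    def __init__(self, num_seqs, seqs_per_epoch):
        self._num_seqs = num_seqs
        self._seqs_per_epoch = seqs_per_epoch

    def __len__(self):
        return self._num_seqs

    def get_seq_order(self, epoch=None):
        """Return a reproducible random subset of sequence indices."""
        seed = 1 if epoch is None else epoch  # same epoch gives the same order
        rng = random.Random(seed)
        return rng.sample(range(self._num_seqs), self._seqs_per_epoch)


subset_ds = SubsetOrderDataset(num_seqs=10, seqs_per_epoch=4)
order_epoch_1 = subset_ds.get_seq_order(epoch=1)
```

Seeding the generator with the epoch keeps the order reproducible when training is restarted from a given epoch.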

class returnn.datasets.map.MapDatasetWrapper(map_dataset, **kwargs)

Takes a MapDataset and turns it into a returnn.datasets.Dataset by providing the required class methods.

Parameters:

map_dataset (MapDatasetBase|function) – the MapDataset to be wrapped

property map_dataset: MapDatasetBase
Returns:

the wrapped MapDataset

property num_seqs
Returns:

number of sequences in the current epoch

Return type:

int

get_total_num_seqs() -> int
Returns:

total number of seqs

init_seq_order(epoch=None, seq_list=None, seq_order=None)
Parameters:
  • epoch (int|None) –

  • seq_list (list[str]|None) – List of sequence tags, to set a predefined order.

  • seq_order (list[int]|None) – List of corpus sequence indices, to set a predefined order.

Returns:

whether the order changed (True is always safe to return)

Return type:

bool

supports_seq_order_sorting() -> bool

Whether sorting of the sequence order is supported.

get_current_seq_order()
Return type:

Sequence[int]

get_tag(sorted_seq_idx)
Parameters:

sorted_seq_idx (int) –

Returns:

tag of the sequence at the given sorted index

get_all_tags() -> List[str]
Returns:

list of all tags

get_corpus_seq_idx(sorted_seq_idx)
Parameters:

sorted_seq_idx (int) –

Returns:

corpus_seq_idx

Return type:

int
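
The relation between the two index spaces can be pictured with a plain list. This is a conceptual sketch, assuming the current sequence order is a list of corpus indices; the names are hypothetical and the wrapper's internal storage may differ.

```python
# Order of the current epoch: position = sorted_seq_idx, value = corpus_seq_idx.
current_seq_order = [2, 0, 3]  # e.g. a reordered subset of a 4-sequence corpus


def corpus_seq_idx_for(sorted_seq_idx):
    """Map a position in the current epoch's order back to the corpus index."""
    return current_seq_order[sorted_seq_idx]


first_corpus_idx = corpus_seq_idx_for(0)  # corpus index of the epoch's first sequence
```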

have_corpus_seq_idx()
Return type:

bool

Returns:

whether you can call self.get_corpus_seq_idx()

get_data_keys() -> List[str]
Returns:

keys

get_data_dim(key)
Parameters:

key (str) – e.g. “data” or “classes”

Returns:

number of classes, no matter if sparse or not

Return type:

int

get_data_dtype(key)
Parameters:

key (str) – e.g. “data” or “classes”

Returns:

dtype as str, e.g. “int32” or “float32”

Return type:

str

is_data_sparse(key)
Parameters:

key (str) – e.g. “data” or “classes”

Returns:

whether the data is sparse

Return type:

bool

get_data_shape(key)
Returns:

get_data(*, key).shape[1:], i.e. num-frames excluded

Return type:

list[int]
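
In other words, the leading num-frames axis is dropped. A small plain-Python illustration, using shape tuples instead of real arrays (the helper name is hypothetical):

```python
def shape_without_frames(full_shape):
    """Drop the leading num-frames axis, mirroring shape[1:]."""
    return list(full_shape[1:])


dense_shape = shape_without_frames((120, 40))  # 120 frames of 40-dim features
sparse_shape = shape_without_frames((120,))    # 120 class indices, no feature axis
```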

class returnn.datasets.map.FromListDataset(data_list, sort_data_key=None, **kwargs)

Simple implementation of a MapDataset where all data is given in a list.

Parameters:
  • data_list (list[dict[str,numpy.ndarray]]) – list with one entry per sequence, each a dict data_key -> data.

  • sort_data_key (str) – Sequence length will be determined from data of this data_key.

get_seq_len(seq_idx)
Parameters:

seq_idx (int) –

Returns:

length of data for ‘sort_data_key’

Return type:

int
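
The expected layout of data_list and the role of sort_data_key can be sketched with a plain-Python stand-in. ToyFromListDataset is hypothetical and does not import RETURNN; its __len__ and __getitem__ methods are assumptions about how a map-style dataset is accessed, and plain lists stand in for the numpy arrays a real dataset would hold.

```python
class ToyFromListDataset:
    """Stand-in mirroring the described behaviour (not the RETURNN class)."""

    def __init__(self, data_list, sort_data_key=None):
        self._data_list = data_list
        self._sort_data_key = sort_data_key

    def __len__(self):
        return len(self._data_list)

    def __getitem__(self, seq_idx):
        return self._data_list[seq_idx]

    def get_seq_len(self, seq_idx):
        """Length of the data stored under sort_data_key for this sequence."""
        assert self._sort_data_key, "sort_data_key is needed to determine lengths"
        return len(self._data_list[seq_idx][self._sort_data_key])


# One dict per sequence; plain lists stand in for numpy arrays here.
toy_data_list = [
    {"data": [0.1, 0.2, 0.3], "classes": [7]},
    {"data": [0.4, 0.5], "classes": [3]},
]
toy_list_ds = ToyFromListDataset(toy_data_list, sort_data_key="data")
seq_lens = [toy_list_ds.get_seq_len(i) for i in range(len(toy_list_ds))]
```

Here the lengths come from the "data" stream, so the "classes" entries do not influence any length-based ordering.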