returnn.datasets.sprint#

Implements the SprintDatasetBase and ExternSprintDataset classes, which are Dataset subtypes. Note that from the main RETURNN process, you probably want ExternSprintDataset instead.

class returnn.datasets.sprint.SprintDatasetBase(target_maps=None, str_add_final_zero=False, input_stddev=1.0, orth_post_process=None, bpe=None, orth_vocab=None, suppress_load_seqs_print=False, reduce_target_factor=1, **kwargs)[source]#

In Sprint, we use this object for multiple purposes:

  • Multiple epoch handling via SprintInterface.getSegmentList(). For this, we get the segment list from Sprint and use the Dataset shuffling method.

  • Fill in data which we get via SprintInterface.feedInput*(). Note that each such input doesn’t necessarily correspond to a single segment. This depends on which type of FeatureExtractor is used in Sprint. If we use the BufferedFeatureExtractor in utterance mode, we get one call for every segment and also receive segmentName as a parameter. Otherwise, we get batches of fixed size, which do not correspond to the segments. In any case, we use this data as-is as a new seq. Because of that, we cannot really know the number of seqs in advance, nor the total number of time frames, etc.

If you want to use this directly in RETURNN, see ExternSprintDataset.

Parameters:
  • target_maps (dict[str,str|dict]) – e.g. {“speaker_name”: “speaker_map.txt”}, with “speaker_map.txt” containing a line for each expected speaker. The indices will be given by the line index. Note that scalar content (e.g. single index) will automatically get a time axis added with the length of the audio frames.

  • str_add_final_zero (bool) – adds e.g. “orth0” with ‘\0’-ending

  • input_stddev (float) – if != 1, the input “data” will be divided by this value

  • orth_post_process (str|list[str]|((str)->str)|None) – get_post_processor_function(), applied on orth

  • bpe (None|dict[str]) – if given, will be opts for BytePairEncoding

  • orth_vocab (None|dict[str]) – if given, orth_vocab is applied to orth and orth_classes is an available target

  • suppress_load_seqs_print (bool) – less verbose

  • reduce_target_factor (int) – downsampling factor, to allow fewer target frames than input features
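The target_maps convention described above (label index given by line index in the map file) can be illustrated with a small sketch. The file contents and speaker names here are made up; the only thing taken from the docs is the line-index convention:

```python
# Hypothetical contents of a "speaker_map.txt" as described for target_maps:
# one expected speaker name per line; the target index is the line index.
speaker_map_lines = ["speaker_a", "speaker_b", "speaker_c"]

# Build the name -> index mapping the way the docstring describes it.
speaker_to_idx = {name: idx for idx, name in enumerate(speaker_map_lines)}

print(speaker_to_idx["speaker_b"])  # line index 1
```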

SprintCachedSeqsMax = 200[source]#
SprintCachedSeqsMin = 100[source]#
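The two class constants above suggest a high/low watermark policy for the seq cache: let the cache grow to a maximum, then evict old seqs down to a minimum. This is only an illustration of that general idea under that assumption, not the actual SprintDatasetBase implementation:

```python
SPRINT_CACHED_SEQS_MAX = 200  # start evicting once the cache grows beyond this
SPRINT_CACHED_SEQS_MIN = 100  # ...and evict down to this many cached seqs


def maybe_evict(cached_seqs):
    """Drop the oldest seqs once the cache exceeds the max watermark."""
    if len(cached_seqs) > SPRINT_CACHED_SEQS_MAX:
        return cached_seqs[len(cached_seqs) - SPRINT_CACHED_SEQS_MIN:]
    return cached_seqs


cache = list(range(250))
cache = maybe_evict(cache)
print(len(cache))  # 100, the newest seqs are kept
```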
use_multiple_epochs()[source]#

Called via SprintInterface.getSegmentList().

set_dimensions(input_dim, output_dim)[source]#

Called via python_train.

init_sprint_epoch(epoch)[source]#

Called by SprintInterface.getSegmentList() when we start a new epoch. We must not call this via self.init_seq_order() because we will already have filled the cache by Sprint before the RETURNN train thread starts the epoch.

finalize_sprint()[source]#

Called when SprintInterface.getSegmentList() ends.

init_seq_order(epoch=None, seq_list=None, seq_order=None)[source]#

Called by RETURNN train thread when we enter a new epoch.

wait_for_returnn_epoch(epoch)[source]#

Called by SprintInterface.

is_cached(start, end)[source]#
Parameters:
  • start (int) –

  • end (int) –

Return type:

bool

load_seqs(start, end)[source]#

Called by RETURNN train thread.

Parameters:
  • start (int) –

  • end (int) –

add_new_data(features, targets=None, segment_name=None)[source]#

Adds a new seq. This is called via the Sprint main thread.

Parameters:
  • features (numpy.ndarray) – format (input-feature,time) (via Sprint)

  • targets (dict[str,numpy.ndarray|str]) – format (time) (idx of output-feature)

  • segment_name (str|None) –

Returns:

the sorted seq index

Return type:

int
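The (input-feature, time) layout documented for features, and the (time) layout for a target sequence, can be sketched with plain NumPy. The sizes here are arbitrary placeholders:

```python
import numpy

# features arrive as (input-feature, time): one column per time frame.
num_features, num_frames = 40, 123  # arbitrary example sizes
features = numpy.zeros((num_features, num_frames), dtype="float32")

# a target sequence in format (time): one label index per frame.
targets = numpy.zeros((num_frames,), dtype="int32")

print(features.shape, targets.shape)
```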

finish_sprint_epoch(seen_all=True)[source]#

Called by SprintInterface.getSegmentList(). This is in a state where Sprint asks for the next segment after we just finished an epoch. Thus, any upcoming self.add_new_data() call will contain data from a segment in the new epoch. Thus, we finish the current epoch in Sprint.

get_num_timesteps()[source]#
Return type:

int

property num_seqs[source]#
Return type:

int

have_seqs()[source]#
Return type:

bool

is_less_than_num_seqs(n)[source]#
Parameters:

n (int) –

Return type:

bool

get_data_keys()[source]#
Return type:

list[str]

get_target_list()[source]#
Return type:

list[str]

set_complete_frac(frac)[source]#
Parameters:

frac (float) –

get_complete_frac(seq_idx)[source]#
Parameters:

seq_idx (int) –

Return type:

float

get_seq_length(sorted_seq_idx)[source]#
Parameters:

sorted_seq_idx (int) –

Return type:

Util.NumbersDict

get_data(seq_idx, key)[source]#
Parameters:
  • seq_idx (int) –

  • key (str) –

Return type:

numpy.ndarray

get_input_data(sorted_seq_idx)[source]#
Parameters:

sorted_seq_idx (int) –

Return type:

numpy.ndarray

get_targets(target, sorted_seq_idx)[source]#
Parameters:
  • target (str) –

  • sorted_seq_idx (int) –

Return type:

numpy.ndarray

get_tag(sorted_seq_idx)[source]#
Parameters:

sorted_seq_idx (int) –

Return type:

str

class returnn.datasets.sprint.ExternSprintDataset(sprintTrainerExecPath, sprintConfigStr, partitionEpoch=None, **kwargs)[source]#

This is a Dataset which you can use directly in RETURNN. You can use it to get any type of data from Sprint (RWTH ASR toolkit), e.g. you can use Sprint to do feature extraction and preprocessing.

This class is like SprintDatasetBase, except that we will start an external Sprint instance ourselves which will forward the data to us over a pipe. The Sprint subprocess will use SprintExternInterface to communicate with us.

Parameters:
  • sprintTrainerExecPath (str|list[str]) –

  • sprintConfigStr (str | list[str] | ()->str | list[()->str] | ()->list[str] | ()->list[()->str]) – via eval_shell_str

  • partitionEpoch (int|None) – deprecated; use partition_epoch instead
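A typical way to use ExternSprintDataset is via a dataset dict in a RETURNN config. The executable path and Sprint options below are placeholders, not real paths; only the option names come from the constructor signature above:

```python
# Hypothetical RETURNN config fragment; the paths and Sprint options are
# placeholders and must be adapted to your setup.
train = {
    "class": "ExternSprintDataset",
    "sprintTrainerExecPath": "/path/to/sprint/nn-trainer",  # placeholder path
    "sprintConfigStr": "--config=training.config",  # parsed via eval_shell_str
    "partition_epoch": 2,  # prefer this over the deprecated partitionEpoch
}
```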

finish_epoch(*, free_resources: bool = False)[source]#

Called at the end of the epoch.

init_seq_order(epoch=None, seq_list=None, seq_order=None)[source]#
Parameters:
  • epoch (int) –

  • seq_list (list[str]|None) –

  • seq_order (list[int]|None) –

Return type:

bool

orth_post_process: Optional[Callable[[str], str]][source]#
lock: RLock | None[source]#
rnd_seq_drop: Optional[Random][source]#
num_outputs: Optional[Dict[str, Tuple[int, int]]][source]#
labels: Dict[str, List[str]][source]#
class returnn.datasets.sprint.SprintCacheDataset(data, **kwargs)[source]#

Can directly read Sprint cache files (and bundle files). Supports both cached features and cached alignments. For alignments, you need to provide all options for the AllophoneLabeling class, such as allophone file, etc.

Parameters:

data (dict[str,dict[str]]) – data-key -> dict with keys such as filename, see the SprintCacheReader constructor
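The data dict maps each data key to kwargs for a SprintCacheReader. A hedged sketch follows; the cache file names are placeholders, and the allophone_labeling options shown are assumptions about the AllophoneLabeling kwargs, to be checked against your RETURNN version:

```python
# Hypothetical SprintCacheDataset "data" option; all file names are placeholders.
data = {
    "data": {"filename": "features.cache.bundle"},  # cached features
    "classes": {
        "filename": "alignment.cache.bundle",  # cached alignments
        "data_type": "align",
        # kwargs for AllophoneLabeling; the exact option names here are
        # assumptions, not verified against the class signature.
        "allophone_labeling": {
            "silence_phone": "[SILENCE]",
            "allophone_file": "allophones.txt",
            "state_tying_file": "state-tying.txt",
        },
    },
}
```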

class SprintCacheReader(data_key, filename, data_type=None, allophone_labeling=None)[source]#

Helper class to read a Sprint cache directly.

Parameters:
  • data_key (str) – e.g. “data” or “classes”

  • filename (str) – to Sprint cache archive

  • data_type (str|None) – “feat” or “align”

  • allophone_labeling (dict[str]) – kwargs for AllophoneLabeling

read(name)[source]#
Parameters:

name (str) – content-filename for sprint cache

Returns:

numpy array of shape (time, [num_labels])

Return type:

numpy.ndarray

init_seq_order(epoch=None, seq_list=None, seq_order=None)[source]#
Parameters:
  • epoch (int) –

  • seq_list (list[str]|None) –

  • seq_order (list[int]|None) –

Return type:

bool

get_total_num_seqs() int[source]#

total num seqs

get_all_tags() List[str][source]#

all seq names

supports_seq_order_sorting() bool[source]#

supports sorting

get_dataset_seq_for_name(name, seq_idx=-1)[source]#
Parameters:
  • name (str) –

  • seq_idx (int) –

Return type:

DatasetSeq

get_data_keys()[source]#
Return type:

list[str]

get_target_list()[source]#
Return type:

list[str]

get_tag(sorted_seq_idx)[source]#
Return type:

str

returnn.datasets.sprint.demo()[source]#

Demo.