Audio Datasets

Extern Sprint Dataset

This dataset can be used directly in RETURNN. It retrieves any type of data from Sprint (the RWTH ASR toolkit); for example, Sprint can perform the feature extraction and preprocessing.

This class is like SprintDatasetBase, except that it starts an external Sprint instance itself, which forwards the data to the dataset over a pipe. The Sprint subprocess uses SprintExternInterface to communicate with the dataset.
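
As a rough illustration, a dataset of this kind might be declared in a RETURNN config roughly as follows. This is only a sketch: the option names sprintTrainerExecPath and sprintConfigStr are assumptions not stated in this section and may differ between RETURNN versions, and the executable path and Sprint config file are hypothetical placeholders.

```python
# Hypothetical RETURNN config fragment (a sketch, not taken from this page).
# All paths below are placeholders; adapt them to the local Sprint setup.

# Command-line arguments handed to the external Sprint subprocess.
# The config file name is a placeholder.
sprint_args = [
    "--config=sprint/training.config",
]

train = {
    # Selects the dataset class described above.
    "class": "ExternSprintDataset",
    # Assumed option name: path to the Sprint executable that is started
    # as a subprocess.
    "sprintTrainerExecPath": "sprint-executables/nn-trainer",
    # Assumed option name: the Sprint arguments as a single string.
    "sprintConfigStr": " ".join(sprint_args),
}
```

Keeping the Sprint arguments in a list and joining them at the end makes it easier to add or remove individual options.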

Ogg Zip Dataset

Generic dataset that reads a zip file containing one Ogg file per sequence and a text document with the sequence metadata. The feature extraction settings are determined by the audio option, which is passed to ExtractAudioFeatures. WAV files are also supported, and other file formats readable by the ‘soundfile’ library might work as well (not tested). By setting audio or targets to None, the dataset can be used in text-only or audio-only mode, respectively. The zip file contains:

  • a .txt file with the same name as the zip file, containing a Python list of dictionaries
  • a subfolder with the same name as the zip file, containing the audio files

The dictionaries in the .txt file must have the following structure:

[{'seq_name': 'arbitrary_sequence_name', 'text': 'some utterance text', 'duration': 2.3, 'file': 'sequence0.wav'}, ...]

If seq_name is not included, the seq_tag will be the name of the file. The duration entry is mandatory, because it is needed for sorting the sequences.
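
The layout described above can be sketched with the Python standard library alone. The following is a minimal example under stated assumptions, not the way RETURNN itself creates such archives: the helper names make_silent_wav and build_zip are hypothetical, the audio is silence written as WAV (which the dataset also accepts), and the 'file' entries are assumed to be relative to the subfolder. The dataset options at the end are likewise only illustrative: "path" and the contents of "audio" are assumptions.

```python
import io
import os
import tempfile
import wave
import zipfile
from ast import literal_eval


def make_silent_wav(duration_sec, sample_rate=16000):
    """Return the bytes of a mono 16-bit WAV file containing silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(duration_sec * sample_rate))
    return buf.getvalue()


def build_zip(zip_path, sequences):
    """Write a zip with <name>.txt and <name>/<audio files>; return the metadata."""
    name = os.path.splitext(os.path.basename(zip_path))[0]
    meta = []
    with zipfile.ZipFile(zip_path, "w") as zf:
        for i, (text, duration) in enumerate(sequences):
            file_name = "sequence%d.wav" % i
            # Audio files go into a subfolder named like the zip file.
            zf.writestr("%s/%s" % (name, file_name), make_silent_wav(duration))
            meta.append({
                "seq_name": "seq-%d" % i,
                "text": text,
                "duration": duration,  # mandatory, in seconds
                "file": file_name,  # assumed to be relative to the subfolder
            })
        # The .txt file holds the Python list of dictionaries as text.
        zf.writestr(name + ".txt", repr(meta))
    return meta


tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "demo.zip")
written = build_zip(zip_path, [("hello world", 0.5), ("some utterance text", 0.25)])

# Read the archive back to check the layout.
with zipfile.ZipFile(zip_path) as zf:
    names = sorted(zf.namelist())
    loaded = literal_eval(zf.read("demo.txt").decode("utf8"))

# Hypothetical dataset options for audio-only mode (targets set to None).
# "path" and the contents of "audio" are assumptions, not taken from this page.
train = {
    "class": "OggZipDataset",
    "path": zip_path,
    "audio": {"features": "mfcc"},
    "targets": None,
}
```

Reading the metadata back with ast.literal_eval rather than eval keeps the check safe, since the .txt file contains only Python literals.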