GeneratingDataset

class GeneratingDataset.GeneratingDataset(input_dim, output_dim, num_seqs=inf, fixed_random_seed=None, **kwargs)[source]
Parameters:
  • input_dim (int) –
  • output_dim (int|dict[str,int|(int,int)|dict]) –
  • num_seqs (int|float) –
  • fixed_random_seed (int) –
init_seq_order(epoch=None, seq_list=None)[source]
Parameters:seq_list – predefined order; doesn’t make sense here

This is called when we start a new epoch, or at initialization.

is_cached(start, end)[source]
generate_seq(seq_idx)[source]
Return type:DatasetSeq
get_num_timesteps()[source]
num_seqs[source]
get_seq_length(sorted_seq_idx)[source]
get_input_data(sorted_seq_idx)[source]
get_targets(target, sorted_seq_idx)[source]
get_ctc_targets(sorted_seq_idx)[source]
get_tag(sorted_seq_idx)[source]
class GeneratingDataset.Task12AXDataset(**kwargs)[source]

12AX memory task. This is a simple memory task where there is an outer loop and an inner loop. Description here: http://psych.colorado.edu/~oreilly/pubs-abstr.html#OReillyFrank06

get_random_seq_len()[source]
generate_input_seq(seq_len)[source]

Somewhat made-up probability distribution. We try to generate the input such that at least some “R” labels occur in the output sequence; otherwise, “R”s are really rare.

classmethod make_output_seq(input_seq)[source]
Return type:list[int]
estimate_output_class_priors(num_trials, seq_len=10)[source]
Return type:(float, float)
generate_seq(seq_idx)[source]
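
A minimal usage sketch, based only on the methods listed above (the argument values are made up for illustration):

    from GeneratingDataset import Task12AXDataset

    dataset = Task12AXDataset(num_seqs=10)
    dataset.init_seq_order(epoch=1)
    seq = dataset.generate_seq(0)  # -> DatasetSeq
    # generate_input_seq / make_output_seq can also be used standalone:
    input_seq = dataset.generate_input_seq(dataset.get_random_seq_len())
    output_seq = Task12AXDataset.make_output_seq(input_seq)  # list[int]

The other Task*Dataset classes below follow the same generate_input_seq / make_output_seq / generate_seq pattern.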
class GeneratingDataset.TaskEpisodicCopyDataset(**kwargs)[source]

Episodic Copy memory task. This is a simple memory task where we need to remember a sequence. Described in: http://arxiv.org/abs/1511.06464 Also tested for Associative LSTMs. This is a variant where the lengths are random, both for the chars and for blanks.

generate_input_seq()[source]
classmethod make_output_seq(input_seq)[source]
Return type:list[int]
generate_seq(seq_idx)[source]
class GeneratingDataset.TaskXmlModelingDataset(limit_stack_depth=4, **kwargs)[source]

XML modeling memory task. This is a memory task where we need to remember a stack. Defined in Jozefowicz et al. (2015). Also tested for Associative LSTMs.

generate_input_seq()[source]
classmethod make_output_seq(input_seq)[source]
Return type:list[int]
generate_seq(seq_idx)[source]
class GeneratingDataset.TaskVariableAssignmentDataset(**kwargs)[source]

Variable Assignment memory task. This is a memory task to test key-value retrieval. Defined in the Associative LSTM paper.

generate_input_seq()[source]
classmethod make_output_seq(input_seq)[source]
Return type:list[int]
generate_seq(seq_idx)[source]
class GeneratingDataset.DummyDataset(input_dim, output_dim, num_seqs, seq_len=2, input_max_value=10.0, input_shift=None, input_scale=None, **kwargs)[source]
generate_seq(seq_idx)[source]
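
A minimal sketch of how such a dataset could be constructed and iterated (the concrete values are made up):

    from GeneratingDataset import DummyDataset

    dataset = DummyDataset(input_dim=2, output_dim=3, num_seqs=4, seq_len=5)
    dataset.init_seq_order(epoch=1)
    for i in range(dataset.num_seqs):
        seq = dataset.generate_seq(i)  # -> DatasetSeq with deterministic dummy data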
class GeneratingDataset.StaticDataset(data, target_list=None, output_dim=None, input_dim=None, **kwargs)[source]
classmethod copy_from_dataset(dataset, start_seq_idx=0, max_seqs=None)[source]
Parameters:
  • dataset (Dataset) –
  • start_seq_idx (int) –
  • max_seqs (int|None) –
Return type:

StaticDataset

generate_seq(seq_idx)[source]
get_target_list()[source]
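
For illustration, a minimal sketch of copy_from_dataset, assuming some_dataset is any other initialized Dataset instance (a hypothetical name):

    # Snapshot the first 100 seqs of some_dataset into memory.
    static = StaticDataset.copy_from_dataset(some_dataset, start_seq_idx=0, max_seqs=100)
    static.init_seq_order(epoch=1)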
class GeneratingDataset.CopyTaskDataset(nsymbols, minlen=0, maxlen=0, minlen_epoch_factor=0, maxlen_epoch_factor=0, **kwargs)[source]
get_random_seq_len()[source]
generate_seq(seq_idx)[source]
Return type:DatasetSeq
class GeneratingDataset.ExtractAudioFeatures(window_len=0.025, step_len=0.01, num_feature_filters=40, with_delta=False, norm_mean=None, norm_std_dev=None, random_permute=None, random_state=None)[source]

Currently uses librosa to extract MFCC features. We could also use python_speech_features. We could also add support to directly extract e.g. log filterbanks.

Parameters:
  • audio (numpy.ndarray) – raw audio samples, shape (audio_len,)
  • sample_rate (int) – e.g. 22050
  • window_len (float) – in seconds
  • step_len (float) – in seconds
  • num_feature_filters (int) –
  • with_delta (bool|int) –
  • norm_mean (numpy.ndarray|str|None) –
  • norm_std_dev (numpy.ndarray|str|None) –
  • random_permute (CollectionReadCheckCovered|dict[str]|bool|None) –
  • random_state (numpy.random.RandomState|None) –
Returns:

array of shape (audio_len // int(step_len * sample_rate), max(1, with_delta) * num_feature_filters), dtype float32

Return type:

numpy.ndarray

get_audio_features(audio, sample_rate)[source]
Parameters:
  • audio (numpy.ndarray) – raw audio samples, shape (audio_len,)
  • sample_rate (int) – e.g. 22050
Return type:

numpy.ndarray

get_feature_dimension()[source]
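
For orientation, a rough sketch of the kind of librosa-based MFCC extraction described above. This approximates the behavior and is not the exact implementation; delta features, normalization and random permutation are omitted:

    import numpy
    import librosa

    def get_audio_features_sketch(audio, sample_rate, window_len=0.025, step_len=0.01,
                                  num_feature_filters=40):
        # Window/hop sizes in samples, derived from the times in seconds.
        mfccs = librosa.feature.mfcc(
            y=audio, sr=sample_rate,
            n_mfcc=num_feature_filters,
            hop_length=int(step_len * sample_rate),
            win_length=int(window_len * sample_rate))
        # librosa returns (feature, time); we want (time, feature), float32.
        return mfccs.astype(numpy.float32).transpose()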
class GeneratingDataset.TimitDataset(timit_dir, train=True, preload=False, num_feature_filters=40, feature_window_len=0.025, feature_step_len=0.01, with_delta=False, norm_mean=None, norm_std_dev=None, random_permute_audio=None, num_phones=61, demo_play_audio=False, fixed_random_seed=None, **kwargs)[source]

DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus. You must provide the data.

Demo:

tools/dump-dataset.py "{'class': 'TimitDataset', 'timit_dir': '...'}"
tools/dump-dataset.py "{'class': 'TimitDataset', 'timit_dir': '...', 'demo_play_audio': True, 'random_permute_audio': True}"

The full train data has 3696 utterances and the core test data has 192 utterances (24-speaker core test set).

For some references:
https://github.com/ppwwyyxx/tensorpack/blob/master/examples/CTC-TIMIT/train-timit.py
https://www.cs.toronto.edu/~graves/preprint.pdf
https://arxiv.org/pdf/1303.5778.pdf
https://arxiv.org/pdf/0804.3269.pdf

Parameters:
  • timit_dir (str) – directory of TIMIT. should contain train/filelist.phn and test/filelist.core.phn
  • train (bool) – whether to use the train or core test data
  • preload (bool) – if True, we load all the data already here in __init__, i.e. we wait until everything is loaded
  • num_feature_filters (int) – e.g. number of MFCCs
  • with_delta (bool|int) – whether to add delta features (doubles the feature dim); if int, add deltas up to this degree
  • norm_mean (str) – file with mean values which are used for mean-normalization of the final features
  • norm_std_dev (str) – file with std dev values for variance-normalization of the final features
  • random_permute_audio (None|bool|dict[str]) – enables permutation on the audio. see _get_random_permuted_audio
  • num_phones (int) – 39, 48 or 61; the number of phone labels/classes
  • demo_play_audio (bool) – plays the audio; only makes sense with tools/dump-dataset.py
  • fixed_random_seed (None|int) – if given, use this fixed random seed in every epoch
PhoneMapTo39 = {'aa': 'aa', 'kcl': 'sil', 'v': 'v', 'ow': 'ow', 'uh': 'uh', 'm': 'm', 'jh': 'jh', 'ah': 'ah', 'd': 'd', 'p': 'p', 'ay': 'ay', 'l': 'l', 'dx': 'dx', 'ax': 'ah', 'ux': 'uw', 'bcl': 'sil', 'th': 'th', 'tcl': 'sil', 'gcl': 'sil', 'ch': 'ch', 'eh': 'eh', 'y': 'y', 'en': 'n', 'oy': 'oy', 'hh': 'hh', 'pau': 'sil', 't': 't', 'uw': 'uw', 'ax-h': 'ah', 'pcl': 'sil', 'f': 'f', 'ix': 'ih', 'q': None, 'g': 'g', 'aw': 'aw', 'n': 'n', 'nx': 'n', 'ao': 'aa', 's': 's', 'el': 'l', 'hv': 'hh', 'axr': 'er', 'sh': 'sh', 'ae': 'ae', 'ih': 'ih', 'epi': 'sil', 'eng': 'ng', 'ng': 'ng', 'h#': 'sil', 'dcl': 'sil', 'em': 'm', 'ey': 'ey', 'iy': 'iy', 'zh': 'sh', 'b': 'b', 'k': 'k', 'r': 'r', 'w': 'w', 'dh': 'dh', 'z': 'z', 'er': 'er'}[source]
PhoneMapTo48 = {'aa': 'aa', 'kcl': 'cl', 'v': 'v', 'ow': 'ow', 'uh': 'uh', 'm': 'm', 'jh': 'jh', 'ah': 'ah', 'd': 'd', 'p': 'p', 'ay': 'ay', 'l': 'l', 'dx': 'dx', 'ax': 'ax', 'ux': 'uw', 'bcl': 'vcl', 'th': 'th', 'tcl': 'cl', 'gcl': 'vcl', 'ch': 'ch', 'eh': 'eh', 'y': 'y', 'en': 'en', 'oy': 'oy', 'hh': 'hh', 'pau': 'sil', 't': 't', 'uw': 'uw', 'ax-h': 'ax', 'pcl': 'cl', 'f': 'f', 'ix': 'ix', 'q': None, 'g': 'g', 'aw': 'aw', 'n': 'n', 'nx': 'n', 'ao': 'ao', 's': 's', 'el': 'el', 'hv': 'hh', 'axr': 'er', 'sh': 'sh', 'ae': 'ae', 'ih': 'ih', 'epi': 'epi', 'eng': 'ng', 'ng': 'ng', 'h#': 'sil', 'dcl': 'vcl', 'em': 'm', 'ey': 'ey', 'iy': 'iy', 'zh': 'zh', 'b': 'b', 'k': 'k', 'r': 'r', 'w': 'w', 'dh': 'dh', 'z': 'z', 'er': 'er'}[source]
Phones61 = dict_keys(['aa', 'kcl', 'v', 'ow', 'uh', 'm', 'jh', 'ah', 'd', 'p', 'ay', 'l', 'dx', 'ax', 'ux', 'bcl', 'th', 'tcl', 'gcl', 'ch', 'eh', 'y', 'en', 'oy', 'hh', 'pau', 't', 'uw', 'ax-h', 'pcl', 'f', 'ix', 'q', 'g', 'aw', 'n', 'nx', 'ao', 's', 'el', 'hv', 'axr', 'sh', 'ae', 'ih', 'epi', 'eng', 'ng', 'h#', 'dcl', 'em', 'ey', 'iy', 'zh', 'b', 'k', 'r', 'w', 'dh', 'z', 'er'])[source]
PhoneMapTo61 = {'hv': 'hv', 'aa': 'aa', 'kcl': 'kcl', 'ow': 'ow', 'uh': 'uh', 'sh': 'sh', 'ah': 'ah', 'h#': 'h#', 'dh': 'dh', 'd': 'd', 'ay': 'ay', 'l': 'l', 'dx': 'dx', 'ax': 'ax', 'ux': 'ux', 'th': 'th', 'tcl': 'tcl', 'gcl': 'gcl', 'ch': 'ch', 'r': 'r', 'eh': 'eh', 'y': 'y', 'en': 'en', 'oy': 'oy', 'pcl': 'pcl', 't': 't', 'g': 'g', 'ax-h': 'ax-h', 'f': 'f', 'ix': 'ix', 'q': 'q', 'uw': 'uw', 'v': 'v', 'n': 'n', 'nx': 'nx', 'hh': 'hh', 'el': 'el', 'epi': 'epi', 'm': 'm', 'p': 'p', 'ae': 'ae', 'ih': 'ih', 'ao': 'ao', 'eng': 'eng', 'ng': 'ng', 'aw': 'aw', 'jh': 'jh', 'dcl': 'dcl', 'em': 'em', 'ey': 'ey', 'iy': 'iy', 'axr': 'axr', 'zh': 'zh', 'er': 'er', 'k': 'k', 's': 's', 'w': 'w', 'bcl': 'bcl', 'z': 'z', 'b': 'b', 'pau': 'pau'}[source]
classmethod get_label_map(source_num_phones=61, target_num_phones=39)[source]
Parameters:
  • source_num_phones (int) –
  • target_num_phones (int) –
Return type:

dict[int,int|None]
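
For illustration, a rough sketch of how the 61-to-39 label map could be derived from the PhoneMapTo39 table above. The assumption that label indices follow the sorted phone order is ours, not necessarily the implementation’s:

    phones61 = sorted(TimitDataset.PhoneMapTo61.keys())
    phones39 = sorted(set(p for p in TimitDataset.PhoneMapTo39.values() if p is not None))
    label_map = {
        i: (phones39.index(TimitDataset.PhoneMapTo39[p])
            if TimitDataset.PhoneMapTo39[p] is not None else None)
        for i, p in enumerate(phones61)}  # dict[int,int|None]; e.g. 'q' maps to None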

init_seq_order(epoch=None, seq_list=None)[source]
class GeneratingDataset.NltkTimitDataset(nltk_download_dir=None, **kwargs)[source]

DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

This Dataset will get TIMIT via NLTK. Demo:

tools/dump-dataset.py "{'class': 'NltkTimitDataset'}"
tools/dump-dataset.py "{'class': 'NltkTimitDataset', 'demo_play_audio': True, 'random_permute_audio': True}"

Note: The NLTK data only contains a subset of the train data (160 utterances), and none of the test data. The full train data has 3696 utterances and the core test data has 192 utterances. Not sure how useful this is…

class GeneratingDataset.BytePairEncoding(vocab_file, bpe_file, seq_postfix=None, unknown_label='UNK')[source]

Code is partly taken from subword-nmt/apply_bpe.py. Author: Rico Sennrich, code under MIT license.

Use operations learned with learn_bpe.py to encode a new text. The text will not become smaller, but will use only a fixed vocabulary, with rare words encoded as variable-length sequences of subword units.

Reference: Rico Sennrich, Barry Haddow and Alexandra Birch (2016). Neural Machine Translation of Rare Words with Subword Units. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). Berlin, Germany.

Parameters:
  • vocab_file (str) –
  • bpe_file (str) –
  • seq_postfix (list[int]|None) – labels will be added to the seq in self.get_seq
  • unknown_label (str) –
get_seq(sentence)[source]
Parameters:sentence (str) –
Return type:list[int]
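
A minimal usage sketch (the file paths are hypothetical):

    bpe = BytePairEncoding(vocab_file="my.vocab", bpe_file="my.bpe.codes")
    label_ids = bpe.get_seq("some sentence to encode")  # -> list[int]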
class GeneratingDataset.BlissDataset(path, vocab_file, bpe_file=None, num_feature_filters=40, feature_window_len=0.025, feature_step_len=0.01, with_delta=False, norm_mean=None, norm_std_dev=None, **kwargs)[source]

Reads in a Bliss XML corpus (similar as LmDataset), and provides the features (similar as TimitDataset) and the orthography as words, subwords or chars (similar as TranslationDataset).

Example:
  ./tools/dump-dataset.py "
    {'class': 'BlissDataset',
     'path': '/u/tuske/work/ASR/switchboard/corpus/xml/train.corpus.gz',
     'bpe_file': '/u/zeyer/setups/switchboard/subwords/swb-bpe-codes',
     'vocab_file': '/u/zeyer/setups/switchboard/subwords/swb-vocab'}"
Parameters:
  • path (str) – path to the XML file; can also be gzipped
  • vocab_file (str) – path to vocabulary file. Python-str which evals to dict[str,int]
  • bpe_file (str) – Byte-pair encoding file
  • num_feature_filters (int) – e.g. number of MFCCs
  • with_delta (bool|int) – whether to add delta features (doubles the feature dim); if int, add deltas up to this degree
class SeqInfo[source]
audio_end[source]
audio_path[source]
audio_start[source]
idx[source]
orth_raw[source]
orth_seq[source]
tag[source]
init_seq_order(epoch=None, seq_list=None)[source]
Parameters:
  • epoch (int|None) –
  • seq_list (list[str]|None) – In case we want to set a predefined order.
Return type:

bool

Returns: whether the order changed (True is always safe to return)

class GeneratingDataset.LibriSpeechCorpus(path, prefix, bpe, audio, partition_epoch=None, fixed_random_seed=None, fixed_random_subset=None, **kwargs)[source]
Parameters:
  • path (str) – dir, should contain "train-*/*/*/{*.flac,*.trans.txt}"
  • prefix (str) – e.g. “train”
  • bpe (dict[str]) – options for BytePairEncoding
  • audio (dict[str]) – options for ExtractAudioFeatures
  • partition_epoch (int|None) –
  • fixed_random_seed (int|None) – for the shuffling, e.g. for seq_ordering='random'; otherwise, the epoch will be used
  • fixed_random_subset (float|int|None) – a value in [0,1] to specify the fraction, or an integer >=1 which specifies the number of seqs. If given, we will use this random subset. This is applied once at loading time, i.e. it does not depend on the epoch. It uses an internally hardcoded fixed random seed, i.e. it’s deterministic.
init_seq_order(epoch=None, seq_list=None)[source]

If random_shuffle_epoch1, for epoch 1 with 'random' ordering, we leave the given order as is. Otherwise, this is mostly the default behavior.

Parameters:
  • epoch (int|None) –
  • seq_list (list[str]|None) – In case we want to set a predefined order.
Return type:

bool

Returns: whether the order changed (True is always safe to return)

get_tag(seq_idx)[source]
Parameters:seq_idx (int) –
Return type:str
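
By analogy with the TimitDataset demo above, a hypothetical invocation could look like this (all paths and option values are made up; the bpe/audio dicts take the options of BytePairEncoding and ExtractAudioFeatures, respectively):

    tools/dump-dataset.py "{'class': 'LibriSpeechCorpus', 'path': '...', 'prefix': 'train', 'bpe': {'bpe_file': '...', 'vocab_file': '...'}, 'audio': {'num_feature_filters': 40}}"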
GeneratingDataset.demo()[source]