GeneratingDataset

class GeneratingDataset.CopyTaskDataset(nsymbols, minlen=0, maxlen=0, minlen_epoch_factor=0, maxlen_epoch_factor=0, **kwargs)[source]
generate_seq(seq_idx)[source]
Return type:DatasetSeq
get_random_seq_len()[source]
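
A minimal sketch of driving this task through the generic dataset API; the method names (init_seq_order, load_seqs, get_data) and the data keys "data"/"classes" are assumptions taken from the base Dataset interface:

  from GeneratingDataset import CopyTaskDataset

  # Copy task over 10 symbols, with random sequence lengths between 5 and 20.
  dataset = CopyTaskDataset(nsymbols=10, minlen=5, maxlen=20, num_seqs=100)
  dataset.init_seq_order(epoch=1)
  dataset.load_seqs(0, 1)                   # generates and caches the first sequence
  inputs = dataset.get_data(0, "data")      # the symbols to be copied
  targets = dataset.get_data(0, "classes")  # the expected copy
  print(inputs.shape, targets.shape)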
class GeneratingDataset.DummyDataset(input_dim, output_dim, num_seqs, seq_len=2, input_max_value=10.0, input_shift=None, input_scale=None, **kwargs)[source]
generate_seq(seq_idx)[source]
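
As a hedged example, the dummy data can also be inspected directly via generate_seq(); the DatasetSeq attributes .features and .targets["classes"] used below are assumptions:

  from GeneratingDataset import DummyDataset

  # Deterministic toy data: 5 sequences of length 4, 3-dim inputs, 2 output classes.
  dataset = DummyDataset(input_dim=3, output_dim=2, num_seqs=5, seq_len=4)
  dataset.init_seq_order(epoch=1)
  seq = dataset.generate_seq(0)        # a DatasetSeq
  print(seq.features.shape)            # expected (4, 3)
  print(seq.targets["classes"])        # integer targets of length 4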
class GeneratingDataset.GeneratingDataset(input_dim, output_dim, num_seqs=inf, fixed_random_seed=None, **kwargs)[source]
Parameters:
  • input_dim (int) –
  • output_dim (int|dict[str,int|(int,int)|dict]) –
  • num_seqs (int|float) –
  • fixed_random_seed (int) –
generate_seq(seq_idx)[source]
Return type:DatasetSeq
get_ctc_targets(sorted_seq_idx)[source]
get_input_data(sorted_seq_idx)[source]
get_num_timesteps()[source]
get_seq_length(sorted_seq_idx)[source]
get_tag(sorted_seq_idx)[source]
get_targets(target, sorted_seq_idx)[source]
init_seq_order(epoch=None, seq_list=None)[source]
Parameters:seq_list – predefined order; doesn't make sense here, since the sequences are generated

This is called when we start a new epoch, or at initialization.

is_cached(start, end)[source]
num_seqs[source]
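
A sketch of a custom subclass. It assumes that implementing generate_seq() is sufficient, that the base class provides self.random (a numpy RandomState) and self.num_inputs, and that DatasetSeq can be imported from Dataset; all of these are assumptions about the surrounding code:

  import numpy
  from Dataset import DatasetSeq
  from GeneratingDataset import GeneratingDataset

  class NoiseSignDataset(GeneratingDataset):
    """Toy dataset: Gaussian-noise inputs, target is the sign of the per-frame sum."""

    def __init__(self, seq_len=20, **kwargs):
      super(NoiseSignDataset, self).__init__(input_dim=1, output_dim=2, **kwargs)
      self.seq_len = seq_len

    def generate_seq(self, seq_idx):
      features = self.random.randn(self.seq_len, self.num_inputs).astype("float32")
      targets = (features.sum(axis=1) > 0).astype("int32")
      return DatasetSeq(seq_idx=seq_idx, features=features, targets=targets)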
class GeneratingDataset.NltkTimitDataset(nltk_download_dir=None, **kwargs)[source]

DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

This Dataset will get TIMIT via NLTK. Demo:

tools/dump-dataset.py "{'class': 'NltkTimitDataset'}"
tools/dump-dataset.py "{'class': 'NltkTimitDataset', 'demo_play_audio': True, 'random_permute_audio': True}"

Note: The NLTK data only contains a subset of the train data (160 utterances), and none of the test data. The full train data has 3696 utterances and the core test data has 192 utterances. Not sure how useful this is...
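
In a RETURNN config, the same kind of dict as in the demo can be used to select the dataset, e.g. as training data (the nltk_download_dir path below is only a placeholder):

  # Hypothetical config snippet: use the NLTK TIMIT subset as training data.
  # NLTK downloads the corpus into nltk_download_dir if it is not present yet.
  train = {
      "class": "NltkTimitDataset",
      "nltk_download_dir": "/tmp/nltk_data",   # placeholder path
      "random_permute_audio": True,
  }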

class GeneratingDataset.StaticDataset(data, target_list=None, output_dim=None, input_dim=None, **kwargs)[source]
generate_seq(seq_idx)[source]
get_target_list()[source]
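
A sketch of constructing a StaticDataset from in-memory numpy arrays; the per-sequence dict layout (keys "data"/"classes") and the output_dim format (dim, ndim) are assumptions:

  import numpy
  from GeneratingDataset import StaticDataset

  # Two sequences with 3-dim dense inputs ("data") and sparse integer targets ("classes").
  data = [
      {"data": numpy.random.rand(7, 3).astype("float32"),
       "classes": numpy.array([0, 1, 1, 0, 2, 2, 1], dtype="int32")},
      {"data": numpy.random.rand(5, 3).astype("float32"),
       "classes": numpy.array([2, 0, 1, 1, 0], dtype="int32")},
  ]
  dataset = StaticDataset(data=data, output_dim={"data": (3, 2), "classes": (3, 1)})
  dataset.init_seq_order(epoch=1)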
class GeneratingDataset.Task12AXDataset(**kwargs)[source]

12AX memory task. This is a simple memory task where there is an outer loop and an inner loop. Description here: http://psych.colorado.edu/~oreilly/pubs-abstr.html#OReillyFrank06

estimate_output_class_priors(num_trials, seq_len=10)[source]
Return type:(float, float)
generate_input_seq(seq_len)[source]

Uses a somewhat made-up probability distribution, chosen so that at least some "R" symbols occur in the output seq; otherwise, "R"s would be really rare.

generate_seq(seq_idx)[source]
get_random_seq_len()[source]
classmethod make_output_seq(input_seq)[source]
Return type:list[int]
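
A small sketch of inspecting the task directly; it assumes that the input and output sequences are lists of integer indices into the task's input and output alphabets:

  from GeneratingDataset import Task12AXDataset

  dataset = Task12AXDataset(num_seqs=1)
  dataset.init_seq_order(epoch=1)
  input_seq = dataset.generate_input_seq(seq_len=20)       # list[int]
  output_seq = Task12AXDataset.make_output_seq(input_seq)  # list[int], the L/R decisions
  print(len(input_seq), len(output_seq))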
class GeneratingDataset.TaskEpisodicCopyDataset(**kwargs)[source]

Episodic Copy memory task. This is a simple memory task where we need to remember a sequence. Described in http://arxiv.org/abs/1511.06464, and also tested for Associative LSTMs. This is a variant where the lengths are random, both for the chars and for the blanks.

generate_input_seq()[source]
generate_seq(seq_idx)[source]
classmethod make_output_seq(input_seq)[source]
Return type:list[int]
class GeneratingDataset.TaskVariableAssignmentDataset(**kwargs)[source]

Variable Assignment memory task. This is a memory task to test key-value retrieval. Defined in the Associative LSTM paper.

generate_input_seq()[source]
generate_seq(seq_idx)[source]
classmethod make_output_seq(input_seq)[source]
Return type:list[int]
class GeneratingDataset.TaskXmlModelingDataset(limit_stack_depth=4, **kwargs)[source]

XML modeling memory task. This is a memory task where we need to remember a stack. Defined in Jozefowicz et al. (2015). Also tested for Associative LSTMs.

generate_input_seq()[source]
generate_seq(seq_idx)[source]
classmethod make_output_seq(input_seq)[source]
Return type:list[int]
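
The episodic copy, variable assignment and XML modeling tasks can all be selected with a plain dataset dict, in the same spirit as the demos above; num_seqs is passed through to the GeneratingDataset base class:

  # Hypothetical config snippets selecting one of the memory-task datasets.
  train = {"class": "TaskEpisodicCopyDataset", "num_seqs": 1000}
  # train = {"class": "TaskVariableAssignmentDataset", "num_seqs": 1000}
  # train = {"class": "TaskXmlModelingDataset", "limit_stack_depth": 4, "num_seqs": 1000}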
class GeneratingDataset.TimitDataset(timit_dir, train=True, preload=False, num_feature_filters=40, feature_window_len=0.025, feature_step_len=0.01, with_delta=False, norm_mean=None, norm_std_dev=None, random_permute_audio=None, num_phones=61, demo_play_audio=False, fixed_random_seed=None, **kwargs)[source]

DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus. You must provide the data.

Demo:

tools/dump-dataset.py "{'class': 'TimitDataset', 'timit_dir': '...'}"
tools/dump-dataset.py "{'class': 'TimitDataset', 'timit_dir': '...', 'demo_play_audio': True, 'random_permute_audio': True}"

The full train data has 3696 utterances and the core test data has 192 utterances (24-speaker core test set).

For some references:
https://github.com/ppwwyyxx/tensorpack/blob/master/examples/CTC-TIMIT/train-timit.py
https://www.cs.toronto.edu/~graves/preprint.pdf
https://arxiv.org/pdf/1303.5778.pdf
https://arxiv.org/pdf/0804.3269.pdf

Parameters:
  • timit_dir (str) – directory of TIMIT. should contain train/filelist.phn and test/filelist.core.phn
  • train (bool) – whether to use the train or core test data
  • preload (bool) – if True, all the data is loaded (and we wait for it) during __init__
  • num_feature_filters (int) – e.g. number of MFCCs
  • with_delta (bool|int) – whether to add delta features (doubles the feature dim); if an int, add deltas up to this degree
  • norm_mean (str) – file with mean values which are used for mean-normalization of the final features
  • norm_std_dev (str) – file with std dev values for variance-normalization of the final features
  • random_permute_audio (None|bool|dict[str]) – enables permutation on the audio. see _get_random_permuted_audio
  • num_phones (int) – 39, 48 or 61; the number of phone classes used as labels
  • demo_play_audio (bool) – plays the audio; only makes sense with tools/dump-dataset.py
  • fixed_random_seed (None|int) – if given, use this fixed random seed in every epoch
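
For reference, a hypothetical dataset spec combining these options (the timit_dir path is a placeholder; per the parameter description it must contain train/filelist.phn and test/filelist.core.phn):

  train = {
      "class": "TimitDataset",
      "timit_dir": "/data/corpora/timit",  # placeholder path
      "train": True,
      "num_feature_filters": 40,
      "with_delta": 2,            # add first- and second-degree deltas
      "num_phones": 61,           # can be mapped to 48 or 39 via get_label_map
      "random_permute_audio": True,
  }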
PhoneMapTo39 = {'em': 'm', 'ch': 'ch', 'ix': 'ih', 'tcl': 'sil', 'h#': 'sil', 'iy': 'iy', 'pcl': 'sil', 'axr': 'er', 'zh': 'sh', 'th': 'th', 'dh': 'dh', 'kcl': 'sil', 'hv': 'hh', 'hh': 'hh', 'dx': 'dx', 'ax-h': 'ah', 'ux': 'uw', 'b': 'b', 'd': 'd', 'f': 'f', 'uw': 'uw', 'l': 'l', 'n': 'n', 'p': 'p', 'r': 'r', 't': 't', 'v': 'v', 'z': 'z', 'aa': 'aa', 'el': 'l', 'en': 'n', 'ae': 'ae', 'eh': 'eh', 'ah': 'ah', 'ao': 'aa', 'ih': 'ih', 'ey': 'ey', 'aw': 'aw', 'ay': 'ay', 'ax': 'ah', 'er': 'er', 'pau': 'sil', 'eng': 'ng', 'gcl': 'sil', 'ng': 'ng', 'nx': 'n', 'uh': 'uh', 'dcl': 'sil', 'w': 'w', 'y': 'y', 'jh': 'jh', 'bcl': 'sil', 'g': 'g', 'k': 'k', 'm': 'm', 'q': None, 's': 's', 'sh': 'sh', 'oy': 'oy', 'epi': 'sil', 'ow': 'ow'}[source]
PhoneMapTo48 = {'em': 'm', 'ch': 'ch', 'ix': 'ix', 'tcl': 'cl', 'h#': 'sil', 'iy': 'iy', 'pcl': 'cl', 'axr': 'er', 'zh': 'zh', 'th': 'th', 'dh': 'dh', 'kcl': 'cl', 'hv': 'hh', 'hh': 'hh', 'dx': 'dx', 'ax-h': 'ax', 'ux': 'uw', 'b': 'b', 'd': 'd', 'f': 'f', 'uw': 'uw', 'l': 'l', 'n': 'n', 'p': 'p', 'r': 'r', 't': 't', 'v': 'v', 'z': 'z', 'aa': 'aa', 'el': 'el', 'en': 'en', 'ae': 'ae', 'eh': 'eh', 'ah': 'ah', 'ao': 'ao', 'ih': 'ih', 'ey': 'ey', 'aw': 'aw', 'ay': 'ay', 'ax': 'ax', 'er': 'er', 'pau': 'sil', 'eng': 'ng', 'gcl': 'vcl', 'ng': 'ng', 'nx': 'n', 'uh': 'uh', 'dcl': 'vcl', 'w': 'w', 'y': 'y', 'jh': 'jh', 'bcl': 'vcl', 'g': 'g', 'k': 'k', 'm': 'm', 'q': None, 's': 's', 'sh': 'sh', 'oy': 'oy', 'epi': 'epi', 'ow': 'ow'}[source]
PhoneMapTo61 = {'em': 'em', 'ix': 'ix', 'aa': 'aa', 'ch': 'ch', 'zh': 'zh', 'eh': 'eh', 'el': 'el', 'ah': 'ah', 'ow': 'ow', 'ao': 'ao', 'ih': 'ih', 'tcl': 'tcl', 'en': 'en', 'ey': 'ey', 'aw': 'aw', 'ax': 'ax', 'ay': 'ay', 'h#': 'h#', 'er': 'er', 'pau': 'pau', 'eng': 'eng', 'gcl': 'gcl', 'ng': 'ng', 'nx': 'nx', 'iy': 'iy', 'sh': 'sh', 'pcl': 'pcl', 'uh': 'uh', 'bcl': 'bcl', 'dcl': 'dcl', 'th': 'th', 'dh': 'dh', 'kcl': 'kcl', 'epi': 'epi', 'hv': 'hv', 'oy': 'oy', 'hh': 'hh', 'jh': 'jh', 'dx': 'dx', 'ax-h': 'ax-h', 'ux': 'ux', 'axr': 'axr', 'b': 'b', 'd': 'd', 'g': 'g', 'f': 'f', 'uw': 'uw', 'm': 'm', 'l': 'l', 'n': 'n', 'q': 'q', 'p': 'p', 's': 's', 'r': 'r', 't': 't', 'w': 'w', 'v': 'v', 'y': 'y', 'ae': 'ae', 'z': 'z', 'k': 'k'}[source]
Phones61 = ['em', 'ch', 'ix', 'tcl', 'h#', 'iy', 'pcl', 'axr', 'zh', 'th', 'dh', 'kcl', 'hv', 'hh', 'dx', 'ax-h', 'ux', 'b', 'd', 'f', 'uw', 'l', 'n', 'p', 'r', 't', 'v', 'z', 'aa', 'el', 'en', 'ae', 'eh', 'ah', 'ao', 'ih', 'ey', 'aw', 'ay', 'ax', 'er', 'pau', 'eng', 'gcl', 'ng', 'nx', 'uh', 'dcl', 'w', 'y', 'jh', 'bcl', 'g', 'k', 'm', 'q', 's', 'sh', 'oy', 'epi', 'ow'][source]
classmethod get_label_map(source_num_phones=61, target_num_phones=39)[source]
Parameters:
  • source_num_phones (int) –
  • target_num_phones (int) –
Return type:dict[int,int|None]
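
A sketch of applying the mapping to collapse 61-phone label indices to the 39-phone set; entries mapped to None (such as 'q' in PhoneMapTo39) are dropped:

  from GeneratingDataset import TimitDataset

  label_map = TimitDataset.get_label_map(source_num_phones=61, target_num_phones=39)
  labels_61 = [5, 12, 3]   # some label indices in the 61-phone set
  labels_39 = [label_map[i] for i in labels_61 if label_map[i] is not None]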

init_seq_order(epoch=None, seq_list=None)[source]
GeneratingDataset.demo()[source]