StereoDataset

This file contains dataset implementations that provide an easy-to-use interface for using RETURNN as a regression tool. Example applications are speech enhancement and mask estimation.

class StereoDataset.StereoDataset(**kwargs)[source]

Base dataset for datasets that provide an easy-to-use interface for using RETURNN as a regression tool.

Constructor

num_seqs[source]

Returns the number of sequences in the dataset.

Return type:int
init_seq_order(epoch=None, seq_list=None)[source]
Parameters:
  • epoch (int|None) – epoch number
  • seq_list (list[str] | None) – a predefined order of sequences; only None is currently supported
Initialize lists:
self.seq_index # sorted seq idx
class StereoDataset.StereoHdfDataset(hdfFile, num_outputs=None, normalizationFile=None, flag_normalizeInputs=True, flag_normalizeTargets=True, **kwargs)[source]

A stereo dataset which reads its data from an HDF file. The HDF file must always contain the group ‘inputs’; for training data it must also contain the group ‘outputs’. Each group contains one dataset per sequence, named with consecutive integers starting at 0.

The datasets are 2D numpy arrays, where dimension 0 is the time axis and dimension 1 is the feature axis. Therefore the size of dimension 0 of each ‘inputs’ dataset and the corresponding ‘outputs’ dataset must match.
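The expected HDF layout can be sketched with h5py. The group names ‘inputs’/‘outputs’ and the consecutive dataset numbering come from the description above; the helper function name and the example shapes are made up for illustration:

```python
import os
import tempfile

import numpy as np
import h5py


def write_stereo_hdf(path, inputs, outputs=None):
    """Illustrative helper (not part of StereoDataset): write sequences
    as consecutively numbered datasets under 'inputs' and, optionally,
    'outputs'."""
    with h5py.File(path, "w") as f:
        grp_in = f.create_group("inputs")
        for i, seq in enumerate(inputs):
            grp_in.create_dataset(str(i), data=seq)
        if outputs is not None:
            grp_out = f.create_group("outputs")
            for i, seq in enumerate(outputs):
                # the time axis (dimension 0) must match the input
                assert seq.shape[0] == inputs[i].shape[0]
                grp_out.create_dataset(str(i), data=seq)


# two sequences as (T, D) arrays: time on axis 0, features on axis 1
ins = [np.zeros((50, 257)), np.zeros((30, 257))]
outs = [np.zeros((50, 257)), np.zeros((30, 257))]
path = os.path.join(tempfile.mkdtemp(), "train.hdf")
write_stereo_hdf(path, ins, outs)

with h5py.File(path, "r") as f:
    assert sorted(f["inputs"].keys()) == ["0", "1"]
    assert f["inputs"]["0"].shape == (50, 257)
```

For extraction-only data, the ‘outputs’ group would simply be omitted (in which case num_outputs must be passed to the constructor, see below).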

Constructor

Parameters:
  • hdfFile (str) – path to the hdf file. If a bundle file is given (*.bundle), all hdf files listed in the bundle file will be used for the dataset. :see: BundleFile.BundleFile
  • num_outputs (int) – this needs to be set if the stereo data hdf file only contains ‘inputs’ data (e.g. for the extraction process). num_outputs is only used if no ‘outputs’ data exists in the hdf file.
  • normalizationFile (str | None) – path to a HDF file with normalization data. The file is optional: if it is not provided then no normalization is performed. :see: NormalizationData.NormalizationData
  • flag_normalizeInputs (bool) – if True then inputs will be normalized provided that the normalization HDF file has necessary datasets (i.e. mean and variance)
  • flag_normalizeTargets (bool) – if True then targets will be normalized provided that the normalization HDF file has necessary datasets (i.e. mean and variance)
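The effect of the two normalization flags can be illustrated with a minimal numpy sketch. Per-feature-dimension mean/variance normalization is an assumption based on the parameter descriptions; the actual NormalizationData file format is not shown, and the function name here is made up:

```python
import numpy as np


def normalize(features, mean, variance, enabled=True):
    """Illustrative mean/variance normalization per feature dimension,
    mirroring what flag_normalizeInputs / flag_normalizeTargets enable
    when the normalization HDF file provides mean and variance."""
    if not enabled:
        return features
    return (features - mean) / np.sqrt(variance)


x = np.array([[1.0, 10.0],
              [3.0, 30.0]])        # (T=2, D=2)
mean = x.mean(axis=0)              # per-dimension mean
var = x.var(axis=0)                # per-dimension variance
y = normalize(x, mean, var)
# after normalization, each feature dimension has zero mean, unit variance
```

With enabled=False (i.e. the flag set to False, or no normalization file given), the features pass through unchanged.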
get_data_dim(key)[source]

This is copied from CachedDataset2 but the assertion is removed (see CachedDataset2.py)

Return type:int
Returns:number of classes, no matter if sparse or not
num_seqs[source]

Returns the number of sequences of the dataset

Return type:int
Returns:the number of sequences of the dataset.
class StereoDataset.DatasetWithTimeContext(hdfFile, tau=1, **kwargs)[source]

This dataset composes a context feature by stacking together time frames.

Constructor

Parameters:
  • hdfFile (str) – see StereoHdfDataset
  • tau (int) – how many time frames to take from the left and from the right. E.g. if tau = 2, the context feature is created by stacking the two neighboring time frames from the left and the two neighboring time frames from the right: newInputFeature = [ x_{t-2}, x_{t-1}, x_t, x_{t+1}, x_{t+2} ]. In general the new feature has dimensionality (2 * tau + 1) * originalFeatureDimensionality. Output features are not changed.
  • kwargs (dict) – the remaining arguments, passed on to StereoHdfDataset
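The frame stacking described above can be sketched in numpy. Zero-padding at the sequence boundaries is an assumption of this sketch; the actual boundary handling of DatasetWithTimeContext may differ, and the function name is made up:

```python
import numpy as np


def stack_time_context(x, tau=1):
    """Stack tau neighboring frames from the left and right of each
    time frame. x has shape (T, D); the result has shape
    (T, (2 * tau + 1) * D). Boundary frames are zero-padded here
    (an assumption for this sketch)."""
    T, D = x.shape
    padded = np.pad(x, ((tau, tau), (0, 0)), mode="constant")
    # for frame t, concatenate frames t-tau .. t+tau along the feature axis
    return np.concatenate(
        [padded[offset:offset + T] for offset in range(2 * tau + 1)],
        axis=1,
    )


x = np.arange(12, dtype=float).reshape(4, 3)   # T=4, D=3
ctx = stack_time_context(x, tau=2)
assert ctx.shape == (4, (2 * 2 + 1) * 3)       # (T, (2*tau+1)*D)
```

The center block of each stacked frame is the original frame x_t; the outer blocks are its left and right neighbors (zeros where the context reaches past the sequence boundary).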