returnn.datasets.normalization_data

class returnn.datasets.normalization_data.NormalizationData(normalizationFilePath)

This class holds normalization data (means and variances) for inputs and outputs. It also provides a static method, createNormalizationFile, to create the normalization HDF file.

The constructor reads the normalization data from the given HDF file and stores it in member variables.

Parameters:

normalizationFilePath (str) – path to the HDF file with normalization data.

GROUP_INPUTS = 'inputs'
GROUP_OUTPUTS = 'outputs'
DATASET_MEAN = 'mean'
DATASET_MEAN_OF_SQUARES = 'meanOfSquares'
DATASET_VARIANCE = 'variance'
DATASET_TOTAL_FRAMES = 'totalNumberOfFrames'
DATASET_TIME_DIMENSION_INDEX = 0
DATASET_FEATURE_DIMENSION_INDEX = 1
SUMMATION_PRECISION = 1e-05
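The dataset names above suggest that the file stores per-feature statistics related by the usual identity for the (population) variance. A worked form, assuming that N is the value stored under totalNumberOfFrames, x_t is the feature vector of frame t, and squares are taken element-wise:

```latex
\mu = \frac{1}{N}\sum_{t=1}^{N} x_t, \qquad
m_2 = \frac{1}{N}\sum_{t=1}^{N} x_t \odot x_t, \qquad
\sigma^2 = m_2 - \mu \odot \mu
```

Here \mu corresponds to mean, m_2 to meanOfSquares and \sigma^2 to variance. Whether the implementation derives variance exactly this way is an assumption based on the stored dataset names.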
static createNormalizationFile(bundleFilePath, outputFilePath, dtype=numpy.float64, flag_includeOutputs=True)

Calculates means and variances over the inputs and outputs of the datasets in the HDF files described by the given bundle file.

See:

BundleFile.BundleFile

Each HDF dataset file is expected to have the following groups:

  • NormalizationData.GROUP_INPUTS (the group for the input data)

  • NormalizationData.GROUP_OUTPUTS (the group for the output data)

Each group may contain datasets. Each dataset is expected to have shape (time frames, features). For example, (267, 513) means 267 time frames, each containing a feature vector of dimensionality 513.
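A minimal NumPy sketch of the expected dataset layout, using random stand-in data of the example shape (267, 513). The names TIME_AXIS and FEATURE_AXIS are illustrative; their values mirror DATASET_TIME_DIMENSION_INDEX and DATASET_FEATURE_DIMENSION_INDEX. Reducing over the time axis gives one statistic per feature dimension, which is the shape the mean and variance properties are assumed to have.

```python
import numpy as np

# Illustrative axis names; values mirror DATASET_TIME_DIMENSION_INDEX = 0
# and DATASET_FEATURE_DIMENSION_INDEX = 1.
TIME_AXIS = 0
FEATURE_AXIS = 1

# Random stand-in data: 267 time frames, each a 513-dimensional feature vector.
rng = np.random.default_rng(0)
data = rng.standard_normal((267, 513))

num_frames = data.shape[TIME_AXIS]
feature_dim = data.shape[FEATURE_AXIS]

# Reducing over the time axis yields one value per feature dimension.
per_feature_mean = data.mean(axis=TIME_AXIS)
print(num_frames, feature_dim, per_feature_mean.shape)  # 267 513 (513,)
```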

The method writes the results to the given output file. Whether means and variances are available for inputs and for outputs depends on whether the corresponding groups are present in the input HDF dataset files.

!!! IMPORTANT !!! General rule of thumb: if one dataset file has both input and output groups, make sure that all dataset files have them. Otherwise, the means and variances will not be correct. It is fine if all dataset files have only the input group; in this case, means and variances are calculated for the inputs only.

Parameters:
  • bundleFilePath (str) – path to the bundle file. See BundleFile.BundleFile.

  • outputFilePath (str) – path to the output HDF normalization file.

  • dtype (numpy.dtype) – type of data to use during calculations.

  • flag_includeOutputs (bool) – if True then normalization data will be calculated for outputs (targets) as well.
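The calculation can be sketched as a single pass that accumulates per-feature sums and sums of squares over all dataset arrays, then derives the mean, mean of squares and variance at the end. accumulate_statistics is a hypothetical helper written for illustration only: it is not the RETURNN implementation, it skips the bundle file and HDF I/O entirely, and it assumes every array has the same feature dimension.

```python
import numpy as np


def accumulate_statistics(datasets, dtype=np.float64):
    """Hypothetical helper (not RETURNN's code): per-feature mean, mean of
    squares and variance over a sequence of (time frames, features) arrays."""
    total_sum = None
    total_sum_of_squares = None
    total_frames = 0
    for data in datasets:
        data = np.asarray(data, dtype=dtype)
        if total_sum is None:
            total_sum = np.zeros(data.shape[1], dtype=dtype)
            total_sum_of_squares = np.zeros(data.shape[1], dtype=dtype)
        # Sum over the time axis (axis 0), keeping one value per feature.
        total_sum += data.sum(axis=0)
        total_sum_of_squares += np.square(data).sum(axis=0)
        total_frames += data.shape[0]
    if total_frames == 0:
        raise ValueError("no frames to accumulate")
    mean = total_sum / total_frames
    mean_of_squares = total_sum_of_squares / total_frames
    # Population variance: E[x^2] - (E[x])^2, element-wise per feature.
    variance = mean_of_squares - np.square(mean)
    return mean, mean_of_squares, variance, total_frames


# Two stand-in "files" of different lengths with 4 features each.
rng = np.random.default_rng(1)
parts = [rng.standard_normal((267, 4)), rng.standard_normal((100, 4))]
mean, mean_sq, var, frames = accumulate_statistics(parts)

stacked = np.concatenate(parts, axis=0)
print(frames)  # 367
print(np.allclose(mean, stacked.mean(axis=0)), np.allclose(var, stacked.var(axis=0)))  # True True
```

Accumulating raw sums and a global frame count, rather than averaging per-file means, weights every frame equally regardless of how many frames each file contains.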

property inputMean

Mean of the input data.

Return type:

numpy.ndarray | None

Returns:

Mean of the input data if it is available or None otherwise.

property inputVariance

Variance of the input data.

Return type:

numpy.ndarray | None

Returns:

Variance of the input data if it is available or None otherwise.

property outputMean

Mean of the output data.

Return type:

numpy.ndarray | None

Returns:

Mean of the output data if it is available or None otherwise.

property outputVariance

Variance of the output data.

Return type:

numpy.ndarray | None

Returns:

Variance of the output data if it is available or None otherwise.
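These properties only expose the statistics; one common way to apply them is to standardize (time frames, features) data per feature. The sketch below assumes the arrays have shape (features,), as the per-feature layout suggests. normalize and its epsilon parameter are hypothetical and not part of this class; passing features through unchanged when a statistic is None is one possible design choice, mirroring the "None otherwise" return contract.

```python
import numpy as np


def normalize(features, mean, variance, epsilon=1e-8):
    """Hypothetical helper: standardize (time frames, features) data with
    per-feature statistics such as inputMean / inputVariance."""
    if mean is None or variance is None:
        # Statistics not available: return the input unchanged.
        return features
    # Broadcasting applies the (features,) statistics to every time frame;
    # epsilon guards against division by zero for constant features.
    return (features - mean) / np.sqrt(variance + epsilon)


# Stand-in statistics computed from random data with mean 3 and std 2.
rng = np.random.default_rng(2)
features = rng.normal(loc=3.0, scale=2.0, size=(500, 8))
mean = features.mean(axis=0)
variance = features.var(axis=0)

normalized = normalize(features, mean, variance)
print(np.allclose(normalized.mean(axis=0), 0.0))            # True
print(np.allclose(normalized.std(axis=0), 1.0, atol=1e-6))  # True
```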