returnn.util.sig_proc

Collection of generic utilities related to signal processing

returnn.util.sig_proc.greenwood_function(x, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1)

Greenwood function: converts fractional length to frequency. See [1] Schlueter, Ralf, et al. “Gammatone features and feature combination for large vocabulary speech recognition.” ICASSP 2007, and also https://en.wikipedia.org/wiki/Greenwood_function. The default values are taken from RASR (see https://github.com/rwth-i6/rasr/blob/master/src/Signal/GammaTone.cc). They correspond to the recommended values for human data according to Greenwood, Donald D. “A cochlear frequency‐position function for several species—29 years later.” The Journal of the Acoustical Society of America, 1990.

Parameters:
  • x (float) – fractional length

  • scaling_constant (float) – A in [1]

  • constant_of_integration (float) – k in [1]

  • slope (float) – a in [1]

Returns:

frequency corresponding to given fractional length

Return type:

float
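The mapping can be sketched as follows. This is a minimal illustration that assumes the standard form of the Greenwood function, f = A · (10^(a·x) − k), as given on the Wikipedia page cited above. The name `greenwood_sketch` is hypothetical, and the code is not necessarily identical to the RETURNN implementation.

```python
def greenwood_sketch(x, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1):
    """Map fractional length x to frequency (assumed form: A * (10 ** (a * x) - k))."""
    return scaling_constant * (10.0 ** (slope * x) - constant_of_integration)


# Frequencies at both ends of the fractional-length range [0, 1].
low = greenwood_sketch(0.0)
high = greenwood_sketch(1.0)
print(low, high)
```

Under this assumed form and the default constants, x = 0 maps to roughly 20 Hz and x = 1 to roughly 20.7 kHz.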

returnn.util.sig_proc.inv_greenwood_function(freq, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1)

Inverse Greenwood function: converts frequency to fractional length. See [1] Schlueter, Ralf, et al. “Gammatone features and feature combination for large vocabulary speech recognition.” ICASSP 2007, and also https://en.wikipedia.org/wiki/Greenwood_function. The default values are taken from RASR (see https://github.com/rwth-i6/rasr/blob/master/src/Signal/GammaTone.cc). They correspond to the recommended values for human data according to Greenwood, Donald D. “A cochlear frequency‐position function for several species—29 years later.” The Journal of the Acoustical Society of America, 1990.

Parameters:
  • freq (float) – frequency

  • scaling_constant (float) – A in [1]

  • constant_of_integration (float) – k in [1]

  • slope (float) – a in [1]

Returns:

fractional length corresponding to given frequency

Return type:

float
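Solving the assumed form f = A · (10^(a·x) − k) for x gives x = log10(f / A + k) / a. The sketch below uses hypothetical helper names (`greenwood_sketch`, `inv_greenwood_sketch`) and checks that the two mappings invert each other. It illustrates the formula and is not necessarily the RETURNN code.

```python
import math


def greenwood_sketch(x, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1):
    """Assumed forward mapping: fractional length -> frequency."""
    return scaling_constant * (10.0 ** (slope * x) - constant_of_integration)


def inv_greenwood_sketch(freq, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1):
    """Assumed inverse mapping: frequency -> fractional length."""
    return math.log10(freq / scaling_constant + constant_of_integration) / slope


# Round trip: frequency -> fractional length -> frequency.
x = inv_greenwood_sketch(1000.0)
roundtrip = greenwood_sketch(x)
print(x, roundtrip)
```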

class returnn.util.sig_proc.GammatoneFilterbank(num_channels, length, sample_rate=16000, freq_max=7500.0, freq_min=100.0, normalization=True)

Class representing a gammatone filterbank. Based on [1] Schlueter, Ralf, et al. “Gammatone features and feature combination for large vocabulary speech recognition.” ICASSP 2007.

Parameters:
  • num_channels (int) – number of filters

  • length (int|float) – length of FIR filters in seconds

  • sample_rate (int) – sample rate of audio signal in Hz

  • freq_max (float) – maximum frequency of the filterbank in Hz

  • freq_min (float) – minimum frequency of the filterbank in Hz

  • normalization (bool) – whether to normalize the filters to a maximum frequency response of 0 dB

get_gammatone_filterbank()

Returns an array containing the FIR filter coefficients of the gammatone filterbank

Returns:

gammatone filterbank of shape (self.length * self.sample_rate, self.num_channels)

Return type:

numpy.array

static center_frequencies(num_channels, freq_max, freq_min)

Determine center frequencies for gammatone filterbank

Parameters:
  • num_channels (int) – number of filters

  • freq_max (float) – maximum frequency of the filterbank in Hz

  • freq_min (float) – minimum frequency of the filterbank in Hz

Returns:

center frequencies

Return type:

numpy.array
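One plausible way to place the center frequencies is to space them evenly on the Greenwood fractional-length scale between `freq_min` and `freq_max`. That spacing rule is an assumption made for this sketch, not something stated above, and all function names are hypothetical. A plain list is returned here instead of a `numpy.array` to keep the example dependency-free.

```python
import math


def _greenwood(x, A=165.4, k=0.88, a=2.1):
    # Assumed forward mapping: fractional length -> frequency.
    return A * (10.0 ** (a * x) - k)


def _inv_greenwood(freq, A=165.4, k=0.88, a=2.1):
    # Assumed inverse mapping: frequency -> fractional length.
    return math.log10(freq / A + k) / a


def center_frequencies_sketch(num_channels, freq_max, freq_min):
    """Assumption: centers are equally spaced in fractional length."""
    x_min = _inv_greenwood(freq_min)
    x_max = _inv_greenwood(freq_max)
    if num_channels == 1:
        return [_greenwood(x_min)]
    step = (x_max - x_min) / (num_channels - 1)
    return [_greenwood(x_min + i * step) for i in range(num_channels)]


freqs = center_frequencies_sketch(num_channels=5, freq_max=7500.0, freq_min=100.0)
print(freqs)
```

With this spacing the first and last centers coincide with `freq_min` and `freq_max`, and the gaps between neighbouring centers widen toward higher frequencies.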

static bandwidth_by_center_frequency(freq, lin_approx_coeff=24.7, quality_factor=9.264491981582191)

Get the bandwidth (named B in [1]) for a center frequency, using a linear approximation of the equivalent rectangular bandwidth (ERB) from Glasberg, Brian R., and Brian CJ Moore. “Derivation of auditory filter shapes from notched-noise data.” Hearing research, 1990. The default values are taken from there and are also used in RASR (see https://github.com/rwth-i6/rasr/blob/master/src/Signal/GammaTone.cc).

Parameters:
  • freq (float) – center frequency

  • lin_approx_coeff (float) – coefficient for the linear approximation of the ERB

  • quality_factor (float) – audiological (ERB) based filter quality factor

Returns:

bandwidth

Return type:

float
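The default values are consistent with the Glasberg and Moore approximation ERB(f) = 24.7 · (4.37 · f / 1000 + 1) for f in Hz, since 1000 / (24.7 · 4.37) ≈ 9.2645. The sketch below assumes the bandwidth is computed as `lin_approx_coeff + freq / quality_factor`. The function name is hypothetical and the exact RETURNN expression may differ.

```python
def erb_bandwidth_sketch(freq, lin_approx_coeff=24.7, quality_factor=9.264491981582191):
    """Assumed linear ERB approximation: B = lin_approx_coeff + freq / quality_factor."""
    return lin_approx_coeff + freq / quality_factor


# Compare against the Glasberg & Moore form at 1 kHz (frequency given in kHz there).
bw = erb_bandwidth_sketch(1000.0)
glasberg_moore = 24.7 * (4.37 * 1.0 + 1.0)
print(bw, glasberg_moore)
```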

gammatone_impulse_response(f_center, length, sample_rate, output_gain=1.0, filter_order=4, phase_shift=0.0)

Compute gammatone impulse response based on [1]

Parameters:
  • f_center (float) – center frequency

  • length (int|float) – length of finite impulse response in seconds

  • sample_rate (int) – sample rate of audio signal in Hz

  • output_gain (float) – output gain, named k in [1]

  • filter_order (int) – order of filter, named n in [1]

  • phase_shift (float) – phase shift, named phi in [1]

Returns:

gammatone impulse response

Return type:

numpy.array
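A gammatone impulse response is commonly written as g(t) = k · t^(n−1) · exp(−2πBt) · cos(2π·f_c·t + φ), with the symbols named as in the parameter list above. The sketch below samples that expression. It assumes the linear ERB approximation for B and a sample count of `round(length * sample_rate)`. The function name is hypothetical, and a plain list stands in for the `numpy.array` return value.

```python
import math


def gammatone_ir_sketch(f_center, length, sample_rate,
                        output_gain=1.0, filter_order=4, phase_shift=0.0):
    """Sample k * t**(n-1) * exp(-2*pi*B*t) * cos(2*pi*f_c*t + phi)."""
    # Assumption: bandwidth B from the linear ERB approximation.
    bandwidth = 24.7 + f_center / 9.264491981582191
    num_samples = int(round(length * sample_rate))
    response = []
    for i in range(num_samples):
        t = i / sample_rate  # time of sample i in seconds
        envelope = output_gain * t ** (filter_order - 1) * math.exp(-2.0 * math.pi * bandwidth * t)
        response.append(envelope * math.cos(2.0 * math.pi * f_center * t + phase_shift))
    return response


ir = gammatone_ir_sketch(f_center=1000.0, length=0.01, sample_rate=16000)
peak_index = max(range(len(ir)), key=lambda i: abs(ir[i]))
print(len(ir), peak_index)
```

The envelope t^(n−1) · exp(−2πBt) starts at zero, peaks at t = (n−1) / (2πB), and then decays, which is why a finite `length` can capture most of the response.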

static normalize_filters(filters)

Normalize filterbank such that the maximum frequency response is 0 dB

Parameters:

filters (numpy.array) – filterbank with shape number_channels x filter_length

Returns:

normalized filterbank

Return type:

numpy.array
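A maximum frequency response of 0 dB means that the largest magnitude of each filter's frequency response equals 1. The sketch below approximates that maximum on the DFT frequency grid and divides each filter by it. The function names are hypothetical, a naive DFT replaces an FFT call to avoid dependencies, and whether RETURNN evaluates the response on the same grid is an assumption.

```python
import cmath
import math


def max_magnitude_response(taps):
    """Largest DFT magnitude over the non-negative frequency bins."""
    n = len(taps)
    best = 0.0
    for k in range(n // 2 + 1):
        acc = sum(taps[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        best = max(best, abs(acc))
    return best


def normalize_filters_sketch(filters):
    """filters: list of channels, each a list of FIR taps (channels x filter_length)."""
    normalized = []
    for taps in filters:
        peak = max_magnitude_response(taps)  # must be non-zero
        normalized.append([tap / peak for tap in taps])
    return normalized


filters = [[1.0, 0.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0]]
normalized = normalize_filters_sketch(filters)
print(normalized)
```

An all-zero filter has no defined peak and would cause a division by zero in this sketch, so callers would need to guard against that case.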