returnn.util.sig_proc
Collection of generic utilities related to signal processing
- returnn.util.sig_proc.greenwood_function(x, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1)
Greenwood function: converts fractional length to frequency. See [1] Schlueter, Ralf, et al. “Gammatone features and feature combination for large vocabulary speech recognition.” ICASSP 2007, and also https://en.wikipedia.org/wiki/Greenwood_function . The default values are taken from RASR (see https://github.com/rwth-i6/rasr/blob/master/src/Signal/GammaTone.cc ). They correspond to the recommended values for human data according to Greenwood, Donald D. “A cochlear frequency‐position function for several species—29 years later.” The Journal of the Acoustical Society of America, 1990.
- Parameters:
x (float) – fractional length
scaling_constant (float) – A in [1]
constant_of_integration (float) – k in [1]
slope (float) – a in [1]
- Returns:
frequency corresponding to given fractional length
- Return type:
float
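A minimal sketch of the mapping, assuming the form given on the linked Wikipedia page, f = A * (10^(a*x) - k). The helper name `greenwood_sketch` is hypothetical and is not the RETURNN implementation; only the parameter names and defaults are taken from the signature above.

```python
def greenwood_sketch(x, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1):
    """Hypothetical re-implementation (assumed form): f = A * (10 ** (a * x) - k)."""
    return scaling_constant * (10.0 ** (slope * x) - constant_of_integration)


# With this form, x = 0 maps to the lowest frequency and x = 1 to the highest.
f_low = greenwood_sketch(0.0)
f_high = greenwood_sketch(1.0)
```

With the default constants, the two end points land at roughly 20 Hz and roughly 20.7 kHz, which matches the usual range quoted for human hearing.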
- returnn.util.sig_proc.inv_greenwood_function(freq, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1)
Inverse Greenwood function: converts frequency to fractional length. See [1] Schlueter, Ralf, et al. “Gammatone features and feature combination for large vocabulary speech recognition.” ICASSP 2007, and also https://en.wikipedia.org/wiki/Greenwood_function . The default values are taken from RASR (see https://github.com/rwth-i6/rasr/blob/master/src/Signal/GammaTone.cc ). They correspond to the recommended values for human data according to Greenwood, Donald D. “A cochlear frequency‐position function for several species—29 years later.” The Journal of the Acoustical Society of America, 1990.
- Parameters:
freq (float) – frequency
scaling_constant (float) – A in [1]
constant_of_integration (float) – k in [1]
slope (float) – a in [1]
- Returns:
fractional length corresponding to given frequency
- Return type:
float
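A sketch of the inverse, obtained by solving the assumed form f = A * (10^(a*x) - k) for x. The helper name `inv_greenwood_sketch` is hypothetical; the round trip at the end only checks that the algebra is self-consistent, not that it matches the library bit for bit.

```python
import math


def inv_greenwood_sketch(freq, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1):
    """Hypothetical inverse (assumed form): x = log10(f / A + k) / a."""
    return math.log10(freq / scaling_constant + constant_of_integration) / slope


x_100 = inv_greenwood_sketch(100.0)
# Round trip through the assumed forward form f = A * (10 ** (a * x) - k).
f_back = 165.4 * (10.0 ** (2.1 * x_100) - 0.88)
```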
- class returnn.util.sig_proc.GammatoneFilterbank(num_channels, length, sample_rate=16000, freq_max=7500.0, freq_min=100.0, normalization=True)
Class representing a gammatone filterbank. Based on [1] Schlueter, Ralf, et al. “Gammatone features and feature combination for large vocabulary speech recognition.” ICASSP 2007.
- Parameters:
num_channels (int) – number of filters
length (int|float) – length of FIR filters in seconds
sample_rate (int) – sample rate of audio signal in Hz
freq_max (float) – maximum frequency of filterbank in Hz
freq_min (float) – minimum frequency of filterbank in Hz
normalization (bool) – normalize filterbanks to maximum frequency response of 0 dB
- get_gammatone_filterbank()
Returns the gammatone filterbank as an array holding one FIR filter per channel
- Returns:
gammatone filterbank of shape (self.length * self.sample_rate, self.num_channels)
- Return type:
numpy.array
- static center_frequencies(num_channels, freq_max, freq_min)
Determine center frequencies for gammatone filterbank
- Parameters:
num_channels (int) – number of filters
freq_max (float) – maximum frequency of filterbank in Hz
freq_min (float) – minimum frequency of filterbank in Hz
- Returns:
center frequencies
- Return type:
numpy.array
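One plausible way to place the center frequencies is to space the channels uniformly in fractional length and map them back to Hz with the Greenwood function. The sketch below assumes that scheme, the Wikipedia form of the Greenwood function, and ascending output order; none of these is confirmed by the text above, and `center_frequencies_sketch` is a hypothetical name.

```python
import numpy as np

A, K, SLOPE = 165.4, 0.88, 2.1  # default Greenwood constants listed in this module


def center_frequencies_sketch(num_channels, freq_max, freq_min):
    """Assumed scheme: uniform spacing in fractional length, mapped back to Hz."""
    x_min = np.log10(freq_min / A + K) / SLOPE  # inverse Greenwood (assumed form)
    x_max = np.log10(freq_max / A + K) / SLOPE
    x = np.linspace(x_min, x_max, num_channels)
    return A * (10.0 ** (SLOPE * x) - K)  # forward Greenwood (assumed form)


fc = center_frequencies_sketch(num_channels=50, freq_max=7500.0, freq_min=100.0)
```

Under these assumptions the first and last channels sit at `freq_min` and `freq_max`, and the spacing between neighbouring channels grows with frequency.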
- static bandwidth_by_center_frequency(freq, lin_approx_coeff=24.7, quality_factor=9.264491981582191)
Get bandwidth (named B in [1]) by center frequency using a linear approximation of the equivalent rectangular bandwidth (ERB) from Glasberg, Brian R., and Brian CJ Moore. “Derivation of auditory filter shapes from notched-noise data.” Hearing research, 1990. The default values are taken from there and are also used in RASR (see https://github.com/rwth-i6/rasr/blob/master/src/Signal/GammaTone.cc ).
- Parameters:
freq (float) – center frequency in Hz
lin_approx_coeff (float) – coefficient for the linear approximation of the ERB
quality_factor (float) – audiological (ERB) based filter quality factor
- Returns:
bandwidth
- Return type:
float
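A sketch of the linear ERB approximation, assuming it takes the form `lin_approx_coeff + freq / quality_factor`. With the defaults this reproduces the Glasberg and Moore expression 24.7 * (4.37 * f / 1000 + 1). Whether the RETURNN method applies any further scaling is not stated above, and `erb_bandwidth_sketch` is a hypothetical name.

```python
def erb_bandwidth_sketch(freq, lin_approx_coeff=24.7, quality_factor=9.264491981582191):
    """Assumed linear ERB approximation: bandwidth = coeff + freq / Q (all in Hz)."""
    return lin_approx_coeff + freq / quality_factor


bw_1k = erb_bandwidth_sketch(1000.0)
```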
- gammatone_impulse_response(f_center, length, sample_rate, output_gain=1.0, filter_order=4, phase_shift=0.0)
Compute gammatone impulse response based on [1]
- Parameters:
f_center (float) – center frequency in Hz
length (int|float) – length of finite impulse response in seconds
sample_rate (int) – sample rate of audio signal in Hz
output_gain (float) – output gain, named k in [1]
filter_order (int) – order of filter, named n in [1]
phase_shift (float) – phase shift, named phi in [1]
- Returns:
gammatone impulse response
- Return type:
numpy.array
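For orientation, the textbook gammatone impulse response is k * t^(n-1) * exp(-2*pi*B*t) * cos(2*pi*f_c*t + phi). The sketch below samples that expression. It is an assumption that the method above uses exactly this form; the explicit `bandwidth` argument and the name `gammatone_ir_sketch` are hypothetical (the documented method takes no bandwidth parameter and presumably derives it from the center frequency).

```python
import numpy as np


def gammatone_ir_sketch(f_center, bandwidth, length, sample_rate,
                        output_gain=1.0, filter_order=4, phase_shift=0.0):
    """Textbook form: k * t**(n-1) * exp(-2*pi*B*t) * cos(2*pi*f_c*t + phi)."""
    num_samples = int(length * sample_rate)
    t = np.arange(num_samples) / sample_rate  # time axis in seconds
    envelope = output_gain * t ** (filter_order - 1) * np.exp(-2.0 * np.pi * bandwidth * t)
    return envelope * np.cos(2.0 * np.pi * f_center * t + phase_shift)


# Bandwidth from the assumed linear ERB approximation with the default coefficients.
bw = 24.7 + 1000.0 / 9.264491981582191
ir = gammatone_ir_sketch(f_center=1000.0, bandwidth=bw, length=0.01, sample_rate=16000)
```

The result has `int(length * sample_rate)` samples, which lines up with the first dimension of the filterbank shape documented for `get_gammatone_filterbank()`.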