`returnn.util.sig_proc`

Collection of generic utilities related to signal processing

returnn.util.sig_proc.greenwood_function(x, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1)

Greenwood function: converts a fractional length to a frequency. See [1] Schlueter, Ralf, et al. “Gammatone features and feature combination for large vocabulary speech recognition.” ICASSP 2007; see also the Wikipedia article (https://en.wikipedia.org/wiki/Greenwood_function). The default values are taken from RASR (see https://github.com/rwth-i6/rasr/blob/master/src/Signal/GammaTone.cc). They correspond to the recommended values for human data according to Greenwood, Donald D. “A cochlear frequency‐position function for several species—29 years later.” The Journal of the Acoustical Society of America, 1990.

Parameters:

x (float) – fractional length
scaling_constant (float) – A in [1]
constant_of_integration (float) – k in [1]
slope (float) – a in [1]

Returns:

frequency corresponding to given fractional length

Return type:

float
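As a rough illustration of the mapping, the Greenwood function is usually written as f = A * (10^(a * x) - k). The sketch below implements that formula in plain Python. The name `greenwood_sketch` is hypothetical and the body is an assumption based on the published formula, not a copy of the RETURNN source.

```python
def greenwood_sketch(x, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1):
    """Map a fractional length x to a frequency in Hz: f = A * (10**(a * x) - k).

    Hypothetical helper; assumed to mirror the documented parameters.
    """
    return scaling_constant * (10.0 ** (slope * x) - constant_of_integration)


# With the human defaults, x = 0 maps to roughly 19.8 Hz
# and x = 1 maps to roughly 20.7 kHz.
low = greenwood_sketch(0.0)
high = greenwood_sketch(1.0)
```

With the default constants, the interval [0, 1] therefore covers approximately the range of human hearing.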

returnn.util.sig_proc.inv_greenwood_function(freq, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1)

Inverse Greenwood function: converts a frequency to a fractional length. See [1] Schlueter, Ralf, et al. “Gammatone features and feature combination for large vocabulary speech recognition.” ICASSP 2007; see also the Wikipedia article (https://en.wikipedia.org/wiki/Greenwood_function). The default values are taken from RASR (see https://github.com/rwth-i6/rasr/blob/master/src/Signal/GammaTone.cc). They correspond to the recommended values for human data according to Greenwood, Donald D. “A cochlear frequency‐position function for several species—29 years later.” The Journal of the Acoustical Society of America, 1990.

Parameters:

freq (float) – frequency
scaling_constant (float) – A in [1]
constant_of_integration (float) – k in [1]
slope (float) – a in [1]

Returns:

fractional length corresponding to given frequency

Return type:

float
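Solving f = A * (10^(a * x) - k) for x gives x = log10(f / A + k) / a. The sketch below shows this inversion together with a round-trip check. Both function names are hypothetical, and the bodies are assumptions derived from the published formula rather than from the RETURNN source.

```python
import math


def greenwood_sketch(x, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1):
    """Forward mapping (assumed form): fractional length -> frequency in Hz."""
    return scaling_constant * (10.0 ** (slope * x) - constant_of_integration)


def inv_greenwood_sketch(freq, scaling_constant=165.4, constant_of_integration=0.88, slope=2.1):
    """Inverse mapping (assumed form): frequency in Hz -> fractional length."""
    return math.log10(freq / scaling_constant + constant_of_integration) / slope


# Round trip: frequency -> fractional length -> frequency.
x_1khz = inv_greenwood_sketch(1000.0)
freq_back = greenwood_sketch(x_1khz)
```

The argument of the logarithm must be positive, so with the defaults the inverse is only defined for frequencies above about -145.6 Hz, which covers every physically meaningful input.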

class returnn.util.sig_proc.GammatoneFilterbank(num_channels, length, sample_rate=16000, freq_max=7500.0, freq_min=100.0, normalization=True)

Class representing a gammatone filterbank, based on [1] Schlueter, Ralf, et al. “Gammatone features and feature combination for large vocabulary speech recognition.” ICASSP 2007.

Parameters:

num_channels (int) – number of filters
length (int|float) – length of FIR filters in seconds
sample_rate (int) – sample rate of audio signal in Hz
freq_max (float) – maximum frequency of filterbank in Hz
freq_min (float) – minimum frequency of filterbank in Hz
normalization (bool) – normalize filterbanks to maximum frequency response of 0 dB

get_gammatone_filterbank()

Returns an array with the parameters of the gammatone filterbank

Returns:

gammatone filterbank of shape (self.length * self.sample_rate, self.num_channels)

Return type:

numpy.array

static center_frequencies(num_channels, freq_max, freq_min)

Determine center frequencies for gammatone filterbank

Parameters:

num_channels (int) – number of filters
freq_max (float) – maximum frequency of filterbank
freq_min (float) – minimum frequency of filterbank

Returns:

center frequencies

Return type:

numpy.array
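The documentation does not state how the center frequencies are spaced. One plausible scheme, shown here purely as an assumption, is to place them at equal distances on the Greenwood (fractional length) scale between `freq_min` and `freq_max`. All function names below are hypothetical, and plain lists stand in for `numpy.array`.

```python
import math


def greenwood(x, a_scale=165.4, k=0.88, slope=2.1):
    """Assumed forward Greenwood mapping: fractional length -> Hz."""
    return a_scale * (10.0 ** (slope * x) - k)


def inv_greenwood(freq, a_scale=165.4, k=0.88, slope=2.1):
    """Assumed inverse Greenwood mapping: Hz -> fractional length."""
    return math.log10(freq / a_scale + k) / slope


def center_frequencies_sketch(num_channels, freq_max, freq_min):
    """Center frequencies equally spaced on the Greenwood scale (assumption)."""
    x_min = inv_greenwood(freq_min)
    x_max = inv_greenwood(freq_max)
    if num_channels == 1:
        return [greenwood(x_min)]
    step = (x_max - x_min) / (num_channels - 1)
    return [greenwood(x_min + i * step) for i in range(num_channels)]


centers = center_frequencies_sketch(num_channels=5, freq_max=7500.0, freq_min=100.0)
```

Under this assumption the first and last center frequencies coincide with `freq_min` and `freq_max`, and the spacing in Hz grows toward higher frequencies.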

static bandwidth_by_center_frequency(freq, lin_approx_coeff=24.7, quality_factor=9.264491981582191)

Get the bandwidth (named B in [1]) for a given center frequency, using a linear approximation of the equivalent rectangular bandwidth (ERB) from Glasberg, Brian R., and Brian CJ Moore. “Derivation of auditory filter shapes from notched-noise data.” Hearing research, 1990. The default values are taken from there and are also used in RASR (see https://github.com/rwth-i6/rasr/blob/master/src/Signal/GammaTone.cc).

Parameters:

freq (float) – center frequency
lin_approx_coeff (float) – coefficient for the linear approximation of the ERB
quality_factor (float) – audiological (ERB) based filter quality factor

Returns:

bandwidth

Return type:

float
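The default values are consistent with the Glasberg and Moore approximation ERB(f) = 24.7 * (4.37 * f / 1000 + 1), which can be rewritten as 24.7 + f / 9.2645. A minimal sketch assuming exactly that linear form follows. The function name is hypothetical, and the actual method may differ in detail.

```python
def erb_bandwidth_sketch(freq, lin_approx_coeff=24.7, quality_factor=9.264491981582191):
    """Linear ERB approximation (assumed form): B = lin_approx_coeff + freq / quality_factor."""
    return lin_approx_coeff + freq / quality_factor


# At a center frequency of 1 kHz the ERB is about 132.6 Hz.
bandwidth_1khz = erb_bandwidth_sketch(1000.0)
```

At 0 Hz the bandwidth reduces to `lin_approx_coeff`, so this parameter acts as the minimum bandwidth of any filter in the bank.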

gammatone_impulse_response(f_center, length, sample_rate, output_gain=1.0, filter_order=4, phase_shift=0.0)

Compute gammatone impulse response based on [1]

Parameters:

f_center (float) – center frequency
length (int|float) – length of finite impulse response in seconds
sample_rate (int) – sample rate of audio signal in Hz
output_gain (float) – output gain, named k in [1]
filter_order (int) – order of filter, named n in [1]
phase_shift (float) – phase shift, named phi in [1]

Returns:

gammatone impulse response

Return type:

numpy.array
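A gammatone impulse response is commonly written as g(t) = k * t^(n-1) * exp(-2 * pi * B * t) * cos(2 * pi * f_c * t + phi). The sketch below samples that expression on a time grid. It is an assumption about the general form, not the RETURNN implementation: the function name and the explicit `bandwidth` argument are hypothetical (the documented method has no such parameter), and a plain list stands in for `numpy.array`.

```python
import math


def gammatone_ir_sketch(f_center, length, sample_rate, bandwidth,
                        output_gain=1.0, filter_order=4, phase_shift=0.0):
    """Sample g(t) = k * t**(n-1) * exp(-2*pi*B*t) * cos(2*pi*f_c*t + phi).

    `bandwidth` (B, in Hz) is passed in explicitly here, which is an assumption
    made to keep the sketch self-contained.
    """
    num_samples = int(round(length * sample_rate))
    response = []
    for i in range(num_samples):
        t = i / sample_rate  # time of sample i in seconds
        envelope = output_gain * t ** (filter_order - 1) * math.exp(-2.0 * math.pi * bandwidth * t)
        response.append(envelope * math.cos(2.0 * math.pi * f_center * t + phase_shift))
    return response


# 25 ms response at 16 kHz for a 1 kHz filter with an ERB of about 132.6 Hz.
ir = gammatone_ir_sketch(f_center=1000.0, length=0.025, sample_rate=16000, bandwidth=132.639)
```

The polynomial term makes the response start at zero, and the exponential term makes it decay, so most of the energy sits in the first few milliseconds.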

static normalize_filters(filters)

Normalize filterbank such that the maximum frequency response is 0 dB

Parameters:

filters (numpy.array) – filterbank with shape number_channels x filter_length

Returns:

normalized filterbank

Return type:

numpy.array
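Normalizing to a maximum frequency response of 0 dB amounts to dividing each filter by the peak magnitude of its frequency response. The sketch below does this with a naive DFT in plain Python. The function names, the `num_bins` parameter, and the list-of-lists layout are hypothetical, and the actual method may locate the peak differently (for example with an FFT).

```python
import cmath
import math


def magnitude_response(taps, num_bins=256):
    """Magnitude of the frequency response sampled at num_bins points in [0, pi]."""
    mags = []
    for b in range(num_bins):
        omega = math.pi * b / (num_bins - 1)
        h = sum(tap * cmath.exp(-1j * omega * n) for n, tap in enumerate(taps))
        mags.append(abs(h))
    return mags


def normalize_filters_sketch(filters, num_bins=256):
    """Scale each filter (one row per channel) so its sampled peak magnitude is 1, i.e. 0 dB.

    Assumes no filter is all zeros, since that would give a peak of 0.
    """
    normalized = []
    for taps in filters:
        peak = max(magnitude_response(taps, num_bins))
        normalized.append([tap / peak for tap in taps])
    return normalized


filters = [[0.5, 0.5], [1.0, -1.0, 0.25]]
normalized = normalize_filters_sketch(filters)
peaks_db = [20.0 * math.log10(max(magnitude_response(taps))) for taps in normalized]
```

Because the peak is taken over a sampled frequency grid, the true maximum is only approximated. A finer grid tightens the approximation at higher cost.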
