returnn.frontend.audio.mel
Mel filterbank.
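As background, the mel scale is commonly defined by the HTK-style formula below, which maps a frequency f in Hz to a mel value m and back. This page does not say which mel variant the module uses, so this exact formula (rather than, for example, the Slaney scale) is an assumption. A mel filterbank typically places triangular filters at points spaced equally in m between m(f_min) and m(f_max).

$$
m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right), \qquad f(m) = 700\left(10^{m/2595} - 1\right)
$$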
- returnn.frontend.audio.mel.mel_filterbank(x: Tensor, *, in_dim: Dim, out_dim: Dim, sampling_rate: int | float, fft_length: int | None = None, f_min: None | int | float = None, f_max: None | int | float = None)
Applies the Mel filterbank to the input.
- Parameters:
x – input tensor containing the frequency dimension in_dim
in_dim – expected to be fft_length // 2 + 1, e.g. as obtained via stft().
out_dim – number of mel filters.
sampling_rate – samples per second
fft_length – FFT size (also called fft_size or n_fft). Should match fft_length from stft(). If not given, it is inferred from in_dim as (in_dim - 1) * 2.
f_min
f_max
- Returns:
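The relation between fft_length and in_dim, and the idea of projecting in_dim frequency bins onto out_dim mel filters, can be illustrated with a small pure-Python sketch. It is not RETURNN's implementation, and several parts are assumptions: the HTK mel formula, the triangular filter shape, the defaults f_min = 0 and f_max = sampling_rate / 2, and all helper names (hz_to_mel, mel_filter_matrix, apply_mel_filterbank), which are hypothetical.

```python
import math
from typing import List, Optional


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale (assumed variant, not confirmed for RETURNN).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def infer_fft_length(in_dim: int) -> int:
    # Documented fallback when fft_length is not given: (in_dim - 1) * 2.
    return (in_dim - 1) * 2


def mel_filter_matrix(
    out_dim: int,
    sampling_rate: float,
    fft_length: int,
    f_min: float = 0.0,
    f_max: Optional[float] = None,
) -> List[List[float]]:
    """Triangular mel filters as an in_dim x out_dim matrix (hypothetical helper)."""
    if f_max is None:
        f_max = sampling_rate / 2.0  # assumed default: the Nyquist frequency
    in_dim = fft_length // 2 + 1  # number of FFT bins, as documented for in_dim
    # Frequency (Hz) of each FFT bin.
    bin_hz = [k * sampling_rate / fft_length for k in range(in_dim)]
    # out_dim + 2 points equally spaced on the mel scale form the triangle edges.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges = [mel_to_hz(lo + i * (hi - lo) / (out_dim + 1)) for i in range(out_dim + 2)]
    matrix = [[0.0] * out_dim for _ in range(in_dim)]
    for j in range(out_dim):
        left, center, right = edges[j], edges[j + 1], edges[j + 2]
        for k, f in enumerate(bin_hz):
            if left < f <= center:
                matrix[k][j] = (f - left) / (center - left)  # rising slope
            elif center < f < right:
                matrix[k][j] = (right - f) / (right - center)  # falling slope
    return matrix


def apply_mel_filterbank(frame: List[float], matrix: List[List[float]]) -> List[float]:
    # Contract the in_dim axis of one spectrum frame: in_dim -> out_dim.
    out_dim = len(matrix[0])
    return [sum(x * row[j] for x, row in zip(frame, matrix)) for j in range(out_dim)]


matrix = mel_filter_matrix(out_dim=80, sampling_rate=16000, fft_length=512)
features = apply_mel_filterbank([1.0] * 257, matrix)
```

With fft_length = 512, the matrix has in_dim = 257 rows, which matches fft_length // 2 + 1. Going the other way, (257 - 1) * 2 recovers 512. In RETURNN itself, you would call mel_filterbank on a Tensor with Dim objects instead of on Python lists.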
- returnn.frontend.audio.mel.log_mel_filterbank_from_raw(raw_audio: Tensor, *, in_spatial_dim: Dim, out_dim: Dim, sampling_rate: int = 16000, window_len: float = 0.025, step_len: float = 0.01, n_fft: int | None = None, log_base: int | float = 10) → Tuple[Tensor, Dim]
Computes log mel filterbank features from raw audio.
- Parameters:
raw_audio – shape (…, in_spatial_dim, …). If it has a feature_dim with dimension 1, it is squeezed away.
in_spatial_dim
out_dim – number of mel filters.
sampling_rate – samples per second
window_len – in seconds
step_len – in seconds
n_fft – FFT size (also called fft_size). Should match fft_length from stft(). If not provided, the next power of two from window_num_frames (the window length in samples) is used.
log_base – e.g. 10 or math.e