`returnn.frontend.audio.mel`

Mel filterbank.

Applies the Mel filterbank to the input.

Parameters:

x – input tensor; the filterbank is applied along in_dim.
in_dim – expected to have dimension fft_length // 2 + 1, e.g. as produced by stft().
out_dim – number of mel filters.
sampling_rate – samples per second
fft_length – FFT size (also called fft_size or n_fft). Should match fft_length from stft(). If not given, it is inferred from in_dim as (in_dim - 1) * 2.
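f_min – lower frequency bound of the filterbank, in Hz.
f_max – upper frequency bound of the filterbank, in Hz.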

Returns:

x with in_dim replaced by out_dim.
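A minimal NumPy sketch of what applying a mel filterbank involves, including the documented inference fft_length = (in_dim - 1) * 2. It assumes the HTK-style mel formula 2595 * log10(1 + f / 700), triangular filters, f_min = 0 and f_max = sampling_rate / 2 as defaults; RETURNN's actual mel formula, filter shapes and defaults may differ. The helper names hz_to_mel, mel_to_hz, mel_filter_matrix and apply_mel_filterbank are hypothetical and not part of returnn.frontend.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale (assumed; other mel variants exist)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filter_matrix(in_dim, out_dim, sampling_rate, fft_length=None,
                      f_min=0.0, f_max=None):
    """Hypothetical helper: (in_dim, out_dim) matrix of triangular mel filters."""
    if fft_length is None:
        # Documented inference rule: in_dim == fft_length // 2 + 1.
        fft_length = (in_dim - 1) * 2
    assert in_dim == fft_length // 2 + 1
    if f_max is None:
        f_max = sampling_rate / 2.0  # assumed default: Nyquist frequency
    # Frequency of each FFT bin, in Hz.
    bin_freqs = np.arange(in_dim) * sampling_rate / fft_length
    # out_dim + 2 points equally spaced on the mel scale: the outer two are
    # band edges, the inner out_dim are the filter centers.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), out_dim + 2))
    weights = np.zeros((in_dim, out_dim))
    for j in range(out_dim):
        left, center, right = hz_points[j], hz_points[j + 1], hz_points[j + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # Triangular response: 1 at the center frequency, 0 outside [left, right].
        weights[:, j] = np.maximum(0.0, np.minimum(rising, falling))
    return weights


def apply_mel_filterbank(x, weights):
    """Map the last axis of x from in_dim to out_dim via a matrix product."""
    return x @ weights


# Usage: a 512-point FFT gives in_dim = 512 // 2 + 1 = 257 bins.
rng = np.random.default_rng(0)
x = rng.random((2, 100, 257))  # (batch, time, in_dim), e.g. a power spectrum
w = mel_filter_matrix(257, 80, sampling_rate=16000)  # fft_length inferred as 512
print(apply_mel_filterbank(x, w).shape)  # (2, 100, 80)
```

Building the filter matrix once and applying it as a single matrix product over the last axis is a common design choice, since the matrix depends only on the parameters, not on the input.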

returnn.frontend.audio.mel.log_mel_filterbank_from_raw(raw_audio: Tensor, *, in_spatial_dim: Dim, out_dim: Dim, sampling_rate: int = 16000, window_len: float = 0.025, step_len: float = 0.01, n_fft: int | None = None, log_base: int | float = 10) → Tuple[Tensor, Dim]

Computes log mel filterbank features from raw audio.

Parameters:

raw_audio – shape (…, in_spatial_dim, …). If it has a feature_dim with dimension 1, that dim is squeezed away.
in_spatial_dim – the time (sample) axis of raw_audio.
out_dim – number of mel filters.
sampling_rate – samples per second
window_len – window length, in seconds
step_len – step (hop) size between consecutive windows, in seconds
n_fft – FFT size. Should match fft_length from stft(). If not provided, the next power of two from window_num_frames (the window length in samples) is used.
log_base – base of the logarithm, e.g. 10 or math.e
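The processing implied by these parameters (framing with window_len and step_len, a power spectrum with n_fft // 2 + 1 bins, a mel filterbank, then a logarithm in log_base) can be sketched in NumPy as follows. This is an illustrative sketch, not RETURNN's implementation: the Hann window (np.hanning), the 1e-10 floor before the log, reading "next power of two" as the smallest power of two that is at least window_num_frames, and the helper names next_power_of_two and log_mel_from_raw are all assumptions. The mel matrix is passed in as a hypothetical mel_weights argument of shape (n_fft // 2 + 1, num_mel), and the returned frame count only loosely stands in for the Dim in the documented Tuple[Tensor, Dim] return.

```python
import math

import numpy as np


def next_power_of_two(n):
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def log_mel_from_raw(raw_audio, mel_weights, *, sampling_rate=16000,
                     window_len=0.025, step_len=0.01, n_fft=None, log_base=10):
    """Sketch: framing -> windowing -> |rfft|^2 -> mel filterbank -> log.

    raw_audio: 1-D array of samples.
    mel_weights: (n_fft // 2 + 1, num_mel) filter matrix (hypothetical input).
    Returns (features, num_frames); features has shape (num_frames, num_mel).
    """
    window_num_frames = round(window_len * sampling_rate)  # 400 samples at 16 kHz
    step_num_frames = round(step_len * sampling_rate)  # 160 samples at 16 kHz
    if n_fft is None:
        n_fft = next_power_of_two(window_num_frames)  # 512 for 400 samples
    assert mel_weights.shape[0] == n_fft // 2 + 1
    assert len(raw_audio) >= window_num_frames
    num_frames = 1 + (len(raw_audio) - window_num_frames) // step_num_frames
    frames = np.stack([
        raw_audio[i * step_num_frames:i * step_num_frames + window_num_frames]
        for i in range(num_frames)
    ])  # (num_frames, window_num_frames)
    frames = frames * np.hanning(window_num_frames)  # assumed Hann window
    # rfft zero-pads each frame to n_fft samples: (num_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel = power @ mel_weights  # (num_frames, num_mel)
    mel = np.maximum(mel, 1e-10)  # assumed floor to avoid log(0)
    features = np.log(mel) / math.log(log_base)  # change of base to log_base
    return features, num_frames


# Usage: 1 s of noise at 16 kHz gives 1 + (16000 - 400) // 160 = 98 frames.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
weights = rng.random((257, 80))  # stand-in for a real mel filter matrix
feats, n = log_mel_from_raw(audio, weights)
print(feats.shape, n)  # (98, 80) 98
```

With the default window_len = 0.025 s at 16 kHz, each window has 400 samples, so the default n_fft becomes 512 and the power spectrum has 512 // 2 + 1 = 257 bins.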
