returnn.frontend.audio.mel

Mel filterbank.

returnn.frontend.audio.mel.mel_filterbank(x: Tensor, *, in_dim: Dim, out_dim: Dim, sampling_rate: int | float, fft_length: int | None = None, f_min: None | int | float = None, f_max: None | int | float = None)

Applies the Mel filterbank to the input.

Parameters:
  • x – input spectrum containing in_dim, e.g. a magnitude or power spectrum computed via stft().

  • in_dim – frequency dim, expected to have size fft_length // 2 + 1, e.g. as returned by stft().

  • out_dim – number of mel filters.

  • sampling_rate – samples per second (Hz) of the underlying audio.

  • fft_length – FFT size (also called fft_size or n_fft). Should match fft_length from stft(). If not given, it is inferred from in_dim as (in_dim - 1) * 2.

  • f_min – lowest filter frequency, in Hz.

  • f_max – highest filter frequency, in Hz.

Returns:
  x with in_dim replaced by out_dim, i.e. the mel filterbank outputs.
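To make the computation concrete, here is a minimal NumPy sketch of a triangular mel filterbank. It is not RETURNN's implementation: the HTK-style mel formula 2595 · log10(1 + f / 700), the defaults f_min = 0 and f_max = sampling_rate / 2, and the helper names hz_to_mel, mel_to_hz, mel_filterbank_matrix and apply_mel_filterbank are assumptions chosen for illustration. It does follow the documented relations in_dim = fft_length // 2 + 1 and fft_length = (in_dim - 1) * 2, and its output replaces in_dim by out_dim.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale; an assumption, other mel variants exist.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank_matrix(in_dim, out_dim, sampling_rate, fft_length=None, f_min=None, f_max=None):
    """Hypothetical helper: returns an [in_dim, out_dim] matrix of triangular mel filters."""
    if fft_length is None:
        fft_length = (in_dim - 1) * 2  # documented inference rule
    if in_dim != fft_length // 2 + 1:
        raise ValueError(f"in_dim={in_dim} does not match fft_length={fft_length}")
    f_min = 0.0 if f_min is None else float(f_min)  # assumed default
    f_max = sampling_rate / 2.0 if f_max is None else float(f_max)  # assumed default: Nyquist
    # out_dim + 2 points, equally spaced on the mel scale, define out_dim overlapping triangles.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), out_dim + 2))
    bin_freqs = np.arange(in_dim) * sampling_rate / fft_length  # center frequency of each FFT bin
    fbank = np.zeros((in_dim, out_dim))
    for m in range(out_dim):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[:, m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def apply_mel_filterbank(spectrum, *, sampling_rate, out_dim, fft_length=None, f_min=None, f_max=None):
    """Hypothetical helper: [..., in_dim] -> [..., out_dim] via one matrix product."""
    fbank = mel_filterbank_matrix(spectrum.shape[-1], out_dim, sampling_rate, fft_length, f_min, f_max)
    return spectrum @ fbank


# Usage: 100 frames of a spectrum with fft_length 512, i.e. in_dim = 257 bins.
mel = apply_mel_filterbank(np.ones((100, 257)), sampling_rate=16000, out_dim=80)
print(mel.shape)  # (100, 80)
```

Because the filterbank is a plain [in_dim, out_dim] matrix, applying it is a single reduction over the frequency axis; a different mel scale variant (e.g. Slaney-style) would change only hz_to_mel and mel_to_hz.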

returnn.frontend.audio.mel.log_mel_filterbank_from_raw(raw_audio: Tensor, *, in_spatial_dim: Dim, out_dim: Dim, sampling_rate: int = 16000, window_len: float = 0.025, step_len: float = 0.01, n_fft: int | None = None, log_base: int | float = 10) → Tuple[Tensor, Dim]

Computes log mel filterbank features from raw audio.

Parameters:
  • raw_audio – shape (…, in_spatial_dim, …). If it has a feature_dim of size 1, that dim is squeezed away.

  • in_spatial_dim – time (sample) axis of raw_audio.

  • out_dim – number of mel filters.

  • sampling_rate – samples per second

  • window_len – window length, in seconds

  • step_len – step (hop) size between consecutive windows, in seconds

  • n_fft – FFT size (also called fft_size). Should match fft_length from stft(). If not provided, the next power of two from window_num_frames (the window length in samples) is used.

  • log_base – base of the logarithm, e.g. 10 or math.e.
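The parameters above can be traced end to end with a NumPy sketch for a 1-D signal. This is an illustrative approximation, not RETURNN's code: the Hann window, framing without padding, the squared-magnitude (power) spectrum, the 1e-10 floor before the logarithm, the HTK-style mel scale, and the helper names log_mel_from_raw and _mel_matrix are all assumptions. It shows how window_len and step_len become sample counts, how n_fft defaults to the next power of two of window_num_frames, and how log_base enters as a change of base. The returned frame count plays the role of the Dim in Tuple[Tensor, Dim].

```python
import math

import numpy as np


def _hz_to_mel(f):
    # HTK-style mel scale (assumed; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def _mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def _mel_matrix(num_bins, out_dim, sampling_rate, n_fft):
    """Hypothetical helper: [num_bins, out_dim] triangular filters from 0 Hz to Nyquist."""
    hz = _mel_to_hz(np.linspace(0.0, _hz_to_mel(sampling_rate / 2.0), out_dim + 2))
    bins = np.arange(num_bins) * sampling_rate / n_fft  # bin center frequencies in Hz
    left, center, right = hz[:-2, None], hz[1:-1, None], hz[2:, None]  # each [out_dim, 1]
    tri = np.minimum((bins - left) / (center - left), (right - bins) / (right - center))
    return np.maximum(tri, 0.0).T


def log_mel_from_raw(raw_audio, out_dim, sampling_rate=16000, window_len=0.025,
                     step_len=0.01, n_fft=None, log_base=10):
    """Hypothetical sketch: returns (features [num_frames, out_dim], num_frames)."""
    raw_audio = np.asarray(raw_audio, dtype=np.float64)
    if raw_audio.ndim == 2 and raw_audio.shape[-1] == 1:
        raw_audio = raw_audio[:, 0]  # squeeze a feature dim of size 1, as documented
    window_num_frames = int(round(window_len * sampling_rate))  # window length in samples
    step_num_frames = int(round(step_len * sampling_rate))  # hop size in samples
    if n_fft is None:
        n_fft = 2 ** math.ceil(math.log2(window_num_frames))  # next power of two
    # Frame without padding (assumption): drop trailing samples that do not fill a window.
    num_frames = 1 + (len(raw_audio) - window_num_frames) // step_num_frames
    starts = step_num_frames * np.arange(num_frames)
    frames = raw_audio[starts[:, None] + np.arange(window_num_frames)[None, :]]
    frames = frames * np.hanning(window_num_frames)  # Hann window (assumption)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=-1)) ** 2  # [num_frames, n_fft // 2 + 1]
    mel = power @ _mel_matrix(n_fft // 2 + 1, out_dim, sampling_rate, n_fft)
    # Change of base: log_b(x) = ln(x) / ln(b); the 1e-10 floor avoids log(0).
    feats = np.log(np.maximum(mel, 1e-10)) / np.log(log_base)
    return feats, num_frames


# Usage: one second of a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
feats, num_frames = log_mel_from_raw(np.sin(2 * np.pi * 440.0 * t), out_dim=80)
print(feats.shape, num_frames)  # (98, 80) 98
```

With the defaults (16 kHz, 25 ms window, 10 ms step), window_num_frames is 400 samples, so n_fft defaults to 512 and each frame has 257 frequency bins before the mel projection.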