Mel filterbank.

(x: Tensor, *, in_dim: Dim, out_dim: Dim, sampling_rate: int | float, fft_length: int | None = None, f_min: None | int | float = None, f_max: None | int | float = None)

Applies the Mel filterbank to the input.

  • x

  • in_dim – expected to be fft_length // 2 + 1, e.g. as produced by stft().

  • out_dim – number of mel filters.

  • sampling_rate – samples per second

  • fft_length – FFT size (also called fft_size or n_fft). Should match the fft_length used in stft(). If not given, it is inferred from in_dim as (in_dim - 1) * 2.

  • f_min

  • f_max
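
The relationship between in_dim, fft_length and out_dim can be illustrated with a small NumPy sketch. This is not the implementation of the function documented here. It assumes the HTK mel formula, triangular filters, and defaults of f_min = 0 and f_max = sampling_rate / 2; none of these are stated above. The helper names (infer_fft_length, hz_to_mel, mel_to_hz, mel_filterbank_matrix) are hypothetical.

```python
import numpy as np


def infer_fft_length(in_dim: int) -> int:
    """Invert in_dim = fft_length // 2 + 1 (exact for even fft_length)."""
    return (in_dim - 1) * 2


def hz_to_mel(f):
    # HTK-style mel scale. Assumption: the documented function may use another variant.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank_matrix(in_dim, out_dim, sampling_rate,
                          fft_length=None, f_min=None, f_max=None):
    """Return an (in_dim, out_dim) matrix of triangular mel filters."""
    if fft_length is None:
        fft_length = infer_fft_length(in_dim)
    f_min = 0.0 if f_min is None else float(f_min)  # assumed default
    f_max = sampling_rate / 2.0 if f_max is None else float(f_max)  # assumed default
    # Frequency in Hz of each FFT bin.
    bin_freqs = np.arange(in_dim) * sampling_rate / fft_length
    # out_dim + 2 edge frequencies, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), out_dim + 2))
    lower, center, upper = edges[:-2], edges[1:-1], edges[2:]
    # Rising and falling slopes of each triangle, evaluated at every bin.
    rising = (bin_freqs[:, None] - lower[None, :]) / (center - lower)[None, :]
    falling = (upper[None, :] - bin_freqs[:, None]) / (upper - center)[None, :]
    return np.maximum(0.0, np.minimum(rising, falling))


fft_length = 512
in_dim = fft_length // 2 + 1
fbank = mel_filterbank_matrix(in_dim, out_dim=80, sampling_rate=16000)
# Stand-in for a magnitude spectrum with 10 time frames.
spectrum = np.abs(np.random.default_rng(0).normal(size=(10, in_dim)))
mel = spectrum @ fbank  # contracts in_dim, leaving out_dim
```

Applying the filterbank is a matrix product over in_dim, so every other axis of the input is left unchanged.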

Returns: the mel-filtered tensor, with out_dim in place of in_dim.

(raw_audio: Tensor, *, in_spatial_dim: Dim, out_dim: Dim, sampling_rate: int = 16000, window_len: float = 0.025, step_len: float = 0.01, n_fft: int | None = None, log_base: int | float = 10) -> Tuple[Tensor, Dim]

Computes log mel filterbank features from raw audio.

  • raw_audio – (…, in_spatial_dim, …). If it has a feature_dim of size 1, that dimension is squeezed away.

  • in_spatial_dim

  • out_dim – number of mel filters.

  • sampling_rate – samples per second

  • window_len – in seconds

  • step_len – in seconds

  • n_fft – FFT size (also called fft_size). Should match fft_length from stft(). If not provided, the next power of two from window_num_frames is used.

  • log_base – e.g. 10 or math.e
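
How the time-based parameters translate into sample counts can be sketched as follows. This is an illustration under assumptions, not the library's code: it assumes window_num_frames is window_len * sampling_rate rounded to an integer, that an exact power of two is kept as is when choosing n_fft, and that log_base is applied by the usual change-of-base rule. The helper names frame_params and log_with_base are hypothetical.

```python
import math


def frame_params(sampling_rate=16000, window_len=0.025, step_len=0.01, n_fft=None):
    """Convert window/step lengths in seconds to samples and pick an FFT size."""
    # Assumption: rounding to the nearest sample; the library may truncate instead.
    window_num_frames = int(round(window_len * sampling_rate))
    step_num_frames = int(round(step_len * sampling_rate))
    if n_fft is None:
        # Smallest power of two >= window_num_frames.
        n_fft = 1 << (window_num_frames - 1).bit_length()
    return window_num_frames, step_num_frames, n_fft


def log_with_base(value: float, log_base: float = 10) -> float:
    """Logarithm of value in the given base, e.g. 10 or math.e."""
    return math.log(value) / math.log(log_base)


window, step, n_fft = frame_params()
in_dim = n_fft // 2 + 1  # number of frequency bins fed to the mel filterbank
```

With the default arguments, a 25 ms window at 16 kHz covers 400 samples, a 10 ms step covers 160 samples, and the sketch picks n_fft = 512, giving 257 frequency bins.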