returnn.frontend.audio.specaugment

SpecAugment, https://arxiv.org/abs/1904.08779

returnn.frontend.audio.specaugment.specaugment(x: Tensor, *, spatial_dim: Dim, feature_dim: Dim | None = None, global_train_step_dependent: bool = True, only_on_train: bool = True, max_consecutive_spatial_dims: int = 20, max_consecutive_feature_dims: int | None = None, num_spatial_mask_factor: int = 100, steps: Tuple[int, int, int] = (0, 1000, 2000)) → Tensor

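Applies SpecAugment (https://arxiv.org/abs/1904.08779) to x: random masks along spatial_dim (time masking) and along feature_dim (feature masking).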
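To illustrate the idea, here is a minimal pure-Python sketch of the two masking steps on a single [time][feature] spectrogram. It is not RETURNN's implementation: `specaugment_sketch`, `_apply_masks`, `train`, `num_time_masks`, `num_feature_masks` and `seed` are hypothetical names, the mask counts are passed in directly, and the train-step schedule (`steps`, `global_train_step_dependent`) and `num_spatial_mask_factor` are not modeled.

```python
import random
from typing import List

Matrix = List[List[float]]  # hypothetical [time][feature] spectrogram layout


def _apply_masks(
    rows: Matrix,
    axis: int,
    num_masks: int,
    max_width: int,
    rng: random.Random,
    mask_value: float = 0.0,
) -> None:
    """Hypothetical helper: mask `num_masks` random bands along `axis` in place (0=time, 1=feature)."""
    if not rows or not rows[0] or num_masks <= 0 or max_width <= 0:
        return
    size = len(rows) if axis == 0 else len(rows[0])
    for _ in range(num_masks):
        start = rng.randrange(size)
        end = min(start + rng.randint(1, max_width), size)  # width in [1, max_width], clipped
        if axis == 0:
            for t in range(start, end):  # time mask: whole frames, broadcast over features
                rows[t] = [mask_value] * len(rows[t])
        else:
            for row in rows:  # feature mask: same band in every frame, broadcast over time
                for f in range(start, end):
                    row[f] = mask_value


def specaugment_sketch(
    x: Matrix,
    *,
    train: bool,
    num_time_masks: int,
    max_consecutive_spatial_dims: int,
    num_feature_masks: int,
    max_consecutive_feature_dims: int,
    seed: int = 0,
) -> Matrix:
    """Sketch of SpecAugment on one [time][feature] spectrogram; not RETURNN's code."""
    if not train:
        return x  # similar in spirit to only_on_train=True: no masking outside training
    rng = random.Random(seed)
    out = [row[:] for row in x]  # copy so the caller's input stays untouched
    _apply_masks(out, 0, num_time_masks, max_consecutive_spatial_dims, rng)
    _apply_masks(out, 1, num_feature_masks, max_consecutive_feature_dims, rng)
    return out


if __name__ == "__main__":
    spec = [[1.0] * 8 for _ in range(50)]  # 50 frames, 8 features
    masked = specaugment_sketch(
        spec,
        train=True,
        num_time_masks=2,
        max_consecutive_spatial_dims=20,
        num_feature_masks=2,
        max_consecutive_feature_dims=3,
    )
    print(sum(v == 0.0 for row in masked for v in row), "of", 50 * 8, "values masked")
```

Each time mask blanks whole frames (broadcast over the features) and each feature mask blanks the same feature band in every frame (broadcast over time); with train=False the input is returned unchanged.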

returnn.frontend.audio.specaugment.random_mask(x: Tensor, *, mask_axis: Dim, broadcast_axis: Dim | Collection[Dim], min_num: int | Tensor, max_num: int | Tensor, max_dims: int | Tensor, mask_value: int | float | Tensor = 0.0) → Tensor
Parameters:
  • x – (batch,time,feature)

  • mask_axis – axis to mask

  • broadcast_axis – one or more axes to broadcast the mask over. All remaining axes, i.e. those given neither as mask_axis nor as broadcast_axis, are not broadcast over and are treated as batch dims. E.g. for x of shape [B,T,F] with mask_axis=F and broadcast_axis=T, it creates masks of shape [B,F].

  • min_num – minimum number of masks per batch entry (inclusive)

  • max_num – maximum number of masks per batch entry (inclusive)

  • max_dims – maximum width of each mask along mask_axis (inclusive)

  • mask_value – value written into the masked positions
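A minimal pure-Python sketch of the broadcast_axis example (mask_axis=F, broadcast_axis=T on a [B][T][F] nested list), assuming each batch entry draws between min_num and max_num masks, each starting at a distinct position and spanning 1 to max_dims features. `random_mask_sketch` and `seed` are hypothetical names; RETURNN works on Tensor/Dim objects and may sample differently.

```python
import random
from typing import List

Tensor3 = List[List[List[float]]]  # hypothetical [batch][time][feature] layout


def random_mask_sketch(
    x: Tensor3,
    *,
    min_num: int,
    max_num: int,
    max_dims: int,
    mask_value: float = 0.0,
    seed: int = 0,
) -> Tensor3:
    """Mask along F, broadcast over T; B is left over and acts as a batch dim, so masks are [B,F]."""
    rng = random.Random(seed)
    out = [[row[:] for row in seq] for seq in x]  # copy, leave the input untouched
    for seq in out:  # one independent mask draw per batch entry (assumes T > 0)
        num_feat = len(seq[0])
        num = rng.randint(min_num, max_num)  # number of masks, inclusive on both ends
        starts = rng.sample(range(num_feat), min(num, num_feat))  # distinct start positions
        for start in starts:
            end = min(start + rng.randint(1, max_dims), num_feat)  # width in [1, max_dims]
            for row in seq:  # same feature band in every frame: broadcast over T
                for f in range(start, end):
                    row[f] = mask_value
    return out
```

Because the mask is drawn once per batch entry and reused for every frame, each sequence loses the same feature bands at all time steps, which is what a mask of shape [B,F] means.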

returnn.frontend.audio.specaugment.mask(x: Tensor, *, mask_axis: Dim, pos: Tensor, max_amount: int | Tensor, mask_value: int | float | Tensor = 0.0) → Tensor
Parameters:
  • x – (batch, time, [feature]). Any dim that is neither mask_axis nor in pos.shape is broadcast over.

  • mask_axis – axis to mask

  • pos – start position of the mask along mask_axis, shape (batch,) (or multiple batch dims)

  • max_amount – maximum length of the mask along mask_axis (inclusive)

  • mask_value – value written into the masked positions
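A minimal pure-Python sketch of what mask does on a [B][T][F] nested list with mask_axis=T, assuming the mask starts at pos for each batch entry and its length is drawn uniformly from 1 to max_amount, clipped at the end of the axis. `mask_sketch` and `seed` are hypothetical names, not part of RETURNN.

```python
import random
from typing import List, Sequence

Tensor3 = List[List[List[float]]]  # hypothetical [batch][time][feature] layout


def mask_sketch(
    x: Tensor3,
    *,
    pos: Sequence[int],
    max_amount: int,
    mask_value: float = 0.0,
    seed: int = 0,
) -> Tensor3:
    """Mask one contiguous time span per batch entry, broadcast over the feature axis."""
    assert len(pos) == len(x), "pos holds one start position per batch entry"
    rng = random.Random(seed)
    out = [[row[:] for row in seq] for seq in x]  # copy, leave the input untouched
    for seq, start in zip(out, pos):
        amount = rng.randint(1, max_amount)  # mask length, inclusive upper bound
        end = min(start + amount, len(seq))  # never run past the end of the time axis
        for t in range(start, end):
            seq[t] = [mask_value] * len(seq[t])  # every feature of the frame is masked
    return out
```

Here pos has shape (batch,); the feature axis appears neither as mask_axis nor in pos, so it is broadcast over, as described for x above.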