`returnn.frontend.audio.specaugment`

SpecAugment, https://arxiv.org/abs/1904.08779

returnn.frontend.audio.specaugment.specaugment(x: Tensor, *, spatial_dim: Dim, feature_dim: Dim | None = None, global_train_step_dependent: bool = True, only_on_train: bool = True, max_consecutive_spatial_dims: int = 20, max_consecutive_feature_dims: int | None = None, num_spatial_mask_factor: int = 100, steps: Tuple[int, int, int] = (0, 1000, 2000)) → Tensor

SpecAugment, https://arxiv.org/abs/1904.08779
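To illustrate the kind of masking SpecAugment performs (time masking and frequency masking as described in the paper; the paper's time warping is left out here), the following is a minimal pure-Python sketch on a single spectrogram stored as `spec[time][feature]`. It is not RETURNN's implementation: the helper name `spec_mask_sketch`, the fixed mask counts `num_time_masks`/`num_freq_masks`, the uniform sampling of widths in `[1, max]`, and the list-of-lists layout are all assumptions. Only the names `max_consecutive_spatial_dims` and `max_consecutive_feature_dims` mirror the signature above.

```python
import random
from typing import List


def spec_mask_sketch(
    spec: List[List[float]],
    *,
    num_time_masks: int,
    num_freq_masks: int,
    max_consecutive_spatial_dims: int,
    max_consecutive_feature_dims: int,
    mask_value: float = 0.0,
    rng: random.Random,
) -> List[List[float]]:
    """Hypothetical helper: SpecAugment-style time and frequency masking
    on one spectrogram spec[time][feature]. Returns a masked copy."""
    out = [row[:] for row in spec]  # copy so the input is left untouched
    if not out or not out[0]:
        return out
    num_frames, num_feats = len(out), len(out[0])

    # Time masks (assumed widths in [1, max]): mask a run of frames over all features.
    for _ in range(num_time_masks):
        width = rng.randint(1, max_consecutive_spatial_dims)
        start = rng.randrange(num_frames)
        for t in range(start, min(start + width, num_frames)):
            out[t] = [mask_value] * num_feats

    # Frequency masks: mask a run of features over all frames.
    for _ in range(num_freq_masks):
        width = rng.randint(1, max_consecutive_feature_dims)
        start = rng.randrange(num_feats)
        for t in range(num_frames):
            for f in range(start, min(start + width, num_feats)):
                out[t][f] = mask_value
    return out


# Usage: 50 frames x 8 features, two masks along each axis.
rng = random.Random(0)
spec = [[1.0] * 8 for _ in range(50)]
masked = spec_mask_sketch(
    spec,
    num_time_masks=2,
    num_freq_masks=2,
    max_consecutive_spatial_dims=5,
    max_consecutive_feature_dims=2,
    rng=rng,
)
```

Time masks are broadcast over the feature axis and frequency masks over the time axis, which is why each one blanks out whole rows or whole columns.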

returnn.frontend.audio.specaugment.random_mask(x: Tensor, *, mask_axis: Dim, broadcast_axis: Dim | Collection[Dim], min_num: int | Tensor, max_num: int | Tensor, max_dims: int | Tensor, mask_value: int | float | Tensor = 0.0) → Tensor

Parameters:

x – (batch,time,feature)
mask_axis – axis to mask
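broadcast_axis – one or multiple axes to broadcast over. The remaining axes, i.e. those specified by neither mask_axis nor broadcast_axis, are not broadcast over and are treated as batch dims. E.g. for x of shape [B,T,F] with mask_axis=F and broadcast_axis=T, it creates masks of shape [B,F].
min_num – minimum number of masks, inclusive
max_num – maximum number of masks, inclusive
max_dims – maximum width of a single mask along mask_axis, inclusive
mask_value – value written into the masked positions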

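The broadcasting example above (x of shape [B,T,F], mask_axis=F, broadcast_axis=T, masks of shape [B,F]) can be sketched in pure Python as follows. This is an illustration under assumptions, not RETURNN's code: the helper name `random_feature_mask_sketch` is hypothetical, mask_axis and broadcast_axis are hard-wired to F and T, and it assumes the number of masks is drawn uniformly from `[min_num, max_num]` and each width uniformly from `[1, max_dims]`, with mask start positions drawn independently.

```python
import random
from typing import List


def random_feature_mask_sketch(
    x: List[List[List[float]]],  # x[b][t][f], i.e. [B,T,F]
    *,
    min_num: int,
    max_num: int,
    max_dims: int,
    mask_value: float = 0.0,
    rng: random.Random,
) -> List[List[List[float]]]:
    """Hypothetical helper: mask_axis=F, broadcast_axis=T, B treated as batch."""
    out = []
    for seq in x:  # B is not broadcast over: each sequence gets its own mask
        num_feats = len(seq[0])
        feat_mask = [False] * num_feats  # shape [F] per sequence -> [B,F] overall
        num = rng.randint(min_num, max_num)  # number of masks, both bounds inclusive
        for _ in range(num):
            start = rng.randrange(num_feats)
            width = rng.randint(1, max_dims)  # max_dims is inclusive
            for f in range(start, min(start + width, num_feats)):
                feat_mask[f] = True
        # Broadcast over T: the same feature mask is applied to every frame.
        out.append(
            [[mask_value if feat_mask[f] else v for f, v in enumerate(frame)] for frame in seq]
        )
    return out


# Usage: B=3, T=4, F=6.
rng = random.Random(1)
x = [[[1.0] * 6 for _ in range(4)] for _ in range(3)]
y = random_feature_mask_sketch(x, min_num=1, max_num=2, max_dims=2, rng=rng)
```

Within one sequence every frame ends up with the same masked features, while different sequences in the batch generally get different masks.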
returnn.frontend.audio.specaugment.mask(x: Tensor, *, mask_axis: Dim, pos: Tensor, max_amount: int | Tensor, mask_value: int | float | Tensor = 0.0) → Tensor

Parameters:

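x – (batch,time,[feature]). Any dim that is neither mask_axis nor in pos.shape is broadcast over.
mask_axis – axis to mask
pos – start position of the mask along mask_axis, shape (batch,) (or multiple batch dims)
max_amount – maximum length of the masked range along mask_axis, inclusive
mask_value – value written into the masked positions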
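A single mask of this kind can be sketched in pure Python for x of shape [B,T,F] with mask_axis=T: each sequence gets one contiguous masked range starting at its entry in `pos`, and the feature axis is broadcast over because it is neither mask_axis nor a dim of `pos`. The helper name `mask_sketch` is hypothetical, `pos` is simplified to a single batch dim, and drawing the range length uniformly from `[1, max_amount]` and clipping it at the end of the sequence are assumptions about the behavior, not a description of RETURNN's code.

```python
import random
from typing import List


def mask_sketch(
    x: List[List[List[float]]],  # x[b][t][f], mask_axis is T
    *,
    pos: List[int],  # start frame per sequence, shape (batch,)
    max_amount: int,
    mask_value: float = 0.0,
    rng: random.Random,
) -> List[List[List[float]]]:
    """Hypothetical helper: mask frames [pos, pos + amount) of each sequence."""
    out = []
    for seq, p in zip(x, pos):
        amount = rng.randint(1, max_amount)  # assumed length in [1, max_amount]
        end = min(p + amount, len(seq))  # clip the range at the sequence end
        # F is broadcast over: a masked frame is masked in all features.
        out.append(
            [[mask_value] * len(frame) if p <= t < end else frame[:] for t, frame in enumerate(seq)]
        )
    return out


# Usage: B=2, T=10, F=2; the second mask starts at the last frame and is clipped.
rng = random.Random(2)
x = [[[1.0, 1.0] for _ in range(10)] for _ in range(2)]
y = mask_sketch(x, pos=[3, 9], max_amount=4, rng=rng)
```

With `pos=[3, 9]` and T=10, the second sequence can only have frame 9 masked, whatever length is drawn.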