- returnn.frontend.dropout.dropout(source: Tensor, drop_prob: float | Tensor, *, axis: Dim | Sequence[Dim] | bool | None = None, on_forward: bool = False) → Tensor
Dropout is only applied during training (unless you set on_forward=True).
When dropout is applied, the kept values are scaled by 1/(1 - drop_prob), so that the expected value of the output matches the input (inverted dropout).
drop_prob – dropout probability. 0.0 means no dropout is applied; 1.0 would mask everything. For every value in the tensor, the decision to drop it is drawn independently with this probability. The broadcasted axes are those not specified in axis.
axis – axis (or axes) to apply dropout on; multiple axes can be specified. This defines the set of axes to which the dropout mask is not broadcasted. If None (default), the mask is not broadcasted on any axis. False is the same as None, which allows writing axis=use_dropout_broadcast and ...feature_dim. (RETURNN also has the noise_shape option, but the axis option provides the same functionality.)
on_forward – if True, apply dropout during both training and inference (i.e. always); otherwise only during training.
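To make the semantics concrete, here is a pure-Python sketch of inverted dropout as described above. This is illustrative only, not the RETURNN implementation; the function name dropout_sketch and its signature are made up for this example:

```python
import random

def dropout_sketch(values, drop_prob, training=True, seed=None):
    """Illustrative sketch of inverted dropout (not RETURNN code).

    During training, each value is dropped independently with probability
    drop_prob; kept values are scaled by 1 / (1 - drop_prob) so the
    expected value of the output matches the input.
    """
    if not training or drop_prob == 0.0:
        # No-op outside training (unless on_forward-like behavior is wanted)
        return list(values)
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - drop_prob)
    return [v * keep_scale if rng.random() >= drop_prob else 0.0
            for v in values]
```

With drop_prob=0.5, every surviving value is doubled, so averaging over many elements the output magnitude stays roughly the same as the input.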
- returnn.frontend.dropout.dropout_broadcast_default() → bool
Checks the global RETURNN config to decide whether the dropout mask should be broadcasted over dimensions not covered by axis.
Historically in RETURNN, when we did dropout in the feature dimension, we broadcasted the dropout mask over the other dimensions (e.g. time and batch).
This function provides an easy way to control this behavior globally via the RETURNN config.
The default for now stays the same as historical RETURNN behavior, unless we find that this is really not a good idea; in that case, the default might change via a new behavior version.
Also see the option rf_att_dropout_broadcast, which does the same for attention dropout. Note that the default for attention dropout broadcasting was already changed with behavior version 19.
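The difference between the two behaviors can be sketched in plain Python. This is an illustrative example, not RETURNN code; the function dropout_mask and its parameters are hypothetical. With broadcasting (the historical RETURNN behavior), a single mask over the feature axis is drawn once and reused for every time step; without broadcasting, an independent mask is drawn per time step:

```python
import random

def dropout_mask(n_time, n_feat, drop_prob, broadcast_over_time, seed=0):
    """Sketch of the dropout-mask broadcasting choice (illustrative only).

    Returns a [time][feature] mask of scale factors: 0.0 for dropped
    entries, 1 / (1 - drop_prob) for kept entries.
    """
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_prob)
    if broadcast_over_time:
        # One mask of shape [feature], shared (broadcasted) across time.
        row = [scale if rng.random() >= drop_prob else 0.0
               for _ in range(n_feat)]
        return [list(row) for _ in range(n_time)]
    # Independent mask of shape [time, feature].
    return [[scale if rng.random() >= drop_prob else 0.0
             for _ in range(n_feat)]
            for _ in range(n_time)]
```

Broadcasting draws far fewer random numbers and drops whole feature channels for the entire sequence, which is a stronger form of regularization than dropping entries independently per time step.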