returnn.frontend.math_
Math ops
- returnn.frontend.math_.compare(a: Tensor, kind: str, b: Tensor, *, allow_broadcast_all_sources: bool | None = None, dim_order: Sequence[Dim] | None = None) Tensor [source]#
- Parameters:
a – left operand
kind – “equal”|”==”, “less”|”<”, “less_equal”|”<=”, “greater”|”>”, “greater_equal”|”>=”, “not_equal”|”!=”
b – right operand
allow_broadcast_all_sources – if True, it is allowed that neither a nor b covers all dims of the result. Not needed when the output dims are specified explicitly.
dim_order – defines the order of the resulting dims. If None, it is automatically inferred from a and b. Not all dims of a and b need to be specified here, and dim_order may also contain additional dims.
- Returns:
element-wise comparison of a and b
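For orientation, a hedged usage sketch: `a` and `b` are assumed to be existing Tensors (e.g. activations inside a model's `__call__`), not constructed here.

```python
import returnn.frontend as rf

# a, b: existing Tensors whose dims (partially) overlap, e.g. [batch, time, feature]
mask = rf.compare(a, "greater", b)    # element-wise a > b, boolean-valued Tensor
mask_eq = rf.compare(a, "==", b)      # "==" and "equal" are interchangeable spellings of `kind`

# if neither a nor b carries all dims of the result, broadcasting from both sides
# must be allowed explicitly (or use compare_bc, which sets this flag for you):
mask_bc = rf.compare(a, "less", b, allow_broadcast_all_sources=True)
```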
- returnn.frontend.math_.compare_bc(a: Tensor, kind: str, b: Tensor, *, dim_order: Sequence[Dim] | None = None) Tensor [source]#
compare() with allow_broadcast_all_sources=True.
- returnn.frontend.math_.combine(a: Tensor, kind: str, b: Tensor, *, allow_broadcast_all_sources: bool | None = None, dim_order: Sequence[Dim] | None = None) Tensor [source]#
- Parameters:
a – left operand
kind – “add”|”+”, “sub”|”-”, “mul”|”*”, “truediv”|”/”, “floordiv”|”//”, “mod”|”%”, “pow”|”**”, “max”|”maximum”, “min”|”minimum”, “logical_and”, “logical_or”, “squared_difference”
b – right operand
allow_broadcast_all_sources – if True, it is allowed that neither a nor b covers all dims of the result. Not needed when the output dims are specified explicitly.
dim_order – defines the order of the resulting dims. If None, it is automatically inferred from a and b. Not all dims of a and b need to be specified here, and dim_order may also contain additional dims.
- Returns:
element-wise combination of a and b
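An analogous sketch for combine(), again with `a` and `b` assumed to be existing Tensors:

```python
import returnn.frontend as rf

s = rf.combine(a, "add", b)                   # element-wise a + b
d = rf.combine(a, "squared_difference", b)    # element-wise (a - b) ** 2

# when neither operand carries all dims of the result, either pass
# allow_broadcast_all_sources=True explicitly or use combine_bc:
p = rf.combine_bc(a, "mul", b)
```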
- returnn.frontend.math_.combine_bc(a: Tensor, kind: str, b: Tensor, *, dim_order: Sequence[Dim] | None = None) Tensor [source]#
combine() with allow_broadcast_all_sources=True.
- returnn.frontend.math_.maximum(a: Tensor, b: Tensor | int | float | complex | number | ndarray | bool | str, *other_tensors) Tensor [source]#
- returnn.frontend.math_.minimum(a: Tensor, b: Tensor | int | float | complex | number | ndarray | bool | str, *other_tensors) Tensor [source]#
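A short sketch of maximum()/minimum(): `b` may also be a raw scalar value, and further tensors can be passed via *other_tensors (`x`, `y`, `z` are assumed to exist):

```python
import returnn.frontend as rf

lower_bounded = rf.maximum(x, 0.0)    # element-wise max with a scalar, i.e. a ReLU-style lower bound
smallest = rf.minimum(x, y, z)        # element-wise minimum over several tensors
```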
- returnn.frontend.math_.identity(x: Tensor) Tensor [source]#
Identity function, just to have one canonical name. Does nothing; returns the input unchanged.
- returnn.frontend.math_.silu(a: Tensor) Tensor [source]#
silu / swish.
The SiLU activation function was introduced in “Gaussian Error Linear Units (GELUs)” [Hendrycks et al. 2016](https://arxiv.org/abs/1606.08415) and in “Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning” [Elfwing et al. 2017](https://arxiv.org/abs/1702.03118), and was independently discovered (and called Swish) in “Searching for Activation Functions” [Ramachandran et al. 2017](https://arxiv.org/abs/1710.05941).
- returnn.frontend.math_.swish(a: Tensor) Tensor [source]#
silu / swish. Same as silu(); see above for the references.
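In terms of the other ops in this module, SiLU/Swish is x * sigmoid(x); a sketch of the equivalence (`x` assumed to be an existing Tensor):

```python
import returnn.frontend as rf

y = rf.silu(x)                                  # same result as rf.swish(x)
y_manual = rf.combine(x, "mul", rf.sigmoid(x))  # SiLU(x) = x * sigmoid(x)
```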
- returnn.frontend.math_.gating(x: Tensor, *, axis: Dim | None = None, gate_func=<function sigmoid>, act_func=<function identity>, out_dim: Dim | None = None) Tuple[Tensor, Dim] [source]#
Like in the gated linear unit (GLU): https://arxiv.org/abs/1612.08083. GLU also refers to the linear transformation before the gating, which is why this function is not called GLU. GLU uses gate_func=sigmoid and act_func=identity (the defaults here).
There are other potential gating variants you might be interested in; see for example https://arxiv.org/abs/2002.05202, e.g. gate_func=gelu.
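A hedged sketch of GLU-style gating: `x` is assumed to be the output of a preceding linear projection with an even-sized feature dim `ff_dim` (both hypothetical names):

```python
import returnn.frontend as rf

# x: Tensor with dims [..., ff_dim]; conceptually, gating splits ff_dim into two halves,
# multiplies act_func(one half) with gate_func(other half), and returns the gated tensor
# together with the new (halved) feature dim.
gated, half_dim = rf.gating(x, axis=ff_dim)                       # GLU defaults: sigmoid gate, identity act
geglu, half_dim2 = rf.gating(x, axis=ff_dim, gate_func=rf.gelu)   # GEGLU-style variant (arXiv:2002.05202)
```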