returnn.frontend.math_
Math ops
- returnn.frontend.math_.compare(a: Tensor, kind: str, b: Tensor, *, allow_broadcast_all_sources: bool | None = None, dim_order: Sequence[Dim] | None = None) → Tensor
- Parameters:
a
kind – "equal"|"==", "less"|"<", "less_equal"|"<=", "greater"|">", "greater_equal"|">=", "not_equal"|"!="
b
allow_broadcast_all_sources – if True, neither a nor b needs to have all dims of the result. Not needed when dim_order is specified explicitly.
dim_order – defines the order of the resulting dims. If None, it is automatically inferred from a and b. Not all dims of a and b need to be specified here, and dim_order may also contain other dims.
- Returns:
element-wise comparison of a and b
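Usage sketch, assuming `import returnn.frontend as rf` and that compare() is re-exported as rf.compare (the usual frontend setup); the helper name is hypothetical:

```python
import returnn.frontend as rf
from returnn.tensor import Tensor


def mask_at_or_above(scores: Tensor, threshold: Tensor) -> Tensor:
    """Hypothetical helper: bool mask marking where scores >= threshold."""
    # Elementwise comparison; dims are matched via the Dim objects,
    # broadcasting over dims that only one operand has.
    return rf.compare(scores, ">=", threshold)
```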
- returnn.frontend.math_.compare_bc(a: Tensor, kind: str, b: Tensor, *, dim_order: Sequence[Dim] | None = None) → Tensor
compare() with allow_broadcast_all_sources=True.
- returnn.frontend.math_.combine(a: Tensor, kind: str, b: Tensor, *, allow_broadcast_all_sources: bool | None = None, dim_order: Sequence[Dim] | None = None) → Tensor
- Parameters:
a
kind – "add"|"+", "sub"|"-", "mul"|"*", "truediv"|"/", "floordiv"|"//", "mod"|"%", "pow"|"**", "max"|"maximum", "min"|"minimum", "logical_and", "logical_or", "squared_difference"
b
allow_broadcast_all_sources – if True, neither a nor b needs to have all dims of the result. Not needed when dim_order is specified explicitly.
dim_order – defines the order of the resulting dims. If None, it is automatically inferred from a and b. Not all dims of a and b need to be specified here, and dim_order may also contain other dims.
- Returns:
element-wise combination of a and b
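Usage sketch, under the same assumption that combine() is re-exported as rf.combine; the helper is hypothetical:

```python
import returnn.frontend as rf
from returnn.tensor import Tensor


def squared_error(prediction: Tensor, target: Tensor) -> Tensor:
    """Hypothetical helper: elementwise (prediction - target) ** 2 via combine()."""
    return rf.combine(prediction, "squared_difference", target)
```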
- returnn.frontend.math_.combine_bc(a: Tensor, kind: str, b: Tensor, *, dim_order: Sequence[Dim] | None = None) → Tensor
combine() with allow_broadcast_all_sources=True.
- returnn.frontend.math_.maximum(a: Tensor, b: Tensor | int | float | complex | number | ndarray | bool | str, *other_tensors) → Tensor
- returnn.frontend.math_.minimum(a: Tensor, b: Tensor | int | float | complex | number | ndarray | bool | str, *other_tensors) → Tensor
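Since b may be a plain scalar, maximum() can express e.g. a ReLU directly. A sketch, assuming the rf.maximum re-export:

```python
import returnn.frontend as rf
from returnn.tensor import Tensor


def relu_via_maximum(x: Tensor) -> Tensor:
    """Hypothetical helper: elementwise max(x, 0), i.e. ReLU."""
    return rf.maximum(x, 0.0)
```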
- returnn.frontend.math_.clip_by_value(x: Tensor, clip_value_min: Tensor | int | float | complex | number | ndarray | bool | str, clip_value_max: Tensor | int | float | complex | number | ndarray | bool | str, *, allow_broadcast_all_sources: bool = False) → Tensor
Clip the values of x elementwise to the range [clip_value_min, clip_value_max].
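Sketch, assuming the rf.clip_by_value re-export; the bounds are arbitrary example values:

```python
import returnn.frontend as rf
from returnn.tensor import Tensor


def clip_to_unit_interval(x: Tensor) -> Tensor:
    """Hypothetical helper: clamp x elementwise into [0, 1]."""
    return rf.clip_by_value(x, 0.0, 1.0)
```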
- returnn.frontend.math_.identity(x: Tensor) → Tensor
Identity function, provided as one canonical identity op. Does nothing; returns the input unchanged.
- returnn.frontend.math_.silu(a: Tensor) → Tensor
SiLU / Swish.
The SiLU activation function was introduced in “Gaussian Error Linear Units (GELUs)” [Hendrycks et al. 2016](https://arxiv.org/abs/1606.08415) and “Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning” [Elfwing et al. 2017](https://arxiv.org/abs/1702.03118), and was independently discovered (and called Swish) in “Searching for Activation Functions” [Ramachandran et al. 2017](https://arxiv.org/abs/1710.05941).
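SiLU/Swish computes x * sigmoid(x). A reference sketch in terms of the ops above, assuming rf.silu, rf.sigmoid and rf.combine are re-exported at the rf top level:

```python
import returnn.frontend as rf
from returnn.tensor import Tensor


def silu_reference(x: Tensor) -> Tensor:
    """Reference formula x * sigmoid(x); rf.silu(x) should compute the same,
    possibly fused by the backend."""
    return rf.combine(x, "*", rf.sigmoid(x))
```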
- returnn.frontend.math_.swish(a: Tensor) → Tensor
SiLU / Swish; see silu() above for the references.
- returnn.frontend.math_.gelu(a: Tensor) → Tensor
Compute the Gaussian Error Linear Unit (GELU) activation function: x * P(X <= x), where X ~ N(0, 1). Ref: [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415). This is the exact form, without the tanh approximation.
- Parameters:
a
- returnn.frontend.math_.gating(x: Tensor, *, axis: Dim | None = None, gate_func=<function sigmoid>, act_func=<function identity>, out_dim: Dim | None = None) → Tuple[Tensor, Dim]
Like in the gated linear unit (GLU): https://arxiv.org/abs/1612.08083. GLU also refers to the linear transformation before the gating, which is why this function is not called GLU. GLU uses gate_func=sigmoid and act_func=identity (the defaults here).
There are other gating variants you might be interested in; see for example https://arxiv.org/abs/2002.05202, e.g. gate_func=gelu.
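A GLU-style sketch, assuming the rf.gating re-export and that x is split into two halves along axis (as in the GLU paper), one half gated by gate_func of the other; the helper is hypothetical:

```python
import returnn.frontend as rf
from returnn.tensor import Tensor, Dim


def glu_gate(x: Tensor, feature_dim: Dim) -> Tensor:
    """Hypothetical helper: GLU-style gating of x over feature_dim."""
    # Defaults gate_func=sigmoid, act_func=identity give classic GLU gating.
    gated, out_dim = rf.gating(x, axis=feature_dim)
    # `gated` has the returned `out_dim` in place of `feature_dim`.
    return gated
```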
- returnn.frontend.math_.lerp(start: Tensor, end: Tensor, weight: float | Tensor, *, allow_broadcast_all_sources: bool = False) → Tensor
Linear interpolation between start and end. (Some backends might provide an optimized version of this.)
- Parameters:
start
end
weight – scalar or tensor
allow_broadcast_all_sources
- Returns:
start + weight * (end - start)
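Sketch, assuming the rf.lerp re-export; with a scalar weight this is e.g. an exponential-moving-average style update:

```python
import returnn.frontend as rf
from returnn.tensor import Tensor


def ema_update(prev: Tensor, new: Tensor, decay: float = 0.999) -> Tensor:
    """Hypothetical helper: prev + (1 - decay) * (new - prev), via lerp()."""
    return rf.lerp(prev, new, 1.0 - decay)
```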