returnn.frontend.math_

Math ops

returnn.frontend.math_.compare(a: Tensor, kind: str, b: Tensor, *, allow_broadcast_all_sources: bool | None = None, dim_order: Sequence[Dim] | None = None) → Tensor
Parameters:
  • a

  • kind – "equal"|"==", "less"|"<", "less_equal"|"<=", "greater"|">", "greater_equal"|">=", "not_equal"|"!="

  • b

  • allow_broadcast_all_sources – if True, neither a nor b needs to have all dims of the result, i.e. broadcasting from all sources is allowed (see the example below). Not needed when the output dims are specified explicitly.

  • dim_order – defines the order of the resulting dims. If None, it is inferred automatically from a and b. Not all dims of a and b need to be listed here, and dim_order may also contain additional dims.

Returns:

element-wise comparison of a and b
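
Example: a minimal, hypothetical usage sketch, not taken from the reference itself. It assumes a backend has already been selected for the frontend API, and that rf.convert_to_tensor and the Dim constructor accept the arguments shown (details may differ between RETURNN versions):

    import numpy
    import returnn.frontend as rf
    from returnn.tensor import Dim

    feat_dim = Dim(4, name="feature")  # static dim of size 4 (assumed constructor form)
    a = rf.convert_to_tensor(numpy.array([1.0, 2.0, 3.0, 4.0], dtype="float32"), dims=[feat_dim])
    b = rf.convert_to_tensor(numpy.array([2.5, 2.5, 2.5, 2.5], dtype="float32"), dims=[feat_dim])

    mask = rf.compare(a, "less", b)  # bool Tensor over feat_dim; same as rf.less(a, b) or a < b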

returnn.frontend.math_.compare_bc(a: Tensor, kind: str, b: Tensor, *, dim_order: Sequence[Dim] | None = None) → Tensor

compare() with allow_broadcast_all_sources=True

returnn.frontend.math_.combine(a: Tensor, kind: str, b: Tensor, *, allow_broadcast_all_sources: bool | None = None, dim_order: Sequence[Dim] | None = None) → Tensor
Parameters:
  • a

  • kind – "add"|"+", "sub"|"-", "mul"|"*", "truediv"|"/", "floordiv"|"//", "mod"|"%", "pow"|"**", "max"|"maximum", "min"|"minimum", "logical_and", "logical_or", "squared_difference"

  • b

  • allow_broadcast_all_sources – if True, neither a nor b needs to have all dims of the result, i.e. broadcasting from all sources is allowed (see the example below). Not needed when the output dims are specified explicitly.

  • dim_order – defines the order of the resulting dims. If None, it is inferred automatically from a and b. Not all dims of a and b need to be listed here, and dim_order may also contain additional dims.

Returns:

element-wise combination of a and b
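
Example: again a hypothetical sketch under the same assumptions as above (rf.zeros / rf.ones are assumed to take a list of dims):

    import returnn.frontend as rf
    from returnn.tensor import Dim

    time_dim = Dim(7, name="time")
    feat_dim = Dim(3, name="feature")

    a = rf.zeros([time_dim, feat_dim])
    b = rf.ones([feat_dim])
    c = rf.combine(a, "add", b)  # same as rf.add(a, b) or a + b; b broadcasts over time_dim

    # If neither operand carries all dims of the result, broadcasting from all
    # sources must be allowed explicitly:
    x = rf.zeros([time_dim])
    y = rf.ones([feat_dim])
    z = rf.combine(x, "mul", y, allow_broadcast_all_sources=True)  # result has dims {time_dim, feat_dim}
    z2 = rf.combine_bc(x, "mul", y)  # shorthand for the same thing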

returnn.frontend.math_.combine_bc(a: Tensor, kind: str, b: Tensor, *, dim_order: Sequence[Dim] | None = None) → Tensor

combine() with allow_broadcast_all_sources=True

returnn.frontend.math_.equal(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.not_equal(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.less(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.less_equal(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.greater(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.greater_equal(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.add(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.sub(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.mul(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.true_divide(a: Tensor, b: Tensor) → Tensor

truediv

returnn.frontend.math_.floor_divide(a: Tensor, b: Tensor) → Tensor

floordiv

returnn.frontend.math_.ceil_divide(a: Tensor, b: Tensor) → Tensor

ceildiv
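
As a general arithmetic identity (not a statement about the internal implementation): for integer a and nonzero integer b, ceildiv(a, b) = -floordiv(-a, b), and for positive b also ceildiv(a, b) = floordiv(a + b - 1, b).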

returnn.frontend.math_.neg(a: Tensor) → Tensor
returnn.frontend.math_.reciprocal(a: Tensor) → Tensor

reciprocal / inverse, i.e. 1/a

returnn.frontend.math_.mod(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.pow(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.squared_difference(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.logical_and(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.logical_or(a: Tensor, b: Tensor) → Tensor
returnn.frontend.math_.logical_not(a: Tensor) → Tensor
returnn.frontend.math_.opt_logical_or(a: bool, b: bool) → bool

logical or

returnn.frontend.math_.maximum(a: Tensor, b: Tensor | int | float | complex | number | ndarray | bool | str, *other_tensors) → Tensor
returnn.frontend.math_.minimum(a: Tensor, b: Tensor | int | float | complex | number | ndarray | bool | str, *other_tensors) → Tensor
returnn.frontend.math_.clip_by_value(x: Tensor, clip_value_min: Tensor | int | float | complex | number | ndarray | bool | str, clip_value_max: Tensor | int | float | complex | number | ndarray | bool | str, *, allow_broadcast_all_sources: bool = False) → Tensor

clip by value
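
A hypothetical usage sketch (tensor and dim construction assumed as in the earlier examples):

    import returnn.frontend as rf
    from returnn.tensor import Dim

    feat_dim = Dim(8, name="feature")
    x = rf.zeros([feat_dim])

    y = rf.clip_by_value(x, -1.0, 1.0)  # element-wise clamp into [-1, 1]
    # Equivalent in effect to composing the element-wise extrema:
    y2 = rf.minimum(rf.maximum(x, -1.0), 1.0)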

returnn.frontend.math_.identity(x: Tensor) → Tensor

Identity function, provided so that there is one canonical identity op. Does nothing; returns the input unchanged.

returnn.frontend.math_.exp(a: Tensor) → Tensor
returnn.frontend.math_.expm1(a: Tensor) → Tensor
returnn.frontend.math_.log(a: Tensor) → Tensor
returnn.frontend.math_.safe_log(a: Tensor, *, eps: float | None = None) → Tensor
returnn.frontend.math_.log1p(a: Tensor) → Tensor
returnn.frontend.math_.sqrt(a: Tensor) → Tensor
returnn.frontend.math_.rsqrt(a: Tensor) → Tensor
returnn.frontend.math_.square(a: Tensor) → Tensor
returnn.frontend.math_.abs(a: Tensor) → Tensor
returnn.frontend.math_.tanh(a: Tensor) → Tensor
returnn.frontend.math_.sigmoid(a: Tensor) → Tensor
returnn.frontend.math_.log_sigmoid(a: Tensor) → Tensor
returnn.frontend.math_.sin(a: Tensor) → Tensor
returnn.frontend.math_.cos(a: Tensor) → Tensor
returnn.frontend.math_.ceil(a: Tensor) → Tensor
returnn.frontend.math_.floor(a: Tensor) → Tensor
returnn.frontend.math_.round(a: Tensor) → Tensor
returnn.frontend.math_.relu(a: Tensor) → Tensor
returnn.frontend.math_.elu(a: Tensor) → Tensor
returnn.frontend.math_.selu(a: Tensor) → Tensor
returnn.frontend.math_.silu(a: Tensor) → Tensor

silu / swish.

The SiLU activation function was introduced in "Gaussian Error Linear Units (GELUs)" [Hendrycks et al. 2016](https://arxiv.org/abs/1606.08415) and "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning" [Elfwing et al. 2017](https://arxiv.org/abs/1702.03118), and was independently discovered (and called swish) in "Searching for Activation Functions" [Ramachandran et al. 2017](https://arxiv.org/abs/1710.05941).
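
For reference, silu(x) = x * sigmoid(x); the Swish formulation with a scaling factor beta reduces to this for beta = 1.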

returnn.frontend.math_.swish(a: Tensor) → Tensor

silu / swish.

The SiLU activation function was introduced in "Gaussian Error Linear Units (GELUs)" [Hendrycks et al. 2016](https://arxiv.org/abs/1606.08415) and "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning" [Elfwing et al. 2017](https://arxiv.org/abs/1702.03118), and was independently discovered (and called swish) in "Searching for Activation Functions" [Ramachandran et al. 2017](https://arxiv.org/abs/1710.05941).

returnn.frontend.math_.gelu(a: Tensor) → Tensor

Compute the Gaussian Error Linear Unit (GELU) activation function: x * P(X <= x), where X ~ N(0, 1). Ref: [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415). This is the exact formulation, without the tanh approximation.

Parameters:

a
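
For reference, the exact form is gelu(x) = x * Phi(x) = 0.5 * x * (1 + erf(x / sqrt(2))), where Phi is the standard normal CDF.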

returnn.frontend.math_.softmax(a: Tensor, *, axis: Dim, use_mask: bool = True) → Tensor
returnn.frontend.math_.log_softmax(a: Tensor, *, axis: Dim, use_mask: bool = True) → Tensor
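
A hypothetical usage sketch (dim/tensor construction assumed as above; use_mask presumably controls whether padded positions of a dynamic axis are excluded from the normalization):

    import returnn.frontend as rf
    from returnn.tensor import Dim

    time_dim = Dim(5, name="time")
    classes_dim = Dim(10, name="classes")

    logits = rf.zeros([time_dim, classes_dim])
    probs = rf.softmax(logits, axis=classes_dim)           # normalized over classes_dim
    log_probs = rf.log_softmax(logits, axis=classes_dim)   # numerically stable log of the above
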
returnn.frontend.math_.gating(x: Tensor, *, axis: Dim | None = None, gate_func=<function sigmoid>, act_func=<function identity>, out_dim: Dim | None = None) → Tuple[Tensor, Dim]

Like in the gated linear unit (GLU): https://arxiv.org/abs/1612.08083. GLU also covers the linear transformation before the gating, which is why this function is not called GLU. GLU uses gate_func=sigmoid and act_func=identity (the defaults here).

There are other gating variants you might be interested in; see for example https://arxiv.org/abs/2002.05202, e.g. gate_func=gelu.
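
A hypothetical sketch of gating() (assuming, as in GLU, that the input is split into two equal halves along axis, one half passed through act_func, the other through gate_func, and the results multiplied; construction as in the earlier examples):

    import returnn.frontend as rf
    from returnn.tensor import Dim

    in_dim = Dim(16, name="in")  # must be splittable into two equal halves
    x = rf.zeros([in_dim])

    gated, out_dim = rf.gating(x, axis=in_dim)  # GLU-style gating with the default funcs
    # out_dim then has half the size of in_dim (8 here).
    # A GELU-gated variant in the spirit of https://arxiv.org/abs/2002.05202:
    gated_gelu, out_dim_gelu = rf.gating(x, axis=in_dim, gate_func=rf.gelu)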