returnn.frontend.dims
Utilities for dimension tags, dimensions, axes.
- returnn.frontend.dims.range_over_dim(dim: Dim, *, dtype: str | None = None, device: str | None = None) Tensor[T] [source]#
- Parameters:
dim –
dtype –
device –
- Returns:
tensor with shape [dim]
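A minimal usage sketch, not from the docs themselves (it assumes a backend is selected, e.g. via rf.select_backend_torch(), and that the Dim constructor takes the size as its first argument):

import returnn.frontend as rf
from returnn.tensor import Dim

rf.select_backend_torch()  # assumption: PyTorch backend is available

time_dim = Dim(7, name="time")
indices = rf.range_over_dim(time_dim)
# indices has shape [time_dim] with values 0, 1, ..., 6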
- returnn.frontend.dims.range_over_dims(dims: Sequence[Dim], *, dtype: str | None = None, device: str | None = None) Tensor[T] [source]#
This is useful if you want to index into a merged dim. Related:
rf.merge_dims().
- Parameters:
dims –
dtype –
device –
- Returns:
tensor with shape [dim_0, …, dim_n] -> sparse_dim = merged_dim, where merged_dim = dim_0 * … * dim_n
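A hedged sketch of indexing into a merged dim (it assumes rf.merge_dims returns the merged tensor together with the merged dim, and that rf.gather can gather along that dim):

import returnn.frontend as rf
from returnn.tensor import Dim

rf.select_backend_torch()  # assumption: PyTorch backend is available

a_dim = Dim(3, name="a")
b_dim = Dim(4, name="b")
x = rf.random_normal([a_dim, b_dim])

# Flatten [a, b] -> [a*b].
x_merged, merged_dim = rf.merge_dims(x, dims=[a_dim, b_dim])

# Flat indices of shape [a_dim, b_dim] with sparse_dim = merged_dim,
# i.e. entry (i, j) holds i * b + j.
flat_idx = rf.range_over_dims([a_dim, b_dim])

# Gathering with these indices recovers the original [a, b] layout.
x_restored = rf.gather(x_merged, indices=flat_idx, axis=merged_dim)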
- returnn.frontend.dims.replace_dim(source: Tensor, *, in_dim: Dim, out_dim: Dim | None = None) Tuple[Tensor, Dim] [source]#
Also see:
rf.merge_dims(),
rf.split_dims().
- Parameters:
source –
in_dim –
out_dim –
- Returns:
source with in_dim replaced by out_dim, and the new out_dim. This does not work for the sparse_dim; see
set_sparse_dim()
for that case.
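A hedged sketch (it assumes that with out_dim=None a new dim of the same size is created and returned):

import returnn.frontend as rf
from returnn.tensor import Dim

rf.select_backend_torch()  # assumption: PyTorch backend is available

batch_dim = Dim(2, name="batch")
in_dim = Dim(5, name="in")
x = rf.random_normal([batch_dim, in_dim])

# Replace in_dim by a freshly created dim of the same size.
x2, new_dim = rf.replace_dim(x, in_dim=in_dim)
# x2 now has dims (batch_dim, new_dim); the values are unchanged.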
- returnn.frontend.dims.dim_match_priority_when_needed(dim: Dim, *other_dims: Dim) Dim [source]#
- Returns:
maybe copy of dim with higher match_priority if needed to distinguish from other_dims
Why or when is this needed?
For activation values, this should never be needed, and all dims should be unique.
In case of self-attention, the standard way is to create a separate distinct dim to perform the attention reduction over. See
SelfAttention.
However, in case of weight matrices, it is not unusual to have the same dim for both the input and the output, i.e. a square weight matrix. When the reduction is performed in
matmul(),
we want to match the input feature dim to the dim in the weight matrix with higher match priority. So
dim_match_priority_when_needed()
would be applied on the input feature dim.
https://github.com/rwth-i6/returnn/pull/871
https://github.com/rwth-i6/returnn_common/issues/17#issuecomment-997463222
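A hedged sketch of the square-weight-matrix case described above (it assumes that a dim copy with higher match priority compares equal to the original dim, so rf.matmul can use it to pick the reduction axis inside the weight):

import returnn.frontend as rf
from returnn.tensor import Dim

rf.select_backend_torch()  # assumption: PyTorch backend is available

feat_dim = Dim(4, name="feature")
x = rf.random_normal([feat_dim])

# Square weight matrix: input and output use the same dim.
# Give the input-side axis a higher match priority so the reduction
# in rf.matmul unambiguously picks that axis of the weight.
w_in_dim = rf.dim_match_priority_when_needed(feat_dim, feat_dim)
w = rf.random_normal([w_in_dim, feat_dim])

y = rf.matmul(x, w, reduce=w_in_dim)
# y has shape [feat_dim] (the output axis of the weight).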