returnn.torch.util.array_

Array (Tensor) functions

returnn.torch.util.array_.masked_select(input: Tensor, mask: Tensor, *, mask_len: int | Tensor | None = None)[source]

Like torch.masked_select(), but much more efficient in both memory and computation time, on CPU as well as GPU.

See here for the issues with torch.masked_select():
  • https://github.com/rwth-i6/returnn/issues/1584

  • https://github.com/pytorch/pytorch/issues/30246

  • https://github.com/pytorch/pytorch/issues/56896

Parameters:
  • input – [mask_dims…, remaining_dims…]

  • mask – [mask_dims…], binary mask to index with. If it has fewer dims than input, the remaining dims are broadcast.

  • mask_len – if given, the number of selected (True) entries in mask, i.e. the output length. Providing it avoids a CUDA synchronization.

Returns:

selected elements, shape [mask_len, remaining_dims…]
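As a minimal sketch of the semantics (illustrated via plain PyTorch boolean advanced indexing, which is assumed here to be the reference behavior, not the optimized RETURNN implementation):

```python
import torch

# Hypothetical shapes: input [batch, time, feature], mask [batch, time].
x = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
mask = torch.tensor([[True, False, True], [False, True, False]])

# masked_select(x, mask) selects the feature rows where mask is True,
# giving shape [mask_len, remaining_dims...] = [3, 4].
# Its result matches plain boolean indexing:
ref = x[mask]
print(ref.shape)  # torch.Size([3, 4])
```

Passing mask_len=3 here (e.g. computed from sequence lengths already on the host) would let the function skip reading the number of True entries back from the GPU.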

returnn.torch.util.array_.nonzero(mask: Tensor, *, out_len: int | Tensor) Tensor[source]

This has the advantage over torch.nonzero() that no CUDA synchronization is needed, which is possible when the output length is known in advance.

However, in benchmarks, this seems to be slower than torch.nonzero(): https://github.com/rwth-i6/returnn/pull/1593 https://github.com/pytorch/pytorch/issues/131256

Parameters:
  • mask – flattened (dim() == 1) mask, bool

  • out_len – known number of True elements in mask, i.e. the output length

Returns:

indices of True elements, shape [out_len]. like mask.nonzero().flatten()
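A minimal sketch of the semantics, using the documented equivalence mask.nonzero().flatten() as the reference (not the synchronization-free RETURNN implementation):

```python
import torch

mask = torch.tensor([False, True, True, False, True])
out_len = 3  # number of True entries, assumed known in advance

# nonzero(mask, out_len=out_len) returns the indices of the True
# elements, shape [out_len], equal to:
ref = mask.nonzero().flatten()
print(ref)  # tensor([1, 2, 4])
```

The point of passing out_len (possibly as a Tensor already on the device) is that the output shape is fixed up front, so the kernel launch does not have to wait for the count of True entries to be copied back to the host.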