returnn.torch.util.array_
Array (Tensor) functions
- returnn.torch.util.array_.masked_select(input: Tensor, mask: Tensor, *, mask_len: int | Tensor | None = None)
Like torch.masked_select() but much more efficient in both memory and computation time, on both CPU and GPU.
See here for the issues with torch.masked_select():
https://github.com/rwth-i6/returnn/issues/1584
https://github.com/pytorch/pytorch/issues/30246
https://github.com/pytorch/pytorch/issues/56896
- Parameters:
input – [mask_dims…, remaining_dims…]
mask – [mask_dims…], binary mask to index with. If it has fewer dims than input, the remaining dims are broadcast.
mask_len – if given, the length of the mask, i.e. the number of selected elements (the first dim of the output). This avoids a CUDA synchronization.
- Returns:
selected elements, shape [mask_len, remaining_dims…]
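The shape contract above can be illustrated with plain PyTorch. This is a minimal sketch of the expected result, not the RETURNN implementation: it uses boolean indexing as a reference and contrasts it with torch.masked_select(), which needs an explicitly broadcastable mask and returns a flat tensor. The example values and variable names are illustrative assumptions.

```python
import torch

# input: [mask_dims..., remaining_dims...] = [2, 3] + [2]
x = torch.arange(12.0).reshape(2, 3, 2)
# mask: [mask_dims...] = [2, 3], with 3 True entries
mask = torch.tensor([[True, False, True], [False, True, False]])

# Reference result via boolean indexing: shape [mask_len, remaining_dims...] = [3, 2].
expected = x[mask]

# torch.masked_select() needs the mask broadcast over the remaining dims
# and returns a flat 1-D tensor, so the result must be reshaped afterwards.
flat = torch.masked_select(x, mask[:, :, None])
reshaped = flat.reshape(-1, x.shape[-1])

# With RETURNN installed, the call documented above would presumably be:
#   from returnn.torch.util.array_ import masked_select
#   out = masked_select(x, mask, mask_len=3)  # expected to match `expected`
```

Passing mask_len (here 3) is what lets the function know the output size up front instead of having to read it back from the device.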
- returnn.torch.util.array_.nonzero(mask: Tensor, *, out_len: int | Tensor) → Tensor
This has the advantage over torch.nonzero() that we do not need to perform a CUDA synchronization. We can avoid that when we know the output length in advance.
However, in my benchmarks, it seems this is slower than torch.nonzero().
https://github.com/rwth-i6/returnn/pull/1593
https://github.com/pytorch/pytorch/issues/131256
- Parameters:
mask – flattened (dim() == 1) mask, bool
out_len – the output length (number of True elements in mask), known in advance
- Returns:
indices of True elements, shape [out_len], like mask.nonzero().flatten()
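To make the role of out_len concrete, here is a small sketch in plain PyTorch. The first result is the reference semantics named above. The second is a hypothetical alternative with a statically known output shape, based on a stable sort; it is an assumption for illustration only and not necessarily how RETURNN implements this function.

```python
import torch

mask = torch.tensor([False, True, True, False, True])  # flattened, bool
out_len = 3  # number of True entries, assumed known in advance

# Reference semantics; the output shape depends on the data.
reference = mask.nonzero().flatten()

# Hypothetical static-shape alternative: a stable descending sort moves the
# True positions to the front in their original order; keep the first out_len.
_, order = torch.sort(mask.to(torch.int8), descending=True, stable=True)
indices = order[:out_len]
```

Because out_len is a Python int here, the slice has a fixed size and no value has to be read back from the device to determine the result shape.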
- returnn.torch.util.array_.sequence_mask(lengths: Tensor, *, maxlen: int | None = None) → Tensor
Creates a boolean mask from sequence lengths.
- Parameters:
lengths – Tensor of shape [batch_size…] containing sequence lengths
maxlen – Maximum length of the sequences. If None, uses the maximum value in lengths.
- Returns:
A boolean mask tensor of shape [batch_size…, maxlen]
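A batch-major sequence mask of this shape can be sketched in plain PyTorch as follows. This is a reference for the expected result rather than the RETURNN code, and it assumes the usual convention that position t of a sequence is valid when t < length.

```python
import torch

lengths = torch.tensor([3, 1, 2])  # [batch_size]
maxlen = int(lengths.max())  # default when maxlen is None, per the description

# Position t of sequence b is valid iff t < lengths[b] (assumed convention).
# Broadcasting [maxlen] against [batch_size, 1] gives [batch_size, maxlen].
mask = torch.arange(maxlen) < lengths[..., None]
```

Each row of the mask corresponds to one sequence, with True for its valid positions and False for padding.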
- returnn.torch.util.array_.sequence_mask_time_major(lengths: Tensor, *, maxlen: int | None = None) → Tensor
Creates a boolean mask from sequence lengths, in time-major layout (time axis first).
- Parameters:
lengths – Tensor of shape [batch_size…] containing sequence lengths
maxlen – Maximum length of the sequences. If None, uses the maximum value in lengths.
- Returns:
A boolean mask tensor of shape [maxlen, batch_size…]
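The time-major layout can be sketched in plain PyTorch for a 1-D lengths tensor. This is a reference for the expected result under the assumed convention t < length, not the RETURNN implementation; the batch-major mask is built only to show that the two layouts are transposes of each other.

```python
import torch

lengths = torch.tensor([3, 1, 2])  # [batch_size]
maxlen = int(lengths.max())

# Batch-major layout: [batch_size, maxlen]
batch_major = torch.arange(maxlen) < lengths[:, None]

# Time-major layout: [maxlen, batch_size], time axis first
time_major = torch.arange(maxlen)[:, None] < lengths[None, :]
```

Here row t of time_major marks which sequences in the batch are still active at time step t.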