returnn.frontend.conv
Convolution, transposed convolution, pooling
- returnn.frontend.conv.conv(source: Tensor, *, in_dim: Dim, out_dim: Dim, in_spatial_dims: Sequence[Dim], out_spatial_dims: Sequence[Dim] | None = None, filter: Tensor, filter_size: Sequence[Dim], padding: str | int | Sequence[int], strides: int | Sequence[int] | None = None, dilation_rate: int | Sequence[int] | None = None, groups: int | None = None, bias: Tensor | None = None, use_mask: bool | None = None) -> Tuple[Tensor, Sequence[Dim]]
Generic N-D convolution.
- Parameters:
source
in_dim – input channels
out_dim – output channels
in_spatial_dims – The dimensions to operate on. The number of specified dims (1, 2 or 3) determines whether this is a 1D, 2D or 3D convolution. The order is consistent with the order of filter_size, strides, etc.
out_spatial_dims
filter
filter_size – defines the order of dims in filter such that it matches the order of in_spatial_dims.
padding – “valid” or “same” or int. “valid” is like padding=0. padding=“same” will pad such that the output has the same spatial dimensions as the input (in case of stride=1), or otherwise ceildiv(input, stride). The specific padding in padding=“same” with stride>1 has changed with behavior version >=24 (or global config option rf_use_consistent_same_padding) and is now consistent independent of dimension size. See _consistent_same_padding() for more details.
strides – the default (if it is None) is 1
dilation_rate
groups
bias
use_mask – Whether to mask the input tensor based on seq lengths such that the padding in the padded tensor is ignored (it will mask with 0). With behavior version >=23, this is enabled by default, or configured with global config option rf_use_mask. (Also see use_mask_default().)
- Returns:
out, out_spatial_dims
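The padding=“same” note above can be illustrated with plain integer arithmetic. The following is a minimal sketch, not RETURNN code: `same_padding_size_dependent` follows the TensorFlow-style rule, where the padding amount depends on the input size, and `same_padding_size_independent` is a hypothetical stand-in for a "consistent" rule in which the padding depends only on the filter. All function names here are assumptions made for the example, and the actual left/right split used by RETURNN may differ. Both variants yield an output size of ceildiv(input, stride).

```python
def ceildiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return -(-a // b)


def same_padding_size_dependent(in_size, filter_size, stride, dilation=1):
    """TensorFlow-style "same" padding: the total depends on in_size."""
    eff = (filter_size - 1) * dilation + 1  # effective filter extent
    out_size = ceildiv(in_size, stride)
    total = max((out_size - 1) * stride + eff - in_size, 0)
    return total // 2, total - total // 2  # (left, right)


def same_padding_size_independent(filter_size, dilation=1):
    """Hypothetical consistent variant: the total depends only on the filter."""
    total = (filter_size - 1) * dilation
    return total // 2, total - total // 2  # (left, right)


def padded_out_size(in_size, filter_size, stride, pad_left, pad_right, dilation=1):
    """Output size of a convolution over an explicitly padded input."""
    eff = (filter_size - 1) * dilation + 1
    return (in_size + pad_left + pad_right - eff) // stride + 1


if __name__ == "__main__":
    for in_size in (10, 11):
        dep = same_padding_size_dependent(in_size, filter_size=3, stride=2)
        indep = same_padding_size_independent(filter_size=3)
        print(in_size, dep, indep, padded_out_size(in_size, 3, 2, *indep))
```

With filter size 3 and stride 2, the size-dependent rule pads (0, 1) for an input of size 10 but (1, 1) for size 11, so the same frame can see different context depending on the sequence length. The size-independent rule pads (1, 1) in both cases.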
- class returnn.frontend.conv.Conv1d(in_dim: Dim, out_dim: Dim, filter_size: int | Dim, *, padding: str, strides: int | None = None, dilation_rate: int | None = None, groups: int | None = None, with_bias: bool = True)
1D convolution
- Parameters:
in_dim (Dim)
out_dim (Dim)
filter_size (int|Dim)
padding (str) – “same” or “valid”
strides (int|None) – stride for the spatial dim, given as a single int.
dilation_rate (int|None) – dilation for the spatial dim
groups (int) – grouped convolution
with_bias (bool) – if True, will add a bias to the output features
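To make the roles of filter_size, strides and dilation_rate concrete, here is a minimal single-channel reference in pure Python. It is a sketch of the standard operation (cross-correlation with “valid” padding), not the RETURNN implementation; `conv1d_valid` is a hypothetical helper, and channels, groups and bias are left out.

```python
def conv1d_valid(xs, filt, stride=1, dilation=1):
    """Single-channel 1D convolution (cross-correlation), "valid" padding."""
    eff = (len(filt) - 1) * dilation + 1  # effective filter extent
    n_out = max((len(xs) - eff) // stride + 1, 0)
    return [
        sum(w * xs[t * stride + k * dilation] for k, w in enumerate(filt))
        for t in range(n_out)
    ]


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7]
    filt = [1, 0, -1]
    print(conv1d_valid(xs, filt))              # 5 output frames
    print(conv1d_valid(xs, filt, stride=2))    # 3 output frames
    print(conv1d_valid(xs, filt, dilation=2))  # 3 output frames
```

The stride subsamples the output positions, while the dilation spreads the filter taps apart, which enlarges the effective filter extent to (filter_size - 1) * dilation + 1.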
- class returnn.frontend.conv.Conv2d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim] | int | Dim, *, padding: str, strides: int | Sequence[int] | None = None, dilation_rate: int | Sequence[int] | None = None, groups: int | None = None, with_bias: bool = True)
2D convolution
- Parameters:
in_dim (Dim)
out_dim (Dim)
filter_size – (width,), (height, width) or (depth, height, width) for 1D/2D/3D conv. The input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. It will automatically swap the batch-dim to the first axis of the input data.
padding (str) – “same” or “valid”
strides (int|Sequence[int]) – strides for the spatial dims, i.e. length of this tuple should be the same as filter_size, or a single int.
dilation_rate (int|Sequence[int]) – dilation for the spatial dims
groups (int) – grouped convolution
with_bias (bool) – if True, will add a bias to the output features
- class returnn.frontend.conv.Conv3d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim] | int | Dim, *, padding: str, strides: int | Sequence[int] | None = None, dilation_rate: int | Sequence[int] | None = None, groups: int | None = None, with_bias: bool = True)
3D convolution
- Parameters:
in_dim (Dim)
out_dim (Dim)
filter_size – (width,), (height, width) or (depth, height, width) for 1D/2D/3D conv. The input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. It will automatically swap the batch-dim to the first axis of the input data.
padding (str) – “same” or “valid”
strides (int|Sequence[int]) – strides for the spatial dims, i.e. length of this tuple should be the same as filter_size, or a single int.
dilation_rate (int|Sequence[int]) – dilation for the spatial dims
groups (int) – grouped convolution
with_bias (bool) – if True, will add a bias to the output features
- returnn.frontend.conv.transposed_conv(source: Tensor, *, in_dim: Dim, out_dim: Dim, in_spatial_dims: Sequence[Dim], out_spatial_dims: Sequence[Dim] | None = None, filter: Tensor, filter_size: Sequence[Dim], padding: str, remove_padding: Sequence[int] | int = 0, output_padding: Sequence[int | None] | int | None = None, strides: Sequence[int] | None = None, bias: Tensor | None = None, use_mask: bool | None = None) -> Tuple[Tensor, Sequence[Dim]]
transposed conv
- class returnn.frontend.conv.TransposedConv1d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim], *, padding: str, remove_padding: Sequence[int] | int = 0, output_padding: Sequence[int | None] | int | None = None, strides: Sequence[int] | None = None, with_bias: bool = True)
1D transposed convolution
- Parameters:
in_dim (Dim)
out_dim (Dim)
filter_size (list[int])
strides (list[int]|None) – specifies the upscaling. by default, same as filter_size
padding (str) – “same” or “valid”
remove_padding (list[int]|int)
output_padding (list[int|None]|int|None)
with_bias (bool) – whether to add a bias. enabled by default
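The statement that strides "specifies the upscaling" can be seen in a minimal single-channel reference. This is a sketch of the standard transposed convolution (scatter-add of the filter at every stride step, without any padding removal), not the RETURNN implementation; `transposed_conv1d` is a hypothetical helper, and remove_padding and output_padding are not modelled.

```python
def transposed_conv1d(xs, filt, stride=None):
    """Single-channel 1D transposed convolution via scatter-add."""
    if stride is None:
        stride = len(filt)  # mirrors the documented default: same as filter_size
    if not xs:
        return []
    out = [0] * ((len(xs) - 1) * stride + len(filt))
    for t, x in enumerate(xs):
        for k, w in enumerate(filt):
            # Each input frame contributes a scaled copy of the filter.
            out[t * stride + k] += x * w
    return out


if __name__ == "__main__":
    print(transposed_conv1d([1, 2, 3], [1, 1]))               # stride 2, no overlap
    print(transposed_conv1d([1, 2, 3], [1, 1, 1], stride=2))  # overlapping windows
```

With stride equal to the filter size, the copies do not overlap and the output is exactly stride times as long as the input. With a filter longer than the stride, neighbouring copies overlap and their contributions are summed.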
- class returnn.frontend.conv.TransposedConv2d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim], *, padding: str, remove_padding: Sequence[int] | int = 0, output_padding: Sequence[int | None] | int | None = None, strides: Sequence[int] | None = None, with_bias: bool = True)
2D transposed convolution
- Parameters:
in_dim (Dim)
out_dim (Dim)
filter_size (list[int])
strides (list[int]|None) – specifies the upscaling. by default, same as filter_size
padding (str) – “same” or “valid”
remove_padding (list[int]|int)
output_padding (list[int|None]|int|None)
with_bias (bool) – whether to add a bias. enabled by default
- class returnn.frontend.conv.TransposedConv3d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim], *, padding: str, remove_padding: Sequence[int] | int = 0, output_padding: Sequence[int | None] | int | None = None, strides: Sequence[int] | None = None, with_bias: bool = True)
3D transposed convolution
- Parameters:
in_dim (Dim)
out_dim (Dim)
filter_size (list[int])
strides (list[int]|None) – specifies the upscaling. by default, same as filter_size
padding (str) – “same” or “valid”
remove_padding (list[int]|int)
output_padding (list[int|None]|int|None)
with_bias (bool) – whether to add a bias. enabled by default
- returnn.frontend.conv.pool(source: Tensor, *, nd: int | None = None, mode: str, pool_size: Sequence[int] | int, padding: str | int | Sequence[int] = 'valid', dilation_rate: Sequence[int] | int = 1, strides: int | Sequence[int] | None = None, in_spatial_dims: Sequence[Dim] | Dim, out_spatial_dims: Dim | Sequence[Dim] | None = None, use_mask: bool | None = None) -> Tuple[Tensor, Sequence[Dim]]
Generic N-D pooling.
- Parameters:
source
nd
mode – “max” or “avg”
pool_size – shape of the window of each reduce
padding – “valid” or “same” or int. “valid” is like padding=0. padding=“same” will pad such that the output has the same spatial dimensions as the input (in case of stride=1), or otherwise ceildiv(input, stride). The specific padding in padding=“same” with stride>1 has changed with behavior version >=24 (or global config option rf_use_consistent_same_padding) and is now consistent independent of dimension size. See _consistent_same_padding() for more details.
dilation_rate
strides – the default (if it is None) will be set to pool_size (in contrast to conv())
in_spatial_dims
out_spatial_dims
use_mask – Whether to mask the input tensor based on seq lengths such that the padding in the padded tensor is ignored (for max-pooling, it will mask with -inf, for avg-pooling with 0). With behavior version >=23, this is enabled by default, or configured with global config option rf_use_mask. (Also see use_mask_default().)
- Returns:
out, out_spatial_dims
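The reason for masking before pooling, and for the different fill values per mode, can be shown on a single padded sequence. The sketch below is pure Python and not RETURNN code; `mask_padding` and `pool1d_ref` are hypothetical helpers, and it assumes a 1D input with “valid” padding, the default stride of pool_size, and an average that still divides by the full window size.

```python
import math


def mask_padding(xs, seq_len, fill):
    """Replace every frame at or beyond seq_len with the fill value."""
    return [x if i < seq_len else fill for i, x in enumerate(xs)]


def pool1d_ref(xs, pool_size, mode, strides=None):
    """1D pooling, "valid" padding; strides defaults to pool_size."""
    strides = pool_size if strides is None else strides
    n_out = max((len(xs) - pool_size) // strides + 1, 0)
    windows = [xs[t * strides : t * strides + pool_size] for t in range(n_out)]
    if mode == "max":
        return [max(w) for w in windows]
    if mode == "avg":
        return [sum(w) / len(w) for w in windows]
    raise ValueError(f"unknown mode {mode!r}")


if __name__ == "__main__":
    # Three real frames followed by one padding frame holding an arbitrary value.
    padded = [-3.0, -1.0, -2.0, 9.0]
    seq_len = 3
    print(pool1d_ref(padded, 2, "max"))
    print(pool1d_ref(mask_padding(padded, seq_len, -math.inf), 2, "max"))
    print(pool1d_ref(padded, 2, "avg"))
    print(pool1d_ref(mask_padding(padded, seq_len, 0.0), 2, "avg"))
```

Without masking, the padding value 9.0 leaks into the last window of both modes. Filling with -inf keeps the padding frame from ever winning a max, and filling with 0 keeps it from contributing to the sum of an average.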
- returnn.frontend.conv.max_pool(source: Tensor, *, pool_size: Sequence[int] | int, padding: str = 'valid', dilation_rate: Sequence[int] | int = 1, strides: int | Sequence[int] | None = None, in_spatial_dims: Sequence[Dim] | Dim, out_spatial_dims: Dim | Sequence[Dim] | None = None) -> Tuple[Tensor, Sequence[Dim]]
max-pool
- returnn.frontend.conv.max_pool1d(source: Tensor, *, pool_size: int, padding: str = 'valid', dilation_rate: int = 1, strides: int | None = None, in_spatial_dim: Dim, out_spatial_dim: Dim | None = None) -> Tuple[Tensor, Dim]
max pool
- returnn.frontend.conv.pool1d(source: Tensor, *, mode: str, pool_size: int, padding: str = 'valid', dilation_rate: int = 1, strides: int | None = None, in_spatial_dim: Dim, out_spatial_dim: Dim | None = None) -> Tuple[Tensor, Dim]
1D pooling.
- Parameters:
source (Tensor)
mode (str) – “max” or “avg”
pool_size (int) – size of the window of each reduce
padding (str) – “valid” or “same”
dilation_rate (int)
strides (int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size
in_spatial_dim (Dim)
out_spatial_dim (Dim|None)
- Returns:
out, out_spatial_dim
- returnn.frontend.conv.pool2d(source: Tensor, *, mode: str, pool_size: Sequence[int] | int, padding: str = 'valid', dilation_rate: Sequence[int] | int = 1, strides: int | Sequence[int] | None = None, in_spatial_dims: Sequence[Dim], out_spatial_dims: Sequence[Dim] | None = None) -> Tuple[Tensor, Sequence[Dim]]
2D pooling.
- Parameters:
source (Tensor)
mode (str) – “max” or “avg”
pool_size (tuple[int]) – shape of the window of each reduce
padding (str) – “valid” or “same”
dilation_rate (tuple[int]|int)
strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size
in_spatial_dims (Sequence[Dim])
out_spatial_dims (Sequence[Dim]|None)
- Returns:
out, out_spatial_dims
- returnn.frontend.conv.pool3d(source: Tensor, *, mode: str, pool_size: Sequence[int] | int, padding: str = 'valid', dilation_rate: Sequence[int] | int = 1, strides: int | Sequence[int] | None = None, in_spatial_dims: Sequence[Dim], out_spatial_dims: Sequence[Dim] | None = None) -> Tuple[Tensor, Sequence[Dim]]
3D pooling.
- Parameters:
source (Tensor)
mode (str) – “max” or “avg”
pool_size (tuple[int]) – shape of the window of each reduce
padding (str) – “valid” or “same”
dilation_rate (tuple[int]|int)
strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size
in_spatial_dims (Sequence[Dim])
out_spatial_dims (Sequence[Dim]|None)
- Returns:
out, out_spatial_dims
- returnn.frontend.conv.make_conv_out_spatial_dims(in_spatial_dims: Sequence[Dim], *, filter_size: Sequence[int | Dim] | int | Dim, padding: str | int | Sequence[int], strides: Sequence[int] | int = 1, dilation_rate: Sequence[int] | int = 1, description_prefix: str | None = None) -> Sequence[Dim]
create out spatial dims from in spatial dims
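The size relation between in and out spatial dims can be sketched with plain integers in place of Dim objects. This is an illustration of the standard convolution size arithmetic, not the RETURNN implementation: `conv_out_sizes` is a hypothetical helper, and it assumes that an int padding means that many padded frames on each side of the dimension.

```python
def ceildiv(a: int, b: int) -> int:
    """Ceiling division for positive divisors."""
    return -(-a // b)


def conv_out_sizes(in_sizes, *, filter_size, padding, strides=1, dilation_rate=1):
    """Output sizes per spatial dim; scalar arguments are broadcast to all dims."""
    n = len(in_sizes)

    def per_dim(value):
        # A str or int applies to every dim, a sequence gives one value per dim.
        return [value] * n if isinstance(value, (int, str)) else list(value)

    out = []
    for size, k, p, s, d in zip(
        in_sizes, per_dim(filter_size), per_dim(padding), per_dim(strides), per_dim(dilation_rate)
    ):
        span = (k - 1) * d  # effective filter extent minus one
        if p == "same":
            out.append(ceildiv(size, s))
        elif p == "valid":
            out.append(max(ceildiv(size - span, s), 0))
        else:
            # Assumption: int padding is applied on both sides.
            out.append(max((size + 2 * p - span - 1) // s + 1, 0))
    return out


if __name__ == "__main__":
    print(conv_out_sizes([10, 11], filter_size=3, padding="valid", strides=2))
    print(conv_out_sizes([10, 11], filter_size=3, padding="same", strides=2))
    print(conv_out_sizes([10, 11], filter_size=[3, 3], padding=1, strides=[1, 2]))
```

In the real function the inputs and outputs are Dim objects, so dynamic sizes such as sequence lengths are carried along per batch entry rather than as a single integer.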