returnn.frontend.conv

Convolution, transposed convolution, pooling

returnn.frontend.conv.conv(source: Tensor, *, in_dim: Dim, out_dim: Dim, in_spatial_dims: Sequence[Dim], out_spatial_dims: Sequence[Dim] | None = None, filter: Tensor, filter_size: Sequence[Dim], padding: str | int | Sequence[int], strides: int | Sequence[int] | None = None, dilation_rate: int | Sequence[int] | None = None, groups: int | None = None, bias: Tensor | None = None, use_mask: bool | None = None) → Tuple[Tensor, Sequence[Dim]]

Generic N-D convolution.

Parameters:
  • source

  • in_dim – input channels

  • out_dim – output channels

  • in_spatial_dims – The dimensions to operate on. The number of dims given (1, 2 or 3) determines whether this is a 1D, 2D or 3D convolution. Their order is consistent with the order of filter_size, strides, etc.

  • out_spatial_dims

  • filter

  • filter_size – defines the order of dims in filter such that it matches the order of in_spatial_dims.

  • padding – “valid”, “same”, an int, or a sequence of ints. “valid” is like padding=0. padding=“same” pads such that the output has the same spatial dimensions as the input when stride=1, and ceildiv(input, stride) otherwise. The specific padding applied by padding=“same” with stride>1 changed with behavior version >=24 (or with the global config option rf_use_consistent_same_padding) and is now consistent regardless of dimension size. See _consistent_same_padding() for more details.

  • strides – the default (if it is None) is 1

  • dilation_rate

  • groups

  • bias

  • use_mask – Whether to mask the input tensor based on seq lengths such that the padding in the padded tensor is ignored (it will mask with 0). With behavior version >=23, this is enabled by default, or configured with global config option rf_use_mask. (Also see use_mask_default()).

Returns:

out, out_spatial_dims
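The effect of padding, strides and dilation_rate on the output size can be sketched in plain Python. This is only an illustration of the arithmetic for one spatial dimension, not RETURNN code: conv_out_len is a hypothetical helper, the “same” case follows the ceildiv(input, stride) rule stated above, and the “valid” and int cases assume the usual convolution length formula with an int padding applied symmetrically on both sides.

```python
def ceildiv(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def conv_out_len(in_len, filter_size, padding, stride=1, dilation=1):
    """Output length along one spatial dim (hypothetical helper, not part of RETURNN)."""
    if isinstance(padding, str):
        padding = padding.lower()
        if padding == "same":
            # Documented rule: same size for stride=1, otherwise ceildiv(input, stride).
            return ceildiv(in_len, stride)
        if padding != "valid":
            raise ValueError(f"unknown padding {padding!r}")
        padding = 0  # "valid" is like padding=0
    # Assumption: an int padding is added on both sides of the input.
    effective_filter = (filter_size - 1) * dilation + 1
    return max(0, (in_len + 2 * padding - effective_filter) // stride + 1)


print(conv_out_len(10, 3, "valid"))           # window must fit fully inside the input
print(conv_out_len(11, 3, "same", stride=2))  # ceildiv(11, 2)
print(conv_out_len(10, 3, 1))                 # explicit padding of 1 on each side
```

For an N-D convolution, the same computation would apply independently to each entry of in_spatial_dims.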

class returnn.frontend.conv.Conv1d(in_dim: Dim, out_dim: Dim, filter_size: int | Dim, *, padding: str, strides: int | None = None, dilation_rate: int | None = None, groups: int | None = None, with_bias: bool = True)

1D convolution

Parameters:
  • in_dim (Dim)

  • out_dim (Dim)

  • filter_size (int|Dim)

  • padding (str) – “same” or “valid”

  • strides (int|None) – stride for the spatial dim, given as a single int.

  • dilation_rate (int|None) – dilation for the spatial dims

  • groups (int) – grouped convolution

  • with_bias (bool) – if True, will add a bias to the output features

nd: int | None = 1
class returnn.frontend.conv.Conv2d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim] | int | Dim, *, padding: str, strides: int | Sequence[int] | None = None, dilation_rate: int | Sequence[int] | None = None, groups: int | None = None, with_bias: bool = True)

2D convolution

Parameters:
  • in_dim (Dim)

  • out_dim (Dim)

  • filter_size – (width,), (height,width) or (depth,height,width) for 1D/2D/3D conv. The input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. The batch-dim is automatically swapped to the first axis of the input data.

  • padding (str) – “same” or “valid”

  • strides (int|Sequence[int]) – strides for the spatial dims: either a single int, or a sequence with the same length as filter_size.

  • dilation_rate (int|Sequence[int]) – dilation for the spatial dims

  • groups (int) – grouped convolution

  • with_bias (bool) – if True, will add a bias to the output features

nd: int | None = 2
class returnn.frontend.conv.Conv3d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim] | int | Dim, *, padding: str, strides: int | Sequence[int] | None = None, dilation_rate: int | Sequence[int] | None = None, groups: int | None = None, with_bias: bool = True)

3D convolution

Parameters:
  • in_dim (Dim)

  • out_dim (Dim)

  • filter_size – (width,), (height,width) or (depth,height,width) for 1D/2D/3D conv. The input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. The batch-dim is automatically swapped to the first axis of the input data.

  • padding (str) – “same” or “valid”

  • strides (int|Sequence[int]) – strides for the spatial dims: either a single int, or a sequence with the same length as filter_size.

  • dilation_rate (int|Sequence[int]) – dilation for the spatial dims

  • groups (int) – grouped convolution

  • with_bias (bool) – if True, will add a bias to the output features

nd: int | None = 3
returnn.frontend.conv.transposed_conv(source: Tensor, *, in_dim: Dim, out_dim: Dim, in_spatial_dims: Sequence[Dim], out_spatial_dims: Sequence[Dim] | None = None, filter: Tensor, filter_size: Sequence[Dim], padding: str, remove_padding: Sequence[int] | int = 0, output_padding: Sequence[int | None] | int | None = None, strides: Sequence[int] | None = None, bias: Tensor | None = None, use_mask: bool | None = None) → Tuple[Tensor, Sequence[Dim]]

Transposed convolution.

class returnn.frontend.conv.TransposedConv1d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim], *, padding: str, remove_padding: Sequence[int] | int = 0, output_padding: Sequence[int | None] | int | None = None, strides: Sequence[int] | None = None, with_bias: bool = True)

1D transposed convolution

Parameters:
  • in_dim (Dim)

  • out_dim (Dim)

  • filter_size (list[int])

  • strides (list[int]|None) – specifies the upscaling. Defaults to filter_size.

  • padding (str) – “same” or “valid”

  • remove_padding (list[int]|int)

  • output_padding (list[int|None]|int|None)

  • with_bias (bool) – whether to add a bias. enabled by default

nd: int | None = 1
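To illustrate how strides "specifies the upscaling", here is a rough sketch of the output length along one spatial dimension. It assumes the common Keras-style length rule for transposed convolutions and leaves out remove_padding and output_padding; transposed_conv_out_len is a hypothetical helper, and the exact rule used by RETURNN may differ.

```python
def transposed_conv_out_len(in_len, filter_size, padding, stride=None):
    """Output length along one spatial dim (hypothetical helper, not part of RETURNN)."""
    if stride is None:
        stride = filter_size  # documented default: same as filter_size
    padding = padding.lower()
    if padding == "same":
        # Assumed rule: pure upscaling by the stride.
        return in_len * stride
    if padding == "valid":
        # Assumed rule: a filter wider than the stride adds the overhang at the end.
        return in_len * stride + max(filter_size - stride, 0)
    raise ValueError(f"unknown padding {padding!r}")


print(transposed_conv_out_len(5, 3, "valid"))            # default stride 3
print(transposed_conv_out_len(5, 3, "valid", stride=2))  # filter overlaps by 1
print(transposed_conv_out_len(5, 3, "same", stride=2))
```

Under these assumptions, leaving strides at its default makes the output exactly filter_size times longer than the input for both padding modes.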
class returnn.frontend.conv.TransposedConv2d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim], *, padding: str, remove_padding: Sequence[int] | int = 0, output_padding: Sequence[int | None] | int | None = None, strides: Sequence[int] | None = None, with_bias: bool = True)

2D transposed convolution

Parameters:
  • in_dim (Dim)

  • out_dim (Dim)

  • filter_size (list[int])

  • strides (list[int]|None) – specifies the upscaling. Defaults to filter_size.

  • padding (str) – “same” or “valid”

  • remove_padding (list[int]|int)

  • output_padding (list[int|None]|int|None)

  • with_bias (bool) – whether to add a bias. enabled by default

nd: int | None = 2
class returnn.frontend.conv.TransposedConv3d(in_dim: Dim, out_dim: Dim, filter_size: Sequence[int | Dim], *, padding: str, remove_padding: Sequence[int] | int = 0, output_padding: Sequence[int | None] | int | None = None, strides: Sequence[int] | None = None, with_bias: bool = True)

3D transposed convolution

Parameters:
  • in_dim (Dim)

  • out_dim (Dim)

  • filter_size (list[int])

  • strides (list[int]|None) – specifies the upscaling. Defaults to filter_size.

  • padding (str) – “same” or “valid”

  • remove_padding (list[int]|int)

  • output_padding (list[int|None]|int|None)

  • with_bias (bool) – whether to add a bias. enabled by default

nd: int | None = 3
returnn.frontend.conv.pool(source: Tensor, *, nd: int | None = None, mode: str, pool_size: Sequence[int] | int, padding: str | int | Sequence[int] = 'valid', dilation_rate: Sequence[int] | int = 1, strides: int | Sequence[int] | None = None, in_spatial_dims: Sequence[Dim] | Dim, out_spatial_dims: Dim | Sequence[Dim] | None = None, use_mask: bool | None = None) → Tuple[Tensor, Sequence[Dim]]

Generic N-D pooling.

Parameters:
  • source

  • nd

  • mode – “max” or “avg”

  • pool_size – shape of the window of each reduce

  • padding – “valid”, “same”, an int, or a sequence of ints. “valid” is like padding=0. padding=“same” pads such that the output has the same spatial dimensions as the input when stride=1, and ceildiv(input, stride) otherwise. The specific padding applied by padding=“same” with stride>1 changed with behavior version >=24 (or with the global config option rf_use_consistent_same_padding) and is now consistent regardless of dimension size. See _consistent_same_padding() for more details.

  • dilation_rate

  • strides – the default (if it is None) will be set to pool_size (in contrast to conv())

  • in_spatial_dims

  • out_spatial_dims

  • use_mask – Whether to mask the input tensor based on seq lengths such that the padding in the padded tensor is ignored (for max-pooling, it will mask with -inf, for avg-pooling with 0). With behavior version >=23, this is enabled by default, or configured with global config option rf_use_mask. (Also see use_mask_default()).

Returns:

out, out_spatial_dims
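The strides default and the use_mask fill values described above can be illustrated with a plain-Python reference for one sequence along one spatial dimension. This is a sketch only: pool1d_ref and its seq_len argument are hypothetical, only padding=“valid” is covered, and dividing the average by the full window size is an assumption.

```python
import math


def pool1d_ref(values, seq_len, *, mode, pool_size, strides=None, use_mask=True):
    """Reference 1D pooling over a padded list (hypothetical helper, not part of RETURNN)."""
    if mode not in ("max", "avg"):
        raise ValueError(f"unknown mode {mode!r}")
    if strides is None:
        strides = pool_size  # documented default, in contrast to conv()
    if use_mask:
        # Positions at or beyond seq_len are padding: -inf for max, 0 for avg.
        fill = -math.inf if mode == "max" else 0.0
        values = [v if i < seq_len else fill for i, v in enumerate(values)]
    out = []
    # padding="valid": only windows that fit completely are used.
    for start in range(0, len(values) - pool_size + 1, strides):
        window = values[start:start + pool_size]
        if mode == "max":
            out.append(max(window))
        else:
            # Assumption: divide by the full window size.
            out.append(sum(window) / pool_size)
    return out


padded = [1.0, 5.0, 2.0, 9.0, 9.0, 9.0]  # only the first 3 frames are real data
print(pool1d_ref(padded, 3, mode="max", pool_size=2))
print(pool1d_ref(padded, 3, mode="max", pool_size=2, use_mask=False))
```

Without the mask, the padded frames (9.0 here) leak into the result; with the mask, the max over a window that partly overlaps the padding depends only on the real frames.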

returnn.frontend.conv.max_pool(source: Tensor, *, pool_size: Sequence[int] | int, padding: str = 'valid', dilation_rate: Sequence[int] | int = 1, strides: int | Sequence[int] | None = None, in_spatial_dims: Sequence[Dim] | Dim, out_spatial_dims: Dim | Sequence[Dim] | None = None) → Tuple[Tensor, Sequence[Dim]]

Max pooling.

returnn.frontend.conv.max_pool1d(source: Tensor, *, pool_size: int, padding: str = 'valid', dilation_rate: int = 1, strides: int | None = None, in_spatial_dim: Dim, out_spatial_dim: Dim | None = None) → Tuple[Tensor, Dim]

1D max pooling.

returnn.frontend.conv.pool1d(source: Tensor, *, mode: str, pool_size: int, padding: str = 'valid', dilation_rate: int = 1, strides: int | None = None, in_spatial_dim: Dim, out_spatial_dim: Dim | None = None) → Tuple[Tensor, Dim]

1D pooling.

Parameters:
  • source (Tensor)

  • mode (str) – “max” or “avg”

  • pool_size (int) – size of the window of each reduce

  • padding (str) – “valid” or “same”

  • dilation_rate (int)

  • strides (int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size

  • in_spatial_dim (Dim)

  • out_spatial_dim (Dim|None)

Returns:

out, out_spatial_dim

returnn.frontend.conv.pool2d(source: Tensor, *, mode: str, pool_size: Sequence[int] | int, padding: str = 'valid', dilation_rate: Sequence[int] | int = 1, strides: int | Sequence[int] | None = None, in_spatial_dims: Sequence[Dim], out_spatial_dims: Sequence[Dim] | None = None) → Tuple[Tensor, Sequence[Dim]]

2D pooling.

Parameters:
  • source (Tensor)

  • mode (str) – “max” or “avg”

  • pool_size (tuple[int]|int) – shape of the window of each reduce

  • padding (str) – “valid” or “same”

  • dilation_rate (tuple[int]|int)

  • strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size

  • in_spatial_dims (Sequence[Dim])

  • out_spatial_dims (Sequence[Dim]|None)

Returns:

out, out_spatial_dims

returnn.frontend.conv.pool3d(source: Tensor, *, mode: str, pool_size: Sequence[int] | int, padding: str = 'valid', dilation_rate: Sequence[int] | int = 1, strides: int | Sequence[int] | None = None, in_spatial_dims: Sequence[Dim], out_spatial_dims: Sequence[Dim] | None = None) → Tuple[Tensor, Sequence[Dim]]

3D pooling.

Parameters:
  • source (Tensor)

  • mode (str) – “max” or “avg”

  • pool_size (tuple[int]|int) – shape of the window of each reduce

  • padding (str) – “valid” or “same”

  • dilation_rate (tuple[int]|int)

  • strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size

  • in_spatial_dims (Sequence[Dim])

  • out_spatial_dims (Sequence[Dim]|None)

Returns:

out, out_spatial_dims

returnn.frontend.conv.make_conv_out_spatial_dims(in_spatial_dims: Sequence[Dim], *, filter_size: Sequence[int | Dim] | int | Dim, padding: str | int | Sequence[int], strides: Sequence[int] | int = 1, dilation_rate: Sequence[int] | int = 1, description_prefix: str | None = None) → Sequence[Dim]

Creates the output spatial dims from the input spatial dims.