returnn.extern.official_tf_resnet.resnet_model

Contains definitions for Residual Networks.

Residual networks (‘v1’ ResNets) were originally proposed in:

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. arXiv:1512.03385

The full preactivation ‘v2’ ResNet variant was introduced by:

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Identity Mappings in Deep Residual Networks. arXiv:1603.05027

The key difference of the full preactivation ‘v2’ variant compared to the ‘v1’ variant in [1] is the use of batch normalization before every weight layer rather than after.
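The ordering difference can be illustrated with a toy example. This is a minimal sketch in plain Python, not the module's code: `weight_layer` is a hypothetical scalar stand-in for a convolution, `batch_norm` here is a simplified normalization without learned scale or offset, and both block functions are illustrative names.

```python
import math


def batch_norm(xs, eps=1e-5):
    """Toy batch norm: normalize a list of floats to zero mean, unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def relu(xs):
    return [max(0.0, x) for x in xs]


def weight_layer(xs, w):
    """Hypothetical stand-in for a convolution: one scalar weight."""
    return [w * x for x in xs]


def block_v1(xs, w1, w2):
    # v1 order: weight -> BN -> ReLU -> weight -> BN -> add shortcut -> ReLU
    out = relu(batch_norm(weight_layer(xs, w1)))
    out = batch_norm(weight_layer(out, w2))
    return relu([o + x for o, x in zip(out, xs)])


def block_v2(xs, w1, w2):
    # v2 order: BN -> ReLU -> weight -> BN -> ReLU -> weight -> add shortcut
    out = weight_layer(relu(batch_norm(xs)), w1)
    out = weight_layer(relu(batch_norm(out)), w2)
    return [o + x for o, x in zip(out, xs)]
```

In this toy setting, setting both weights to zero makes `block_v2` return its input unchanged, while `block_v1` still applies the final ReLU to the shortcut.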

returnn.extern.official_tf_resnet.resnet_model.batch_norm(inputs, training, data_format)

Performs a batch normalization using a standard set of parameters.

returnn.extern.official_tf_resnet.resnet_model.fixed_padding(inputs, kernel_size, data_format, conv_time_dim)

Pads the input along the spatial dimensions independently of input size.

Args:

inputs: A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.

kernel_size: The kernel to be used in the conv2d or max_pool2d operation. Should be a positive integer.

data_format: The input format (‘channels_last’ or ‘channels_first’).

Returns:

A tensor with the same format as the input with the data either intact (if kernel_size == 1) or padded (if kernel_size > 1).
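As an illustration of padding that depends only on the kernel size, here is a minimal sketch in plain Python. It assumes the split used in the upstream TensorFlow official ResNet code (a total of `kernel_size - 1`, with the smaller half placed first); how this module treats `conv_time_dim` is not covered. The helper names `fixed_pad_amounts` and `pad_2d` are hypothetical, and the sketch works on a single height x width nested list rather than a 4-D tensor.

```python
def fixed_pad_amounts(kernel_size):
    """Return (pad_beg, pad_end) for one spatial dimension.

    Assumption: total padding is kernel_size - 1, split as evenly as possible.
    """
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return pad_beg, pad_end


def pad_2d(image, kernel_size, value=0):
    """Pad a height x width nested list on both spatial dimensions."""
    pad_beg, pad_end = fixed_pad_amounts(kernel_size)
    width = len(image[0]) + pad_beg + pad_end
    # Pad each row on the left and right.
    rows = [[value] * pad_beg + list(row) + [value] * pad_end for row in image]
    # Add full rows of padding above and below.
    top = [[value] * width for _ in range(pad_beg)]
    bottom = [[value] * width for _ in range(pad_end)]
    return top + rows + bottom
```

Because the amounts are derived from `kernel_size` alone, the same padding is applied whatever the input height and width are.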

returnn.extern.official_tf_resnet.resnet_model.fixed_crop(inputs, crop_size, data_format)

Crops the input along the first spatial dimension.

Args:

inputs: A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.

crop_size: The number of cropped elements from one side. Should be a positive integer.

data_format: The input format (‘channels_last’ or ‘channels_first’).

Returns:

A tensor with the same format as the input with the cropped data.
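A minimal sketch of the cropping idea in plain Python follows. It assumes that `crop_size` elements are removed from each side of the first spatial dimension (one reading of "from one side"), and it operates on a single height x width nested list. The name `crop_first_spatial_dim` is hypothetical and not part of this module.

```python
def crop_first_spatial_dim(image, crop_size):
    """Drop crop_size rows from the start and from the end of the image.

    Assumption: the crop is symmetric, so the height shrinks by 2 * crop_size.
    """
    if crop_size <= 0:
        raise ValueError("crop_size should be a positive integer")
    return [list(row) for row in image[crop_size:len(image) - crop_size]]
```

The second spatial dimension (the row contents) is left untouched in this sketch.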

returnn.extern.official_tf_resnet.resnet_model.conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format, conv_time_dim)

Strided 2-D convolution with explicit padding.
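To see what explicit padding implies for the output size, the following sketch computes the output length along one spatial dimension. It assumes the input is first padded by `kernel_size - 1` in total and then convolved without any further implicit padding; `conv_output_length` is a hypothetical helper, not a function of this module.

```python
def conv_output_length(input_length, kernel_size, strides):
    """Output length of a strided convolution after explicit padding.

    Assumption: kernel_size - 1 elements of padding are added in total,
    and the convolution itself adds no padding.
    """
    padded = input_length + (kernel_size - 1)
    return (padded - kernel_size) // strides + 1
```

Under these assumptions the result simplifies to `ceil(input_length / strides)`, so the output size does not depend on the kernel size.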

returnn.extern.official_tf_resnet.resnet_model.block_layer(inputs, filters, bottleneck, block_fn, blocks, strides, kernel_size, training, name, data_format, conv_time_dim)

Creates one layer of blocks for the ResNet model.

Args:

inputs: A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.

filters: The number of filters for the first convolution of the layer.

bottleneck: Whether the block created is a bottleneck block.

block_fn: The block to use within the model, either building_block or bottleneck_block.

kernel_size: The kernel size for convolutions.

blocks: The number of blocks contained in the layer.

strides: The stride to use for the first convolution of the layer. If greater than 1, this layer will ultimately downsample the input.

training: Either True or False, whether we are currently training the model. Needed for batch norm.

name: A string name for the tensor output of the block layer.

data_format: The input format (‘channels_last’ or ‘channels_first’).

conv_time_dim: Whether the conv2D operates in time_dim or window_dim.

Returns:

The output tensor of the block layer.
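The way a block layer is assembled can be sketched as a plan of per-block settings. This is an illustration based on the upstream TensorFlow official ResNet code, where only the first block uses the given stride and a projection shortcut, and bottleneck blocks output four times `filters`; this module's copy may differ. `block_layer_plan` and its dictionary keys are hypothetical.

```python
def block_layer_plan(filters, bottleneck, blocks, strides):
    """Describe each block in one block layer as a dict of settings.

    Assumptions (taken from the upstream TensorFlow implementation):
    - bottleneck blocks end with 4x as many filters as they start with;
    - only the first block applies `strides` and a projection shortcut.
    """
    filters_out = filters * 4 if bottleneck else filters
    plan = [{
        "filters": filters,
        "filters_out": filters_out,
        "strides": strides,
        "projection_shortcut": True,
    }]
    for _ in range(1, blocks):
        plan.append({
            "filters": filters,
            "filters_out": filters_out,
            "strides": 1,
            "projection_shortcut": False,
        })
    return plan
```

For example, `block_layer_plan(64, True, 3, 2)` describes three bottleneck blocks, of which only the first downsamples.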

class returnn.extern.official_tf_resnet.resnet_model.Model(resnet_size, num_classes, num_filters, conv_time_dim, first_kernel_size, kernel_size, conv_stride, first_pool_size, first_pool_stride, block_sizes, block_strides, final_size, bottleneck=False, resnet_version=2, data_format=None, dtype=tf.float32)

Base class for building the Resnet Model.

Creates a model for classifying an image.

Args:

resnet_size: A single integer for the size of the ResNet model.

num_classes: The number of classes used as labels.

num_filters: The number of filters to use for the first block layer of the model. This number is then doubled for each subsequent block layer.

conv_time_dim: Whether the conv2D operates in time_dim or window_dim.

first_kernel_size: The kernel size to use for convolution.

kernel_size: The kernel size to use for convolution.

conv_stride: Stride size for the initial convolutional layer.

first_pool_size: Pool size to be used for the first pooling layer. If None, the first pooling layer is skipped.

first_pool_stride: Stride size for the first pooling layer. Not used if first_pool_size is None.

block_sizes: A list containing n values, where n is the number of sets of block layers desired. Each value should be the number of blocks in the i-th set.

block_strides: List of integers representing the desired stride size for each of the sets of block layers. Should be the same length as block_sizes.

final_size: The expected size of the model after the second pooling.

bottleneck: Use regular blocks or bottleneck blocks.

resnet_version: Integer representing which version of the ResNet network to use. See README for details. Valid values: [1, 2]

data_format: Input format (‘channels_last’, ‘channels_first’, or None). If set to None, the format is dependent on whether a GPU is available.

dtype: The TensorFlow dtype to use for calculations. If not specified, tf.float32 is used.

Raises:

ValueError: If an invalid version is selected.
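How `num_filters`, `block_sizes`, and `block_strides` combine can be sketched without TensorFlow. The helper below is hypothetical (`block_layer_schedule` is not part of this class). It applies the documented doubling of filters per block layer and the documented check on `resnet_version`; the length check on the two lists is an added assumption, since the text only says they should match.

```python
def block_layer_schedule(num_filters, block_sizes, block_strides, resnet_version=2):
    """List the filters, block count, and stride for each block layer."""
    if resnet_version not in (1, 2):
        raise ValueError("resnet_version should be 1 or 2")
    # Assumption: a length mismatch is treated as an error in this sketch.
    if len(block_sizes) != len(block_strides):
        raise ValueError("block_sizes and block_strides should have the same length")
    return [
        # Filters double for each subsequent block layer.
        {"filters": num_filters * (2 ** i), "blocks": n, "strides": s}
        for i, (n, s) in enumerate(zip(block_sizes, block_strides))
    ]
```

With example values such as `block_layer_schedule(64, [3, 4, 6, 3], [1, 2, 2, 2])`, the four block layers use 64, 128, 256, and 512 filters.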

calculate_time_dim_reduction()