returnn.extern.official_tf_resnet.resnet_model
Contains definitions for Residual Networks.
Residual networks (‘v1’ ResNets) were originally proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. arXiv:1512.03385
The full preactivation ‘v2’ ResNet variant was introduced by:
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Identity Mappings in Deep Residual Networks. arXiv:1603.05027
The key difference of the full preactivation ‘v2’ variant compared to the ‘v1’ variant in [1] is the use of batch normalization before every weight layer rather than after.
- returnn.extern.official_tf_resnet.resnet_model.batch_norm(inputs, training, data_format)
Performs a batch normalization using a standard set of parameters.
- returnn.extern.official_tf_resnet.resnet_model.fixed_padding(inputs, kernel_size, data_format, conv_time_dim)
Pads the input along the spatial dimensions independently of input size.
- Args:
- inputs: A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
- kernel_size: The kernel size to be used in the conv2d or max_pool2d operation. Should be a positive integer.
- data_format: The input format (‘channels_last’ or ‘channels_first’).
- Returns:
A tensor with the same format as the input with the data either intact (if kernel_size == 1) or padded (if kernel_size > 1).
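The padding rule described above can be sketched in plain Python. This is a minimal illustration that assumes the scheme of the upstream TensorFlow official ResNet model (total padding of kernel_size - 1, split as evenly as possible); the helper names `padding_amounts` and `padded_shape` are hypothetical, and the effect of conv_time_dim is not modelled.

```python
def padding_amounts(kernel_size):
    """Return (pad_beg, pad_end) for one spatial dimension.

    Assumption: total padding is kernel_size - 1, so the amount depends
    only on the kernel size and never on the input size.
    """
    if kernel_size < 1:
        raise ValueError("kernel_size should be a positive integer")
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return pad_beg, pad_end


def padded_shape(shape, kernel_size, data_format):
    """Return the shape after padding both spatial dimensions (hypothetical helper)."""
    pad_beg, pad_end = padding_amounts(kernel_size)
    extra = pad_beg + pad_end
    if data_format == "channels_first":
        batch, channels, height, width = shape
        return (batch, channels, height + extra, width + extra)
    if data_format == "channels_last":
        batch, height, width, channels = shape
        return (batch, height + extra, width + extra, channels)
    raise ValueError("data_format should be 'channels_last' or 'channels_first'")


if __name__ == "__main__":
    print(padding_amounts(1))  # kernel_size == 1 leaves the data intact
    print(padded_shape((8, 224, 224, 3), 7, "channels_last"))
```

Because the padding depends only on kernel_size, a strided convolution applied afterwards sees the same border treatment whatever the input height and width are.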
- returnn.extern.official_tf_resnet.resnet_model.fixed_crop(inputs, crop_size, data_format)
Crops the input along the first spatial dimension.
- Args:
- inputs: A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
- crop_size: The number of cropped elements from one side. Should be a positive integer.
- data_format: The input format (‘channels_last’ or ‘channels_first’).
- Returns:
A tensor with the same format as the input with the cropped data.
- returnn.extern.official_tf_resnet.resnet_model.conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format, conv_time_dim)
Strided 2-D convolution with explicit padding.
- returnn.extern.official_tf_resnet.resnet_model.block_layer(inputs, filters, bottleneck, block_fn, blocks, strides, kernel_size, training, name, data_format, conv_time_dim)
Creates one layer of blocks for the ResNet model.
- Args:
- inputs: A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
- filters: The number of filters for the first convolution of the layer.
- bottleneck: Whether the block created is a bottleneck block.
- block_fn: The block to use within the model, either building_block or bottleneck_block.
- kernel_size: The kernel size for convolutions.
- blocks: The number of blocks contained in the layer.
- strides: The stride to use for the first convolution of the layer. If greater than 1, this layer will ultimately downsample the input.
- training: Either True or False, whether we are currently training the model. Needed for batch norm.
- name: A string name for the tensor output of the block layer.
- data_format: The input format (‘channels_last’ or ‘channels_first’).
- conv_time_dim: Whether the conv2D operates in time_dim or window_dim.
- Returns:
The output tensor of the block layer.
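To make the role of blocks and strides concrete, here is a small sketch of how the stride could be distributed over the blocks of one layer. It assumes the behaviour of the upstream TensorFlow official ResNet model, where only the first block of a layer applies strides and the remaining blocks use a stride of 1; `per_block_strides` is a hypothetical helper and not part of this module.

```python
def per_block_strides(blocks, strides):
    """Return the stride applied by each block of one block layer.

    Assumption: only the first block downsamples; every later block in
    the same layer keeps the spatial resolution (stride 1).
    """
    if blocks < 1:
        raise ValueError("blocks should be a positive integer")
    if strides < 1:
        raise ValueError("strides should be a positive integer")
    return [strides] + [1] * (blocks - 1)


if __name__ == "__main__":
    # A layer of 4 blocks with strides=2 downsamples once, in its first block.
    print(per_block_strides(blocks=4, strides=2))
```

Under this assumption, the overall downsampling factor of a layer equals strides regardless of how many blocks it contains.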
- class returnn.extern.official_tf_resnet.resnet_model.Model(resnet_size, num_classes, num_filters, conv_time_dim, first_kernel_size, kernel_size, conv_stride, first_pool_size, first_pool_stride, block_sizes, block_strides, final_size, bottleneck=False, resnet_version=2, data_format=None, dtype=tf.float32)
Base class for building the Resnet Model.
Creates a model for classifying an image.
- Args:
- resnet_size: A single integer for the size of the ResNet model.
- num_classes: The number of classes used as labels.
- num_filters: The number of filters to use for the first block layer of the model. This number is then doubled for each subsequent block layer.
- conv_time_dim: Whether the conv2D operates in time_dim or window_dim.
- first_kernel_size: The kernel size to use for convolution.
- kernel_size: The kernel size to use for convolution.
- conv_stride: The stride size for the initial convolutional layer.
- first_pool_size: The pool size to be used for the first pooling layer. If None, the first pooling layer is skipped.
- first_pool_stride: The stride size for the first pooling layer. Not used if first_pool_size is None.
- block_sizes: A list containing n values, where n is the number of sets of block layers desired. Each value should be the number of blocks in the i-th set.
- block_strides: A list of integers representing the desired stride size for each of the sets of block layers. Should be the same length as block_sizes.
- final_size: The expected size of the model after the second pooling.
- bottleneck: Whether to use regular blocks or bottleneck blocks.
- resnet_version: An integer representing which version of the ResNet network to use. See README for details. Valid values: [1, 2]
- data_format: The input format (‘channels_last’, ‘channels_first’, or None). If set to None, the format is dependent on whether a GPU is available.
- dtype: The TensorFlow dtype to use for calculations. If not specified, tf.float32 is used.
- Raises:
ValueError: if invalid version is selected.
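The relationship between num_filters, block_sizes, block_strides and resnet_version can be sketched without TensorFlow. The helper `plan_block_layers` below is hypothetical and only mirrors the documented rules: the filter count doubles for each subsequent block layer, and an invalid version raises ValueError. The length check on block_strides is an added assumption based on the "same length" requirement, not documented behaviour of the class.

```python
def plan_block_layers(num_filters, block_sizes, block_strides, resnet_version=2):
    """Return one dict per block layer with its filters, blocks and strides."""
    if resnet_version not in (1, 2):
        raise ValueError("resnet_version should be 1 or 2")
    # Assumption: mismatched lengths are rejected here for illustration only.
    if len(block_sizes) != len(block_strides):
        raise ValueError("block_strides should have the same length as block_sizes")
    plan = []
    for i, (blocks, strides) in enumerate(zip(block_sizes, block_strides)):
        plan.append({
            "filters": num_filters * (2 ** i),  # doubled for each subsequent layer
            "blocks": blocks,
            "strides": strides,
        })
    return plan


if __name__ == "__main__":
    # Illustrative values only: four sets of block layers starting at 64 filters.
    for layer in plan_block_layers(64, [3, 4, 6, 3], [1, 2, 2, 2]):
        print(layer)
```

The values passed in the usage example are illustrative; the actual arguments depend on the network configuration in use.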