
What is ResNet?


ResNet Block

ResNet Architecture

  • 3x3 Convolution (stride=2) - Downsamples the image to half its height and width, and doubles the number of channels (the number of filters is doubled). See the sketch after this list.
    • A 3x3 kernel/matrix slides over the image, and at each position a dot product is computed with the overlapping patch. A 3x3 kernel is small enough to capture fine details while staying efficient, and such layers can be stacked to capture more complex patterns.
  • Batch Norm
  • ReLU
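
A minimal sketch of this downsampling step in PyTorch (the channel counts and input size here are assumed for illustration, not taken from the post):

import torch
import torch.nn as nn

# Assumed example: 64 input channels on a 56x56 feature map.
conv = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)
bn = nn.BatchNorm2d(128)
relu = nn.ReLU(inplace=True)

x = torch.randn(1, 64, 56, 56)   # (batch, channels, height, width)
out = relu(bn(conv(x)))
print(out.shape)                 # torch.Size([1, 128, 28, 28]) - half H/W, double C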

Thought Process

Our goal is to increase the resolution of the image.

ResNet Process 1

If we train a vanilla deep net to map low-resolution images to high-resolution ones, the training loss increases with depth, because the input signal is lost as it's passed through many layers.

ResNet Process 2

Our goal now is to learn a residual that is added to the input image to increase its resolution.
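
Concretely, if F(x) denotes the residual the block learns, the block outputs

y = x + F(x)

so the network only has to model the correction to x; learning F(x) = 0 passes the input through unchanged.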

ResNet Process 3

The idea is to add the input image to our residual at the end of the block. Two problems arise:

  1. Dimension mismatch: when the input and the residual have different dimensions, we cannot add them.
  2. Keeping a steady stream of the input data flowing across the whole network.

ResNet Process 4

So we add residual (skip) connections to handle both, and we treat the network as a series of residual blocks rather than individual layers.
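
For the dimension mismatch, the standard fix (the one torchvision's ResNet uses, assumed here with example channel counts) is to project the identity with a strided 1x1 convolution so it matches the residual branch:

import torch.nn as nn

# Projects a (N, 64, H, W) identity to (N, 128, H/2, W/2) to match the residual.
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)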

ResNet Process 5

Each block takes no penalty from the loss function, because it can always fall back to the identity: if the residual branch learns to output zero, the block simply passes its input through.

This enables very deep networks.
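
One standard way to see why: since y = x + F(x), the gradient through a block is ∂y/∂x = I + ∂F/∂x. Even if the residual branch's gradient shrinks toward zero, the identity term keeps gradients flowing to earlier layers.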

ResNet Process 6

Code

from torch import Tensor

def forward(self, x: Tensor) -> Tensor:
    identity = x  # save the input for the skip connection

    # Residual branch: conv -> bn -> relu -> conv -> bn
    out = self.conv1(x)
    out = self.bn1(out)
    out = self.relu(out)
    out = self.conv2(out)
    out = self.bn2(out)

    # Project the identity when its shape doesn't match the residual
    if self.downsample is not None:
        identity = self.downsample(x)

    out += identity  # the skip connection: add input to residual
    out = self.relu(out)

    return out
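
For context, here is a minimal __init__ that the forward above assumes. This is a sketch following torchvision's BasicBlock, not the full implementation:

import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Same 1x1 projection as sketched earlier, used only when shapes differ
        self.downsample = None
        if stride != 1 or in_channels != out_channels:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    # forward(self, x) is exactly the method shown above

With the forward above attached as its method, BasicBlock(64, 128, stride=2) maps a (1, 64, 56, 56) input to a (1, 128, 28, 28) output.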

ResNet-18, 34, 50, 101, and 152 are the standard variants, widely available as pretrained models; the number indicates the number of weight layers in the architecture.
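
A minimal sketch of loading one of these pretrained variants with torchvision (assumes torchvision 0.13+ for the weights= API):

import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

x = torch.randn(1, 3, 224, 224)  # one RGB image at ImageNet size
with torch.no_grad():
    logits = model(x)            # shape (1, 1000): ImageNet class scores
print(logits.shape)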

Architecture

ResNet models introduce the concept of residual learning, where the network learns residual functions with reference to the layer inputs, rather than learning unreferenced functions. This allows the model to train very deep networks effectively.

Use Cases

ResNet models are commonly used in image classification tasks and are known for their performance on large-scale datasets such as ImageNet.

Strengths and Weaknesses

  • Strengths:
    • Effective in training very deep networks.
    • Reduces the problem of vanishing gradients.
  • Weaknesses:
    • Computationally expensive for very deep versions like ResNet-152 (see the parameter counts below).
    • May not be as efficient for smaller datasets or less complex tasks.
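
To put that cost in perspective, a quick parameter count (assumes torchvision; the numbers are computed by the script, not quoted from the post):

from torchvision import models

for name, ctor in [("resnet18", models.resnet18), ("resnet152", models.resnet152)]:
    m = ctor(weights=None)
    n_params = sum(p.numel() for p in m.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")  # roughly 11.7M vs 60.2M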

Papers

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.