Understanding and Coding a ResNet in Keras (2024)

ResNet, short for Residual Network, is a classic neural network used as a backbone for many computer vision tasks. This model won the ImageNet challenge in 2015. The fundamental breakthrough with ResNet was that it allowed us to successfully train extremely deep neural networks with 150+ layers. Prior to ResNet, training very deep neural networks was difficult due to the problem of vanishing gradients.

AlexNet, the winner of ImageNet 2012 and the model that arguably kick-started the focus on deep learning, had only 8 layers; the VGG network had 19 and Inception (GoogLeNet) had 22 layers, while ResNet-152 had 152 layers. In this blog we will code ResNet-50, a smaller version of ResNet-152 that is frequently used as a starting point for transfer learning.


However, increasing network depth does not work by simply stacking layers together. Deep networks are hard to train because of the notorious vanishing gradient problem — as the gradient is back-propagated to earlier layers, repeated multiplication may make the gradient extremely small. As a result, as the network goes deeper, its performance gets saturated or even starts degrading rapidly.

I learnt about coding ResNets from the DeepLearning.AI course by Andrew Ng. I highly recommend this course.

On my GitHub repo, I have shared two notebooks: one that codes ResNet from scratch as explained in the DeepLearning.AI course, and one that uses the pretrained model in Keras. I hope you pull the code and try it for yourself.

Skip Connection — The Strength of ResNet

ResNet popularized the concept of the skip connection. The diagram below illustrates a skip connection. The figure on the left stacks convolution layers one after the other. On the right we still stack convolution layers as before, but we now also add the original input to the output of the convolution block. This is called a skip connection.

[Figure: a plain stack of convolution layers (left) vs. a residual block with a skip connection (right)]

It can be written in just a couple of lines of code:

from keras.layers import Add  # Keras layer that performs the element-wise addition

X_shortcut = X  # store the initial value of X in a variable
## Perform convolution + batch norm operations on X
X = Add()([X, X_shortcut])  # skip connection

The coding is quite simple, but there is one important consideration: since X and X_shortcut above are two tensors, you can add them only if they have the same shape. So if the convolution + batch norm operations are done in a way that keeps the output shape the same, then we can simply add them as shown below.

[Figure: identity block, where the shortcut is added directly because the input and output shapes match]
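
To make that concrete, here is a minimal sketch of what such an identity block could look like in Keras. The layer layout follows the usual 1x1, fxf, 1x1 bottleneck design; the function name matches the notebook's identity_block, but the exact code there may differ slightly:

from keras.layers import Conv2D, BatchNormalization, Activation, Add

def identity_block(X, f, filters):
    # filters lists the number of filters in the three conv layers;
    # f is the kernel size of the middle layer
    F1, F2, F3 = filters
    X_shortcut = X  # save the input for the skip connection

    # First component: 1x1 convolution
    X = Conv2D(F1, (1, 1), strides=(1, 1), padding='valid')(X)
    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)

    # Second component: f x f convolution with 'same' padding (keeps spatial size)
    X = Conv2D(F2, (f, f), strides=(1, 1), padding='same')(X)
    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)

    # Third component: 1x1 convolution, no ReLU yet
    # (note: F3 must equal the number of input channels so the shapes match)
    X = Conv2D(F3, (1, 1), strides=(1, 1), padding='valid')(X)
    X = BatchNormalization(axis=3)(X)

    # Skip connection: add the unchanged shortcut, then apply the final ReLU
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    return X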

Otherwise, X_shortcut goes through a convolution layer chosen such that its output has the same dimensions as the output of the convolution block, as shown below:

[Figure: convolutional block, where the shortcut passes through its own convolution + batch norm so the shapes match before the addition]
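
And here is a hedged sketch of the corresponding convolution block, where the shortcut branch gets its own 1x1 convolution and batch norm so the shapes match before the addition (again, the exact notebook code may differ):

from keras.layers import Conv2D, BatchNormalization, Activation, Add

def convolution_block(X, f, filters, s=2):
    # Same main path as the identity block, but the first conv uses stride s
    # and the shortcut gets its own conv + batch norm so the shapes match.
    F1, F2, F3 = filters
    X_shortcut = X

    # Main path
    X = Conv2D(F1, (1, 1), strides=(s, s), padding='valid')(X)
    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)

    X = Conv2D(F2, (f, f), strides=(1, 1), padding='same')(X)
    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)

    X = Conv2D(F3, (1, 1), strides=(1, 1), padding='valid')(X)
    X = BatchNormalization(axis=3)(X)

    # Shortcut path: 1x1 convolution with the same stride, so its output
    # has the same shape as the main path output
    X_shortcut = Conv2D(F3, (1, 1), strides=(s, s), padding='valid')(X_shortcut)
    X_shortcut = BatchNormalization(axis=3)(X_shortcut)

    # Add the two paths and apply the final ReLU
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    return X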

In the notebook on GitHub, the two functions identity_block and convolution_block implement the two cases above. These functions use Keras to implement convolution and batch norm layers with ReLU activation. The skip connection itself is just the one line X = Add()([X, X_shortcut]).

One important thing to note here is that the skip connection is applied before the final ReLU activation, as shown in the diagram above. Research has found that this arrangement gives the best results.

Why do Skip Connections work?

This is an interesting question. I think there are two reasons why skip connections work here:

  1. They mitigate the vanishing gradient problem by providing an alternate, shortcut path for the gradient to flow through
  2. They allow the model to learn an identity function, which ensures that a higher layer will perform at least as well as a lower layer, and not worse: if the residual branch learns weights close to zero, the block simply passes its input through unchanged (see the short sketch after this list)
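
A toy NumPy illustration of that second point: if the residual branch outputs (nearly) zero, the block's output is (nearly) identical to its input, so the block can behave as an identity function.

import numpy as np

x = np.array([1.0, -2.0, 3.0])    # input to the residual block
residual = np.zeros_like(x)       # residual branch has learned (near-)zero weights
y = residual + x                  # block output: y = F(x) + x
print(np.allclose(y, x))          # True: the block acts as an identity function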

In fact, since ResNet, skip connections have been used in many more model architectures, such as the Fully Convolutional Network (FCN) and U-Net. They are used to carry information from earlier layers in the model to later layers. In these architectures they pass information from the downsampling layers to the upsampling layers.

Testing the ResNet model we built

The identity and convolution blocks coded in the notebook are then combined to create a ResNet-50 model with the architecture shown below:

[Figure: ResNet-50 architecture, built from five stages of convolution and identity blocks]

The ResNet-50 model consists of 5 stages, each made up of convolution and identity blocks. Each convolution block has 3 convolution layers, and each identity block also has 3 convolution layers. ResNet-50 has over 23 million trainable parameters.
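
To show how those pieces might fit together, here is a rough sketch of the full model using the identity_block and convolution_block sketches above. The stage layout and filter counts follow the standard ResNet-50 design, but this is a sketch rather than the exact notebook code:

from keras.layers import Input, ZeroPadding2D, Conv2D, BatchNormalization, Activation, MaxPooling2D, AveragePooling2D, Flatten, Dense
from keras.models import Model

def ResNet50_sketch(input_shape=(64, 64, 3), classes=6):
    X_input = Input(input_shape)

    # Stage 1: plain conv + batch norm + ReLU + max pooling
    X = ZeroPadding2D((3, 3))(X_input)
    X = Conv2D(64, (7, 7), strides=(2, 2))(X)
    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2: one convolution block followed by 2 identity blocks
    X = convolution_block(X, f=3, filters=[64, 64, 256], s=1)
    for _ in range(2):
        X = identity_block(X, 3, [64, 64, 256])

    # Stage 3: one convolution block followed by 3 identity blocks
    X = convolution_block(X, f=3, filters=[128, 128, 512], s=2)
    for _ in range(3):
        X = identity_block(X, 3, [128, 128, 512])

    # Stage 4: one convolution block followed by 5 identity blocks
    X = convolution_block(X, f=3, filters=[256, 256, 1024], s=2)
    for _ in range(5):
        X = identity_block(X, 3, [256, 256, 1024])

    # Stage 5: one convolution block followed by 2 identity blocks
    X = convolution_block(X, f=3, filters=[512, 512, 2048], s=2)
    for _ in range(2):
        X = identity_block(X, 3, [512, 512, 2048])

    # Average pooling and the classifier head
    X = AveragePooling2D((2, 2))(X)
    X = Flatten()(X)
    X = Dense(classes, activation='softmax')(X)

    return Model(inputs=X_input, outputs=X)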

I have tested this model on the SIGNS data set, which is also included in my GitHub repo. This data set has hand images corresponding to 6 classes. We have 1080 train images and 120 test images.

[Figure: sample hand images from the SIGNS data set]

Our ResNet-50 gets to 86% test accuracy in 25 epochs of training. Not bad!
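
For reference, the training run could look something like the sketch below. The batch size and optimizer here are assumptions rather than the exact notebook settings; X_train, Y_train, X_test, and Y_test stand for the preprocessed SIGNS arrays (pixel values scaled to [0, 1]) with one-hot encoded labels:

model = ResNet50_sketch(input_shape=(64, 64, 3), classes=6)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=25, batch_size=32)
model.evaluate(X_test, Y_test)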

Building ResNet in Keras using the pretrained model

I loved coding the ResNet model myself, since it gave me a better understanding of a network that I frequently use for transfer learning tasks related to image classification, object localization, segmentation, etc.

However, for more regular use it is faster to use the pretrained ResNet-50 in Keras. Keras has many of these backbone models with their ImageNet weights available in its library.


I have uploaded a notebook on my GitHub that uses Keras to load the pretrained ResNet-50. You can load the model with one line of code:

base_model = applications.resnet50.ResNet50(weights=None, include_top=False, input_shape=(img_height, img_width, 3))

Here weights=None since I want to initialize the model with random weights, as I did for the ResNet-50 I coded; otherwise I could also load the pretrained ImageNet weights. I set include_top=False to exclude the final pooling and fully connected layer of the original model. I then added global average pooling and a dense output layer to the ResNet-50 model.

from keras.layers import GlobalAveragePooling2D, Dropout, Dense
from keras.models import Model

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.7)(x)  # fairly aggressive dropout to reduce overfitting
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

As shown above, Keras provides a very convenient interface to load pretrained models, but it is important to code ResNet yourself at least once as well, so that you understand the concept and can perhaps apply this learning to another new architecture you are creating.

The Keras ResNet got to an accuracy of 75% after training for 100 epochs with the Adam optimizer and a learning rate of 0.0001. The accuracy is a bit lower than our own coded model, and I suspect this has to do with the weight initializations.
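
Those settings correspond to a compile-and-fit call along these lines (a sketch; the exact call in the notebook may differ, and older Keras versions spell the argument lr instead of learning_rate):

from keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=100, batch_size=32)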

Keras also provides an easy interface for data augmentation, so if you get a chance, try augmenting this data set and see if that results in better performance.
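
If you want to try that, a minimal sketch using Keras's ImageDataGenerator could look like this; the augmentation parameters below are just examples, not tuned values:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    zoom_range=0.1)          # random zoom; no flips, since hand signs are orientation-sensitive

# flow() yields augmented batches on the fly (older Keras versions use model.fit_generator here)
model.fit(datagen.flow(X_train, Y_train, batch_size=32), epochs=100)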

Conclusion

  • ResNet is a powerful backbone model that is used very frequently in many computer vision tasks
  • ResNet uses skip connections to add the output from an earlier layer to a later layer. This helps it mitigate the vanishing gradient problem
  • You can use Keras to load its pretrained ResNet-50, or use the code I have shared to code ResNet yourself

I have my own deep learning consultancy and love to work on interesting problems. I have helped many startups deploy innovative AI based solutions. Check us out at — http://deeplearninganalytics.org/.

You can also see my other writings at: https://medium.com/@priya.dwivedi

If you have a project that we can collaborate on, then please contact me through my website or at info@deeplearninganalytics.org
