Overview Of Convolutional Neural Network In Image Classification (2024)

Published on January 25, 2018
In Mystery Vault

Kishan Maladkar

In machine learning, Convolutional Neural Networks (CNNs or ConvNets) are a class of deep feed-forward neural networks. CNNs are widely used for image classification and recognition because of their high accuracy. They were developed by computer scientist Yann LeCun in the late 1980s and 1990s, inspired by the way human visual perception recognises objects. A CNN follows a hierarchical model that narrows like a funnel: successive layers extract increasingly abstract features, and the network ends in a fully connected layer, where every neuron is connected to all the neurons of the previous layer and the output is produced.

We will construct a new ConvNet step by step in this article to explain it further. In this example, we will use the MNIST (Modified National Institute of Standards and Technology) data set for image classification. This data set contains images of the ten handwritten digits from 0 to 9. The training set has 55,000 images, the test set has 10,000 images and the validation set has 5,000 images. We will build a ConvNet made of two hidden convolutional layers. Now let us look at one of the images and its dimensions. Here is an image of the number 7.

This image’s dimensions are 28×28 pixels, represented as a matrix of shape (28, 28). The depth, or number of channels, of this image is 1, since it is a grayscale image. If it were a colour (for example, RGB) image, the number of channels would be three.
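To make the shapes concrete, here is a minimal sketch in plain Python (nested lists rather than TensorFlow tensors). The `shape` helper and the all-zero pixel values are illustrative assumptions and are not taken from MNIST.

```python
# A 28x28 grayscale image as a nested list: 28 rows of 28 pixel intensities.
height, width = 28, 28
gray_image = [[0.0 for _ in range(width)] for _ in range(height)]


def shape(nested):
    """Return the dimensions of a regularly nested, non-empty list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Add an explicit channel axis: each pixel becomes a list of channel values.
gray_with_channel = [[[pixel] for pixel in row] for row in gray_image]

# A colour image of the same size carries three values (R, G, B) per pixel.
rgb_image = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]

print(shape(gray_image))         # (28, 28)
print(shape(gray_with_channel))  # (28, 28, 1)
print(shape(rgb_image))          # (28, 28, 3)
```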

Now, the first step is to define all the functions we are going to use in building the model. TensorFlow represents a model as a computational graph: we first declare these functions and variables by giving only their shape or size, without storing any data in them. It’s like drawing a blueprint of a bridge before you start placing the bricks.
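The "blueprint first, data later" idea can be illustrated with a toy sketch in plain Python. The `Placeholder` and `Sum` classes below are hypothetical stand-ins written for this illustration. They are not TensorFlow's API, and they only mimic the principle of declaring a shape before any data exists.

```python
class Placeholder:
    """Hypothetical graph input: it records a shape, never the data itself."""

    def __init__(self, shape):
        self.shape = shape


class Sum:
    """Hypothetical graph node that adds up all values of its input when evaluated."""

    def __init__(self, source):
        self.source = source

    def evaluate(self, feed):
        # Data only enters the picture here, when the graph is run.
        data = feed[self.source]
        rows, cols = self.source.shape
        if len(data) != rows or any(len(row) != cols for row in data):
            raise ValueError("fed data does not match the declared shape")
        return sum(sum(row) for row in data)


# Build the blueprint: nothing is computed and no pixels are stored yet.
image_input = Placeholder(shape=(28, 28))
total_intensity = Sum(image_input)

# Run it by feeding an actual 28x28 image (all zeros in this example).
blank = [[0.0] * 28 for _ in range(28)]
result = total_intensity.evaluate({image_input: blank})
print(result)  # 0.0
```

Feeding an image of the wrong size raises an error, which is the benefit of fixing the shape in the blueprint ahead of time.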

Once we have an input image (28×28), a filter is run along all the pixels (rows, columns) of the image, capturing the data as in the picture below. The result is passed on to the pooling layer, which summarises each small region with a single value (for example, its maximum) and so gives out a smaller output.
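As a rough illustration of what a pooling layer computes, here is a sketch of max pooling in plain Python. The 2×2 window, the stride of 2 and the sample values are assumptions chosen for the example. The text above does not fix which pooling operation or window size is used.

```python
def max_pool(feature_map, size=2, stride=2):
    """Downsample a 2-D feature map by keeping the maximum of each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values under the current size x size window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 9, 7],
    [1, 5, 3, 8],
]
pooled = max_pool(feature_map)
print(pooled)  # [[6, 2], [5, 9]]
```

Each 2×2 block of the input collapses to one number, so a 4×4 map becomes 2×2.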

Here, a filter is a weight matrix of shape n×n (3×3 in the figure above). The weights are initialised as random numbers drawn from a normal distribution with a chosen standard deviation. This filter is run across all the values in the matrix and, at each position, a dot product of the weights and the underlying pixels is calculated.
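The sliding dot product can be sketched in a few lines of plain Python. This is an illustrative sketch, not the article's TensorFlow code: the function name `convolve2d`, the 4×4 input and the fixed 3×3 kernel are assumptions. A fixed kernel is used instead of random weights so that the result is reproducible.

```python
def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and return the dot products."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    output = []
    for r in range(0, rows - k + 1, stride):
        out_row = []
        for c in range(0, cols - k + 1, stride):
            # Dot product of the kernel weights and the pixels underneath them.
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * image[r + i][c + j]
            out_row.append(total)
        output.append(out_row)
    return output


image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[-2.0, -2.0], [2.0, -2.0]]
```

A 3×3 filter fits into a 4×4 image at only 2×2 positions, which is why the output is smaller than the input.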

Let us define a function for our weights and biases. TensorFlow provides constructs such as ‘Variable’, which stores them as objects, and you can go through the different predefined commands on this page.
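As a library-free sketch of such helper functions, the following uses Python's standard `random` module in place of TensorFlow. The names `init_weights` and `init_biases`, the standard deviation of 0.1, the constant bias of 0.1 and the 32 output channels are all assumptions made for illustration. In TensorFlow the resulting values would be held in Variable objects instead of nested lists.

```python
import random


def init_weights(shape, stddev=0.1, seed=None):
    """Return a nested list of the given shape filled with N(0, stddev^2) samples."""
    rng = random.Random(seed)

    def build(dims):
        if len(dims) == 1:
            return [rng.gauss(0.0, stddev) for _ in range(dims[0])]
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(list(shape))


def init_biases(length, value=0.1):
    """Return one constant bias per output channel."""
    return [value] * length


# 5x5 filter, 1 input channel, 32 output channels (32 is an assumed value).
weights = init_weights((5, 5, 1, 32), stddev=0.1, seed=0)
biases = init_biases(32)

print(len(weights), len(weights[0]), len(weights[0][0]), len(weights[0][0][0]))  # 5 5 1 32
print(len(biases))  # 32
```

Only the shape and the standard deviation need to be supplied. The actual values are random and change with the seed.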

Now let us build a function which gives out a convolutional layer. Here, we need to consider some parameters before we build it. The first is the weight matrix, or the size of the filter which is going to slide over all the values of the matrix. Let us consider a filter size of 5×5 and a stride of 2. Stride is the number of pixels the filter jumps or slides over in every iteration. Then we have the number of channels, which is the depth of the image. Since the images are in grayscale, the depth is 1. Once the layer is built, we pass its output through the Rectified Linear Unit (ReLU) activation function.
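To see how these parameters affect the layer, the sketch below computes the output width for a 28-pixel input with a 5×5 filter and a stride of 2, and applies ReLU to a few sample values. The padding modes "VALID" (no padding) and "SAME" (zero padding) follow TensorFlow's naming. Which one this article's layer uses is not stated here, so both are shown as assumptions, and the function names are hypothetical.

```python
import math


def conv_output_size(input_size, filter_size, stride, padding="VALID"):
    """Output length along one dimension of a convolution."""
    if padding == "VALID":
        # The filter must fit entirely inside the input.
        return (input_size - filter_size) // stride + 1
    if padding == "SAME":
        # The input is zero-padded so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


def relu(values):
    """Rectified Linear Unit: negative values become zero, others pass through."""
    return [max(0.0, v) for v in values]


print(conv_output_size(28, 5, 2, "VALID"))  # 12
print(conv_output_size(28, 5, 2, "SAME"))   # 14
print(relu([-2.0, -0.5, 0.0, 1.5]))         # [0.0, 0.0, 0.0, 1.5]
```

With a stride of 2 the filter visits only every second pixel, so the output is roughly half the width of the input.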

In Python with the TensorFlow library the build is as follows. First, though, we need to initialize the shape and length of our variables, which are the weights and the biases.