Research Fellow (Data Science) at AlmaBetter
Before reading this article, we suggest you have a look at our recent article, “An Intuition behind Computer Vision”.
What comes to mind when we think about image classification? Convolutional Neural Networks (CNNs), right?
However, have we ever focused on the components?
Now do not give us that look. We all know about convolution layers, max-pooling, normalization, and dropout, but do we know when to apply them and why?
Many people follow recipes blindly and build networks without understanding the foundations behind them. This is what we are going to cover in this blog - the basic building blocks of networks, with intuition.
It is always better to understand the roots.
When we talk about images, we talk about information.
What is information?
Well, information tells us about the content available in the image.
For example, if you consider the above image, you can see the Hera Pheri character Raju standing in a stylish pose, wearing really expensive sunglasses, and exuding confidence.
Now, let us look at this image from the perspective of neural networks.
Networks don’t see it as information. In deep learning terms, we call it features.
A system will interpret the image as follows: a person is standing in the image, posing (pose estimation), looking confident (emotion detection), and wearing sunglasses, with objects in the background (object detection).
Can we figure out the difference?
Well, where we humans connect small dots and create a story, computers work in a more technical way and try to see the components without assuming or interpreting a story.
When computers see an image, they don’t directly jump to a conclusion that an object is there in the image.
They try to see small edges and gradients first, then the textures and patterns. Based on those observations, they try to build parts of objects, and finally, these parts are combined to form an object.
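The first step, spotting small edges, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 5x5 image, the Sobel-style kernel values, and the helper name `convolve2d` are all made up for this example, and a real network learns its kernel values during training rather than having them written by hand.

```python
# A tiny 5x5 grayscale image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

# A hand-written 3x3 kernel that responds to left-to-right brightness changes
# (Sobel-style values, chosen purely for illustration).
vertical_edge_kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1).

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped before it is applied.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


edge_map = convolve2d(image, vertical_edge_kernel)
for row in edge_map:
    print(row)  # each row prints as [0, 36, 36]
```

The output is zero where the image is flat and large where the 3x3 window straddles the dark-to-bright boundary, which is what "seeing an edge" means here.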
For this whole procedure to work, a neural network needs something that can first detect these edges, gradients, textures, patterns, and parts of objects, and something that can store them so they can be processed later.
This is where the concept of kernels and channels comes into the picture.
Kernels & channels
A kernel is a feature extractor: it extracts features from the image. It picks up important features, like the tail and ears of a dog, and suppresses noise, like the background. Channels, on the other hand, store this extracted information, which acts as the input for the next kernel (or layer).
As we have different features, we keep increasing the channels so that we can keep these features in separate containers - like a separate channel for the ears of a German Shepherd (erect ears) and another for the ears of a Beagle (floppy ears).
Please do not confuse channels with the number of channels. A channel acts as a container that holds information. The number of channels defines how many such containers we have for storing the information extracted by the kernels. For example, an RGB image has 3 channels. When we talk only about what channels are, they store the information extracted by the kernel (feature extractor or filter).
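The relationship between kernels and the number of channels can be sketched as follows. The sizes here (a 6x6 image, 4 kernels of size 3x3) and the function name `conv_layer` are assumptions chosen for illustration, and the random kernel values stand in for weights a real network would learn. The point is only that each kernel reads all input channels and writes exactly one output channel.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

HEIGHT, WIDTH, IN_CHANNELS = 6, 6, 3  # a tiny RGB image: 3 input channels
NUM_KERNELS = 4                       # hypothetical number of feature extractors
KERNEL_SIZE = 3

# image[c][y][x]: one 2D grid per colour channel (R, G, B)
image = [
    [[random.random() for _ in range(WIDTH)] for _ in range(HEIGHT)]
    for _ in range(IN_CHANNELS)
]

# kernels[n][c][i][j]: every kernel has one 3x3 slice per input channel
kernels = [
    [
        [[random.uniform(-1, 1) for _ in range(KERNEL_SIZE)] for _ in range(KERNEL_SIZE)]
        for _ in range(IN_CHANNELS)
    ]
    for _ in range(NUM_KERNELS)
]


def conv_layer(image, kernels):
    """Apply every kernel to the image (no padding, stride 1).

    Each kernel sums over all input channels and produces one output channel.
    """
    in_channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    k = len(kernels[0][0])
    out_h, out_w = height - k + 1, width - k + 1
    output = []
    for kernel in kernels:
        channel = [[0.0] * out_w for _ in range(out_h)]
        for y in range(out_h):
            for x in range(out_w):
                total = 0.0
                for c in range(in_channels):
                    for i in range(k):
                        for j in range(k):
                            total += image[c][y + i][x + j] * kernel[c][i][j]
                channel[y][x] = total
        output.append(channel)
    return output


feature_maps = conv_layer(image, kernels)
print("input channels :", len(image))         # 3
print("output channels:", len(feature_maps))  # 4, one per kernel
print("output size    :", len(feature_maps[0]), "x", len(feature_maps[0][0]))  # 4 x 4
```

So the input has 3 channels because it is RGB, while the output has 4 channels simply because we chose to use 4 kernels.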
Let us take an example and understand this whole concept more clearly.
Let’s say we go to a restaurant and look at the menu to order something. When we pick up the menu, we see two things: the veg and the non-veg sections. Obviously, veg contains only vegetarian dishes and non-veg contains only non-vegetarian dishes.
In this case, we are behaving exactly like a kernel, extracting that information from the menu. The pages that hold the veg and non-veg dishes are two channels, each keeping similar information together.
Let’s say we decide to eat veg. We again act as a kernel, going into the veg channel and trying to extract more information about the dishes. We see four main sections: tandoor, curry, rice, and chapati. Tandoor contains items like paneer tandoori, curry contains potato curry and mixed veg, rice contains fried rice and biryani, and so on.
If you consider this divided menu, all these things are features contained in the veg channel, so you act as a kernel and try to extract similar information again.
You can play this game the whole day.
You can even think of TV channels, where some are dedicated to cartoons, some to news, some to daily drama shows, and more.
Similarly, in a CNN, the channels in the initial layers keep general information. For instance, we can have three channels in the first layer for veg, non-veg, and beverages. As we add more channels, we can divide the veg channel into, say, dry sabzi and sabzi with gravy. For non-veg, we can have channels for mutton and chicken dishes, for beverages, we can have channels for cold and hot drinks, and so on.
Now we can see how information is kept separate as we move deeper into a network. Initially, the channels contain general information, and they become more specific to a particular class as we go deeper.
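To make the growing number of channels concrete, here is a small sketch that only tracks shapes as an image passes through a stack of convolution layers. The layer sizes (16, 32, and 64 kernels of size 3x3 on a 32x32 RGB input) and the function name `trace_shapes` are hypothetical choices for illustration, and the sketch assumes no padding and a stride of 1.

```python
def trace_shapes(height, width, channels, layers):
    """Return the (height, width, channels) shape after each layer.

    `layers` is a list of (num_kernels, kernel_size) pairs.
    Assumes square kernels, no padding, and stride 1.
    """
    shapes = [(height, width, channels)]
    for num_kernels, kernel_size in layers:
        height = height - kernel_size + 1
        width = width - kernel_size + 1
        # Each kernel reads every incoming channel and writes one outgoing channel.
        weights = num_kernels * channels * kernel_size * kernel_size
        channels = num_kernels
        shapes.append((height, width, channels))
        print(f"{num_kernels:3d} kernels -> shape {height}x{width}x{channels}, {weights} weights")
    return shapes


# Hypothetical stack: the channel count doubles at each layer.
layers = [(16, 3), (32, 3), (64, 3)]
shapes = trace_shapes(32, 32, 3, layers)
```

The 3 input channels become 16, then 32, then 64, while each layer's kernels take the previous layer's channels as their input.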
The Technical Side
Basically, when we apply these ideas to images, we see that an edge is not just an edge: it can have different orientations.
It can be vertical, horizontal, tilted, and so on. There are gradients as well. When kernels extract all of these, the results are stored in separate channels.
One channel can be dedicated to vertical edges, another to horizontal edges, and another to tilted edges.
At the same time, some channels contain gradients and color shades.
Then, there will be dedicated channels for storing the patterns in the image, for example, a grid-like structure on a particular object, such as the checks on a man’s shirt.
We can consider the image below for a better understanding.
In the above image, we can see that similar edges are stored in one channel. At the same time, color gradients are kept in separate channels, and in some channels we can even notice patterns.
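The idea of separate channels for differently oriented edges can be sketched with a small bank of kernels. The three Prewitt-style kernels below are hand-picked for illustration (a trained network would learn its own), and the helper `convolve2d` and the 5x5 test image are assumptions made for this example.

```python
# A 5x5 grayscale image with one vertical edge: dark (0) left, bright (1) right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Three hand-picked 3x3 kernels (Prewitt-style; illustrative, not learned).
kernels = {
    "vertical": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
    "diagonal": [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],
}


def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j] for i in range(k) for j in range(k))
            for x in range(len(image[0]) - k + 1)
        ]
        for y in range(len(image) - k + 1)
    ]


# One output channel per kernel, each holding a different kind of edge.
channels = {name: convolve2d(image, kernel) for name, kernel in kernels.items()}

# Summarise each channel by its strongest absolute response.
strength = {
    name: max(abs(value) for row in fmap for value in row)
    for name, fmap in channels.items()
}
for name, value in strength.items():
    print(f"{name:10s} channel, strongest response: {value}")
```

On this image the vertical channel responds most strongly (3), the diagonal channel responds partially (2), and the horizontal channel does not respond at all (0), so each channel ends up holding its own kind of edge.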
We then take all these channels and send them to the next layer of the neural network, and the process goes on.