CNN Architecture & Implementation


Last Updated: 20th November, 2025

Layer-by-Layer Breakdown

A typical CNN architecture consists of these layers:

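Input Layer: The raw image, stored as height × width × channels (e.g., 28×28×3).
Convolutional Layer: Slides learnable filters over the input to produce feature maps that respond to local patterns such as edges.
Activation Layer (ReLU): Adds non-linearity by setting negative values to zero.
Pooling Layer: Reduces the spatial dimensions (height and width) of the feature maps, e.g., by keeping the maximum of each 2×2 window.
Flatten Layer: Converts the multi-dimensional feature maps into a 1D vector.
Fully Connected (Dense) Layer: Combines the extracted features to make predictions.
Output Layer: Uses Softmax to turn the final scores into class probabilities for classification.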
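To make the list concrete, here is a minimal NumPy sketch of what each layer computes for a single image. It illustrates the math only and is not how TensorFlow implements these layers. The function names (`conv2d`, `max_pool2d`, `dense`, etc.), the 8 filters, and the random weights are hypothetical; the convolution assumes stride 1 with no padding, and pooling assumes non-overlapping 2×2 windows.

```python
import numpy as np

def conv2d(image, kernels, bias):
    """Valid (no padding), stride-1 convolution of an (H, W, C_in) image with
    (k, k, C_in, C_out) kernels. As in most deep learning libraries, this is
    technically cross-correlation: the kernel is not flipped."""
    k = kernels.shape[0]
    h_out = image.shape[0] - k + 1
    w_out = image.shape[1] - k + 1
    out = np.zeros((h_out, w_out, kernels.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            patch = image[i:i + k, j:j + k, :]  # (k, k, C_in) window
            # Dot the window with every filter at once -> one value per filter
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

def relu(x):
    return np.maximum(x, 0.0)  # negative values become 0

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    h, w, c = x.shape
    h2, w2 = h // size, w // size
    x = x[:h2 * size, :w2 * size, :]
    return x.reshape(h2, size, w2, size, c).max(axis=(1, 3))

def flatten(x):
    return x.reshape(-1)  # (H, W, C) -> (H*W*C,)

def dense(x, weights, bias):
    return x @ weights + bias  # every input connects to every output unit

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Toy forward pass with random (untrained, hypothetical) weights
rng = np.random.default_rng(0)
image = rng.random((28, 28, 3))                    # matches the 28×28×3 example
kernels = rng.standard_normal((3, 3, 3, 8)) * 0.1  # 8 filters of size 3×3
feature_maps = max_pool2d(relu(conv2d(image, kernels, np.zeros(8))))  # (13, 13, 8)
vector = flatten(feature_maps)                      # (1352,)
weights = rng.standard_normal((vector.size, 10)) * 0.01
probs = softmax(dense(vector, weights, np.zeros(10)))  # 10 class probabilities
print(feature_maps.shape, vector.shape, probs.sum())
```

The 3×3 convolution trims the 28×28 image to 26×26, pooling halves that to 13×13, and Flatten turns the 13×13×8 maps into 1,352 numbers that the dense layer maps to 10 scores.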

The structure can be visualized as:

Input → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Dense → Output

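This layered approach lets CNNs build up an understanding of an image step by step: early layers detect low-level features such as edges, and deeper layers combine them into higher-level patterns such as shapes and object parts.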
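The diagram does not fix kernel sizes or filter counts, so the sketch below assumes some values (3×3 kernels with no padding and stride 1, 2×2 max pooling, 32 and 64 filters as in the Lesson 2 code, a 64-unit dense layer, and a 28×28×1 MNIST-sized input) and traces the shape of the data at each stage. `trace_shapes` and its helpers are hypothetical names for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool):
    """Output length of non-overlapping pooling (stride = pool size, no padding)."""
    return (size - pool) // pool + 1

def trace_shapes(h, w, c, conv_filters=(32, 64), kernel=3, pool=2,
                 dense_units=64, num_classes=10):
    """Return (layer name, output shape) for Input -> (Conv -> ReLU -> Pool) x 2
    -> Flatten -> Dense -> Output, under the assumptions stated above."""
    shapes = [("Input", (h, w, c))]
    for filters in conv_filters:
        h, w, c = conv_out(h, kernel), conv_out(w, kernel), filters
        shapes.append(("Conv", (h, w, c)))
        shapes.append(("ReLU", (h, w, c)))  # element-wise: shape unchanged
        h, w = pool_out(h, pool), pool_out(w, pool)
        shapes.append(("Pool", (h, w, c)))
    shapes.append(("Flatten", (h * w * c,)))
    shapes.append(("Dense", (dense_units,)))
    shapes.append(("Output", (num_classes,)))
    return shapes

for name, shape in trace_shapes(28, 28, 1):
    print(f"{name:8s}{shape}")
```

Under these assumptions the feature maps go 28×28×1 → 26×26×32 → 13×13×32 → 11×11×64 → 5×5×64, so Flatten hands 1,600 values to the dense layer. ReLU never changes the shape because it acts on each value independently.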

Lesson 2: Building a CNN in Python

In this section, we’ll build a Convolutional Neural Network (CNN) step by step using TensorFlow and Keras.

1. We begin by importing the necessary libraries and loading the MNIST dataset of handwritten digits. The images are reshaped to add a single (grayscale) channel dimension and normalized to the range [0, 1] to prepare them for training.

2. We define our CNN model: two convolutional layers with a max-pooling layer between them, followed by a flatten layer and two dense layers for classification.

3. The model is then compiled using the Adam optimizer and sparse categorical cross-entropy loss.

4. Finally, we train the model for 5 epochs, passing the test set as validation data so that performance on unseen images is reported after every epoch, and observe how effectively it classifies handwritten digits.
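The loss in step 3 follows from how MNIST stores its labels: each label is an integer from 0 to 9, not a one-hot vector. The NumPy sketch below is a from-scratch illustration of the math, not the Keras implementation; the function names and the toy 3-class probabilities are hypothetical. It shows that sparse categorical cross-entropy reads off the predicted probability of the true class and averages its negative log, giving the same value as ordinary categorical cross-entropy on one-hot labels.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean cross-entropy when labels are integer class indices.
    labels: (N,) ints; probs: (N, num_classes) softmax outputs."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    # Pick the predicted probability of the true class for each sample
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Same loss, but with labels given as one-hot vectors."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

labels = np.array([0, 1])               # integer labels, as MNIST provides
probs = np.array([[0.7, 0.2, 0.1],      # hypothetical softmax outputs
                  [0.1, 0.8, 0.1]])
one_hot = np.eye(3)[labels]             # [[1, 0, 0], [0, 1, 0]]

print(sparse_categorical_crossentropy(labels, probs))
print(categorical_crossentropy(one_hot, probs))
```

Both calls compute (-log 0.7 - log 0.8) / 2. The sparse version simply skips building the one-hot matrix, which is why it pairs naturally with integer labels.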

Let’s build a simple CNN:

Input:

import tensorflow as tf
from tensorflow.keras import datasets, layers, models

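# Load data: 60,000 training and 10,000 test images, each 28×28 pixels in grayscale
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
# Add a channel axis (Conv2D expects height × width × channels) and scale pixel values to [0, 1]
train_images = train_images.reshape((60000, 28, 28, 1)) / 255.0
test_images = test_images.reshape((10000, 28, 28, 1)) / 255.0
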
# Build model
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile & Train
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
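To see where this model's weights live, the sketch below counts trainable parameters by hand using the standard formulas for a Conv2D layer, (kernel height × kernel width × input channels + 1) × filters, and a Dense layer, (inputs + 1) × units, where the +1 accounts for each bias. The helper names are hypothetical; if the arithmetic is right, `model.summary()` should report the same total for the model above.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel element per input channel, plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # One weight per input per unit, plus one bias per unit
    return (inputs + 1) * units

# Shapes follow the model above: valid 3×3 convolutions and a single 2×2 max-pool
conv1 = conv2d_params(3, 3, 1, 32)    # 28×28×1  -> 26×26×32
# MaxPooling2D has no weights:          26×26×32 -> 13×13×32
conv2 = conv2d_params(3, 3, 32, 64)   # 13×13×32 -> 11×11×64
flat = 11 * 11 * 64                   # Flatten  -> 7,744 values
dense1 = dense_params(flat, 64)
dense2 = dense_params(64, 10)
total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, dense1, dense2, total)
```

This gives 320, 18,496, 495,680 and 650 parameters, 515,146 in total. Roughly 96% of them sit in the first Dense layer, because with only one pooling layer the Flatten layer outputs 7,744 values. A second pooling layer, as in the diagram in the previous section, would shrink that input to 5×5×64 = 1,600 values (assuming valid 2×2 pooling) and the layer to (1,600 + 1) × 64 = 102,464 parameters.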

Sample Output:

Epoch 1/5
1875/1875 [==============================] - 15s 8ms/step - loss: 0.1483 - accuracy: 0.9557 - val_loss: 0.0491 - val_accuracy: 0.9844
Epoch 2/5
1875/1875 [==============================] - 14s 8ms/step - loss: 0.0478 - accuracy: 0.9852 - val_loss: 0.0387 - val_accuracy: 0.9870
Epoch 3/5
1875/1875 [==============================] - 14s 8ms/step - loss: 0.0314 - accuracy: 0.9902 - val_loss: 0.0384 - val_accuracy: 0.9878
Epoch 4/5
1875/1875 [==============================] - 14s 7ms/step - loss: 0.0238 - accuracy: 0.9923 - val_loss: 0.0376 - val_accuracy: 0.9893
Epoch 5/5
1875/1875 [==============================] - 14s 8ms/step - loss: 0.0184 - accuracy: 0.9940 - val_loss: 0.0410 - val_accuracy: 0.9886

From the output, the model reaches a validation accuracy of about 98.4% after the first epoch and peaks at about 98.9% in epoch 4, so it has learned to recognize handwritten digits very effectively. Training loss falls steadily across all five epochs, while validation loss falls through epoch 4 and then rises slightly in epoch 5 (0.0376 to 0.0410), an early hint that the model is starting to overfit. Overall, the convolution and pooling layers extract spatial features that the dense layers use to classify the digits accurately, which illustrates why CNNs perform so well on image recognition tasks like MNIST digit classification. Because this example uses the test set as validation data, these validation figures are also its test-set results.
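The observations above can also be checked in code. `model.fit` returns a Keras `History` object whose `history` attribute is a dict of per-epoch metric lists (keys such as `loss`, `val_loss` and `val_accuracy` for this compile setup). To avoid retraining, the sketch below hard-codes the values from the sample output; `best_epoch` and `diverging_epochs` are hypothetical helpers.

```python
# Per-epoch values copied from the sample output above. In a live run you could
# capture them with: history = model.fit(...).history
history = {
    "loss":         [0.1483, 0.0478, 0.0314, 0.0238, 0.0184],
    "val_loss":     [0.0491, 0.0387, 0.0384, 0.0376, 0.0410],
    "val_accuracy": [0.9844, 0.9870, 0.9878, 0.9893, 0.9886],
}

def best_epoch(values, higher_is_better=False):
    """Return the 1-based epoch with the best value."""
    best = max(values) if higher_is_better else min(values)
    return values.index(best) + 1

def diverging_epochs(hist):
    """1-based epochs where validation loss rose while training loss still fell."""
    return [
        epoch + 1
        for epoch in range(1, len(hist["loss"]))
        if hist["val_loss"][epoch] > hist["val_loss"][epoch - 1]
        and hist["loss"][epoch] < hist["loss"][epoch - 1]
    ]

print("lowest val_loss at epoch", best_epoch(history["val_loss"]))
print("highest val_accuracy at epoch", best_epoch(history["val_accuracy"], higher_is_better=True))
print("val_loss rose while loss fell at epochs", diverging_epochs(history))
```

For this run, both checks point to epoch 4 as the best epoch, and only epoch 5 shows validation loss rising while training loss keeps falling.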

After training, we can use the CNN to predict labels for new images.

# Make predictions on the first 5 test images
predictions = model.predict(test_images[:5])

# Print predicted and actual labels
for i, prediction in enumerate(predictions):
    predicted_label = prediction.argmax()  # Class with highest probability
    actual_label = test_labels[i]
    print(f"Image {i+1}: Predicted = {predicted_label}, Actual = {actual_label}")

Output:

Image 1: Predicted = 7, Actual = 7
Image 2: Predicted = 2, Actual = 2
Image 3: Predicted = 1, Actual = 1
Image 4: Predicted = 0, Actual = 0
Image 5: Predicted = 4, Actual = 4
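The loop above takes the argmax of one probability vector at a time. A vectorized alternative applies `np.argmax(..., axis=1)` to the whole batch, which also makes it easy to compute accuracy against the true labels. So that the sketch runs without a trained model, it uses small hypothetical probability rows in place of `model.predict` output; with the model from this lesson, `predicted_labels(model.predict(test_images))` would be the corresponding call. The helper names are illustrative.

```python
import numpy as np

def predicted_labels(probabilities):
    """Row-wise argmax: index of the most probable class for each image."""
    return np.argmax(probabilities, axis=1)

def accuracy(pred, actual):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(pred == actual))

# Hypothetical softmax outputs for 3 images over 4 classes (stand-ins for model.predict)
probs = np.array([[0.10, 0.70, 0.10, 0.10],
                  [0.25, 0.25, 0.40, 0.10],
                  [0.60, 0.20, 0.10, 0.10]])
actual = np.array([1, 2, 3])

pred = predicted_labels(probs)
print(pred, accuracy(pred, actual))
```

Here the first two images are classified correctly and the third is not, so the accuracy is 2/3.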

Module 3: Building and Applying CNNs