A typical CNN architecture stacks convolutional, activation (ReLU), pooling, flattening, and dense layers.
The structure can be visualized as:
Input → Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Dense → Output
This layered approach lets a CNN build up its understanding of an image step by step, from low-level features such as edges in the early layers to high-level object patterns in the later ones.
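To make the pipeline above more concrete, the following sketch traces how the feature-map shape changes from layer to layer. It is only an illustration under settings the text does not specify: a 28×28 single-channel input, 3×3 convolutions with stride 1 and no padding, 2×2 pooling, and 32 and 64 filters are all assumptions, and the helper names are hypothetical.

```python
def conv_output(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output(size, pool):
    """Spatial size after non-overlapping pooling (stride equals pool size)."""
    return size // pool


def trace_shapes(input_size=28, filters=(32, 64), kernel=3, pool=2):
    """Return (layer name, shape) pairs for Conv -> Pool blocks plus Flatten."""
    shapes = [("Input", (input_size, input_size, 1))]
    size = input_size
    for n_filters in filters:
        size = conv_output(size, kernel)   # assumed: stride 1, no padding
        shapes.append(("Conv + ReLU", (size, size, n_filters)))
        size = pool_output(size, pool)     # assumed: 2x2 pooling
        shapes.append(("Pool", (size, size, n_filters)))
    shapes.append(("Flatten", (size * size * filters[-1],)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:<12} {shape}")
```

Under these assumptions the spatial size shrinks 28 → 26 → 13 → 11 → 5, so the flattened vector passed to the dense layers has 5 × 5 × 64 = 1600 values.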
In this section, we’ll build a Convolutional Neural Network (CNN) step by step using TensorFlow and Keras.
1. We begin by importing the necessary libraries and loading the MNIST dataset of handwritten digits. The images are reshaped and normalized to prepare them for training.
2. We define our CNN model consisting of two convolutional layers with a max-pooling layer between them, followed by a flattening layer and two dense layers for classification.
3. The model is then compiled using the Adam optimizer and sparse categorical cross-entropy loss.
4. Finally, we train the model for 5 epochs, passing the test dataset as validation data so that we can observe after each epoch how effectively it classifies handwritten digits.
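Step 3 names sparse categorical cross-entropy as the loss. As a rough illustration of what that loss measures, the sketch below computes it in plain Python for integer labels and rows of class probabilities: it is the mean negative log-probability assigned to the true class. The function name, the clipping constant, and the sample numbers are assumptions made for this example, not taken from the tutorial's code.

```python
import math


def sparse_categorical_crossentropy(labels, probabilities, eps=1e-7):
    """Mean negative log-probability of the true class.

    labels: integer class indices, one per sample
    probabilities: one list of class probabilities per sample (e.g. softmax output)
    eps: assumed clipping constant that keeps log() away from 0
    """
    losses = []
    for label, probs in zip(labels, probabilities):
        p_true = min(max(probs[label], eps), 1.0 - eps)
        losses.append(-math.log(p_true))
    return sum(losses) / len(losses)


# Hypothetical mini-batch: two samples, three classes
labels = [0, 2]
probabilities = [
    [0.90, 0.05, 0.05],  # confident and correct: small loss
    [0.10, 0.10, 0.80],  # less confident: larger loss
]
print(round(sparse_categorical_crossentropy(labels, probabilities), 4))
```

The "sparse" variant takes labels as plain integers such as 7, which is why the MNIST labels can be used directly without one-hot encoding.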
Let’s build a simple CNN:
Input:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
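# Load MNIST: 60,000 training and 10,000 test images, each 28x28 grayscale
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
# Add a channel dimension and scale pixel values from [0, 255] to [0, 1]
train_images = train_images.reshape((-1, 28, 28, 1)).astype("float32") / 255.0
test_images = test_images.reshape((-1, 28, 28, 1)).astype("float32") / 255.0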
# Build model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile & Train
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
Sample Output:
Epoch 1/5
1875/1875 [==============================] - 15s 8ms/step - loss: 0.1483 - accuracy: 0.9557 - val_loss: 0.0491 - val_accuracy: 0.9844
Epoch 2/5
1875/1875 [==============================] - 14s 8ms/step - loss: 0.0478 - accuracy: 0.9852 - val_loss: 0.0387 - val_accuracy: 0.9870
Epoch 3/5
1875/1875 [==============================] - 14s 8ms/step - loss: 0.0314 - accuracy: 0.9902 - val_loss: 0.0384 - val_accuracy: 0.9878
Epoch 4/5
1875/1875 [==============================] - 14s 7ms/step - loss: 0.0238 - accuracy: 0.9923 - val_loss: 0.0376 - val_accuracy: 0.9893
Epoch 5/5
1875/1875 [==============================] - 14s 8ms/step - loss: 0.0184 - accuracy: 0.9940 - val_loss: 0.0410 - val_accuracy: 0.9886
From the output, we can see that the model reaches a validation accuracy of about 98.4% to 98.9%, so it has learned to recognize handwritten digits well. The training loss falls steadily from 0.1483 to 0.0184 while training accuracy rises to 99.4%. The validation loss flattens after the second epoch and ticks up slightly in the fifth (0.0376 to 0.0410), which suggests that further epochs would bring little additional gain. Together, these results indicate that the convolution and pooling layers extract useful spatial features and that the dense layers turn them into accurate classifications on MNIST.
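To give a feel for what "extracting spatial features" means, here is a minimal pure-Python sketch of one convolution, one ReLU, and one 2×2 max-pooling step on a tiny image. The 6×6 image and the hand-written vertical-edge kernel are assumptions chosen for illustration; in the trained model the kernel values are learned, and Keras performs these operations far more efficiently.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 6x6 image: dark left half, bright right half (a vertical edge)
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked kernel that responds to dark-to-bright transitions
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = relu(conv2d_valid(image, kernel))
pooled = max_pool2d(feature_map)
print(feature_map)
print(pooled)
```

The feature map is nonzero only where the kernel overlaps the edge, and pooling halves its height and width while keeping the strongest responses.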
After training, we can use the CNN to predict labels for new images.
# Make predictions on the first 5 test images
predictions = model.predict(test_images[:5])
# Print predicted and actual labels
for i, prediction in enumerate(predictions):
    predicted_label = prediction.argmax()  # Class with highest probability
    actual_label = test_labels[i]
    print(f"Image {i+1}: Predicted = {predicted_label}, Actual = {actual_label}")
Output:
Image 1: Predicted = 7, Actual = 7
Image 2: Predicted = 2, Actual = 2
Image 3: Predicted = 1, Actual = 1
Image 4: Predicted = 0, Actual = 0
Image 5: Predicted = 4, Actual = 4
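Each prediction is a vector of 10 softmax probabilities, one per digit, and the predicted label is simply the index of the largest value. As a small sketch of how such a vector could be inspected further, the hypothetical helper below lists the most likely classes together with their probabilities. The probability values are made up for illustration and are not output from the trained model.

```python
def top_classes(probabilities, k=3):
    """Return the k most likely (class index, probability) pairs, best first."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda c: probabilities[c],
                    reverse=True)
    return [(c, probabilities[c]) for c in ranked[:k]]


# Made-up softmax output for one image (10 values, one per digit)
example = [0.01, 0.01, 0.02, 0.01, 0.0, 0.0, 0.0, 0.93, 0.01, 0.01]

best_label, best_prob = top_classes(example, k=1)[0]
print(f"Predicted = {best_label} (probability {best_prob:.2f})")
for label, prob in top_classes(example):
    print(f"  digit {label}: {prob:.2f}")
```

Looking at the runner-up classes can help show which digits the model tends to confuse when a prediction is wrong.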