Gradient Boosting Algorithm for Machine Learning

Module - 5 Classification

Gradient Boosting is an ensemble learning technique that combines multiple weak learners to form a strong learner. It is a powerful technique for both classification and regression tasks. Commonly used gradient boosting libraries include XGBoost, LightGBM, and CatBoost. Each one applies its own techniques to improve model performance, such as regularization and tree pruning.

What is Gradient Boosting

Gradient Boosting is a prominent boosting technique. In gradient boosting, each predictor corrects the errors of its predecessor. Unlike AdaBoost, the weights of the training instances are not changed; instead, each predictor is trained using its predecessor's residual errors as labels.

Gradient Boosted Trees is a method whose base learner is CART (Classification and Regression Trees).

The graphic below illustrates how gradient boosted trees are trained for regression problems.

[Figure: training of gradient boosted trees for regression]

The ensemble is made up of N trees. Tree1 is trained on the feature matrix X and the labels y. Its predictions y1(hat) are used to calculate the training set residual errors r1. Tree2 is then trained on the same feature matrix X, with the residual errors r1 as labels. Its predictions r1(hat) are then used to calculate the residuals r2. The process continues until all N trees in the ensemble have been trained.

This approach relies on an important parameter called shrinkage.

Shrinkage means that the prediction of each tree in the ensemble is scaled down by multiplying it by the learning rate (eta), which ranges from 0 to 1. There is a trade-off between eta and the number of estimators: a lower learning rate must be compensated by more estimators in order to reach a given level of model performance. Once all trees have been trained, predictions can be made. Each tree makes a prediction, and the following formula gives the final prediction.

y(pred) = y1(hat) + (eta * r1(hat)) + (eta * r2(hat)) + ... + (eta * r(N-1)(hat))
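The training loop and the prediction formula above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the scikit-learn implementation: the base learner is a hand-written one-split regression stump (a simplified stand-in for CART), the loss is squared error, and the names fit_stump, fit_gradient_boosting and predict, as well as the toy data, are made up for this example. Each residual is computed against the shrunken running prediction.

```python
def fit_stump(xs, targets):
    """Find the single threshold on a 1-D feature that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gradient_boosting(xs, ys, n_trees=20, eta=0.3):
    """Train n_trees learners; every learner after the first fits the residuals."""
    first = fit_stump(xs, ys)                      # Tree1 is trained on the labels y
    learners = [first]
    current = [first(x) for x in xs]               # y1(hat)
    for _ in range(n_trees - 1):
        residuals = [y - c for y, c in zip(ys, current)]
        stump = fit_stump(xs, residuals)           # residuals are used as labels
        learners.append(stump)
        current = [c + eta * stump(x) for c, x in zip(current, xs)]
    return learners


def predict(learners, x, eta=0.3):
    """First tree's prediction plus eta times each later tree's residual prediction."""
    return learners[0](x) + eta * sum(tree(x) for tree in learners[1:])


def mse(learners, xs, ys, eta=0.3):
    return sum((y - predict(learners, x, eta)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data, made up for illustration: y = x squared on ten points.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]

one_tree = fit_gradient_boosting(xs, ys, n_trees=1)
many_trees = fit_gradient_boosting(xs, ys, n_trees=20)
print("training MSE with 1 tree:  ", mse(one_tree, xs, ys))
print("training MSE with 20 trees:", mse(many_trees, xs, ys))
```

On this toy data the training error falls as trees are added, because each new stump removes part of the remaining residual.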

GradientBoostingRegressor is the Scikit-Learn class for gradient boosting regression. GradientBoostingClassifier is the corresponding class for classification.
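As a brief sketch of the classification side, the snippet below trains a GradientBoostingClassifier on synthetic data. The dataset from make_classification and the parameter values are assumptions chosen only for illustration; learning_rate plays the role of eta and n_estimators is the number of trees N.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, generated only for this illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# learning_rate is the shrinkage factor eta; n_estimators is the number of trees.
classifier = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.05, random_state=0
)
classifier.fit(X_train, y_train)

# score() returns the mean accuracy on the given test data.
accuracy = classifier.score(X_test, y_test)
print("Test accuracy:", accuracy)
```

A smaller learning_rate generally calls for a larger n_estimators, which is the trade-off described earlier.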


How does gradient descent work?

The basic idea behind gradient descent is to iteratively adjust the model parameters in the direction of steepest descent of the cost function until the minimum is reached.

Here is a step-by-step explanation of how gradient descent works:

  1. Initialization: The first step is to initialize the model parameters with starting values. This could be a vector of zeros or small random values.
  2. Calculate the error: The next step is to evaluate the error or cost function of the model on the training data. This gives us a measure of how well the model is performing.
  3. Calculate the gradient: The gradient of the cost function is the vector of partial derivatives with respect to each of the model parameters. This tells us the direction of steepest ascent of the cost function.
  4. Update the parameters: We update the model parameters by subtracting a small multiple of the gradient from the current parameter values. This brings us closer to the minimum of the cost function.
  5. Repeat: We repeat steps 2-4 until the cost function reaches a minimum or a predefined number of iterations is reached.
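The five steps above can be sketched for a one-feature linear model y = w * x + b with mean squared error as the cost function. The function name gradient_descent, the toy data, and the hyperparameter values are assumptions made for this example.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_iterations=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0                                  # step 1: initialise parameters
    n = len(xs)
    cost = None
    for _ in range(n_iterations):                    # step 5: repeat
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        cost = sum(e * e for e in errors) / n        # step 2: cost at the current w, b
        # step 3: partial derivatives of the cost with respect to w and b
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # step 4: move against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, cost


# Toy data generated from y = 2x + 1, made up for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, cost = gradient_descent(xs, ys)
print("w:", w, "b:", b, "final cost:", cost)
```

Because the data lie exactly on a line, the fitted w and b approach 2 and 1 and the cost approaches zero.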

The learning rate controls the size of the step taken at each iteration. A high learning rate can cause the algorithm to overshoot the minimum and bounce back and forth, or even diverge, while a low learning rate can cause the algorithm to converge slowly.
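The effect of the learning rate is easy to see on the one-dimensional function f(x) = x**2, whose gradient is 2x. The sketch below is only an illustration; the function name minimise_quadratic, the starting point, and the three learning rates are arbitrary choices.

```python
def minimise_quadratic(learning_rate, n_steps=20, start=5.0):
    """Run gradient descent on f(x) = x**2 and return the final x."""
    x = start
    for _ in range(n_steps):
        gradient = 2 * x                 # derivative of x**2
        x -= learning_rate * gradient
    return x


too_low = minimise_quadratic(0.01)       # moves towards 0, but slowly
reasonable = minimise_quadratic(0.1)     # gets close to the minimum at 0
too_high = minimise_quadratic(1.1)       # overshoots on every step and diverges

for name, value in [("0.01", too_low), ("0.1", reasonable), ("1.1", too_high)]:
    print("learning rate", name, "-> final x =", value)
```

With a learning rate above 1 on this function, each update flips the sign of x and increases its magnitude, which is the bouncing behaviour described above.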

There are variations of gradient descent, such as stochastic gradient descent and mini-batch gradient descent, which use random subsets of the training data to compute the gradient at each iteration. These methods can be more efficient for large datasets.
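A mini-batch variant can be sketched by computing the gradient on a small random subset of the data at each update. The code below is a simplified illustration on the linear model y = w * x + b; the function name minibatch_sgd, the toy data, and the hyperparameter values are assumptions. Setting batch_size=1 gives stochastic gradient descent.

```python
import random


def minibatch_sgd(xs, ys, learning_rate=0.1, batch_size=2, n_epochs=500, seed=0):
    """Fit y = w * x + b using gradients computed on random mini-batches."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(n_epochs):
        rng.shuffle(indices)                          # new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Gradient of the mean squared error over this batch only.
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, made up for illustration.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2 * x + 1 for x in xs]

w, b = minibatch_sgd(xs, ys)
print("w:", w, "b:", b)
```

Each update touches only batch_size samples, so the cost per step stays constant as the dataset grows.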

Python implementation

Let's use the Boston housing dataset for the demo. It ships with older versions of Scikit-Learn and is imported with "from sklearn.datasets import load_boston". Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below needs an earlier release.

The dataset contains 506 samples and 13 features. It is a regression dataset whose target is the median house price, which makes it suitable for practicing machine learning techniques such as gradient boosting.

#importing the libraries
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import load_boston

#loading the dataset
boston = load_boston()
X = boston.data
y = boston.target

#splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

#creating the regressor
regressor = GradientBoostingRegressor()

#fitting the regressor
regressor.fit(X_train, y_train)

#predicting the values
y_pred = regressor.predict(X_test)

#calculating the metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

#printing the metrics
print('Mean Squared Error:', mse)
print('R2 Score:', r2)

This code uses the Gradient Boosting Regressor model from the scikit-learn library to predict the median house prices in the Boston Housing dataset. First, it imports the necessary libraries for the code. Then, it loads the Boston Housing dataset from the scikit-learn library. Next, it splits the data into train and test sets. After that, it creates the Gradient Boosting Regressor model, fits it to the training data, and uses it to make predictions on the test data. Finally, it calculates the mean squared error and R2 score on the test data and prints the results.
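For readers on scikit-learn 1.2 or later, where load_boston is no longer available, the same workflow can be sketched with a dataset that still ships with the library. The version below swaps in load_diabetes purely as an assumption for illustration; it is a different regression dataset, so its scores are not comparable to the Boston results.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes stands in for load_boston, which newer releases lack.
X, y = load_diabetes(return_X_y=True)

# Same 80/20 split as in the listing above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# random_state is fixed so that repeated runs build the same trees.
regressor = GradientBoostingRegressor(random_state=0)
regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R2 Score:", r2)
```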


Gradient Boosting is a popular ensemble learning technique for both classification and regression tasks. It combines multiple weak learners into a single strong learner by training them sequentially, with each learner correcting the errors of the ones before it. Commonly used gradient boosting libraries include XGBoost, LightGBM, and CatBoost. Hyperparameter tuning and the choice of loss function are important considerations when training gradient boosting models. Feature selection, model interpretation, and model ensembling techniques can also be used to improve the model performance.

Key takeaways

  1. Gradient Boosting is an ensemble learning technique used for both classification and regression tasks.
  2. It combines multiple weak learners to form a strong learner.
  3. Commonly used gradient boosting algorithms include XGBoost, LightGBM, and CatBoost.
  4. Hyperparameter tuning is an important step in optimizing the model performance.
  5. Loss functions are the measure of how well the model is performing.
  6. Feature selection is an important step in training gradient boosting models.
  7. Model interpretation is the process of understanding the inner workings of a model.
  8. Imbalanced data is a common problem in machine learning and can be handled using oversampling, undersampling, and synthetic data generation.
  9. Model ensembling is the process of combining multiple models to create a more powerful model.
  10. Automated machine learning (AutoML) is an emerging field that uses algorithms to automate the process of model development.
  11. Gradient boosting libraries provide an easy way to train and deploy gradient boosting models.
  12. Performance evaluation is an important step in the machine learning process and involves measuring the model performance and comparing it to a baseline.


1. Which of the following is an important step in training gradient boosting models?

  A. Hyperparameter Tuning
  B. Feature Selection
  C. Model Interpretation
  D. Model Ensembling

Answer: B. Feature Selection

2. What metric is commonly used for evaluating the performance of a gradient boosting model?

  A. F1 Score
  B. Log Loss
  C. Mean Absolute Error
  D. Mean Squared Error

Answer: D. Mean Squared Error

3. Which of the following is a commonly used technique for handling imbalanced data with gradient boosting?

  A. Oversampling
  B. Undersampling
  C. Grid Search
  D. Random Search

Answer: A. Oversampling

4. What is the name of the popular library for training and deploying gradient boosting models?

  A. XGBoost
  B. LightGBM
  C. CatBoost
  D. All of the above

Answer: D. All of the above
