Course Outline

Logistic Regression in Machine Learning

Decision Tree in Machine Learning

Ensembles of Decision Trees

Random Forest Algorithm in Machine Learning

AdaBoost Algorithm

Gradient Boosting Algorithm for Machine Learning

XGBoost Algorithm in Machine Learning

Metrics for Classification Model

Gradient Boosting Algorithm for Machine Learning

Last Updated: 28th February, 2024

Gradient Boosting is an ensemble learning technique that combines multiple weak learners to form a strong learner. It is a powerful technique for both classification and regression tasks. Commonly used gradient boosting algorithms include XGBoost, LightGBM, and CatBoost. Each algorithm uses different techniques to optimize the model performance such as regularization, tree pruning, feature importance, and so on.

What is Gradient Boosting

Gradient Boosting is a prominent technique for boosting. In gradient boosting, each prediction corrects the inaccuracy of its previous. Unlike Adaboost, the weights of the training instances are not changed; instead, each predictor is trained using the predecessor's residual mistakes as labels.

Gradient Boosted Trees is a method whose basic learner is CART (Classification and Regression Trees).

The graphic below illustrates how gradient boosted trees are trained for regression situations. image (43).png

The ensemble is made up of N trees. The feature matrix X and the labels y are used to train Tree1. The y1(hat) predictions are utilised to calculate the training set residual errors r1. Tree2 is then trained with Tree1's feature matrix X and residual errors r1 as labels. The projected r1(hat) values are then utilised to calculate the residual r2. The technique is continued until all N trees in the ensemble have been trained.

This approach employs an essential parameter called as shrinkage.

Shrinkage refers to the fact that after multiplying the prediction of each tree in the ensemble by the learning rate (eta), which varies from 0 to 1, the forecast of each tree in the ensemble is shrunk. There is a trade-off between eta and the number of estimators; a decrease in learning rate must be compensated by an increase in estimators in order to achieve a specific level of model performance. Predictions may now be made because all trees have been taught. Each tree predicts a label, and the formula provides the final forecast.

y(pred) = y1 + (eta *  r1) + (eta * r2) + ....... + (eta * rN)

GradientBoostingRegressor is the Scikit-Learn class for gradient boosting regression. GradientBoostingClassifier is a classification algorithm that uses a similar approach.

image (44).png

How does gradient descent works?

The basic idea behind gradient descent is to iteratively adjust the model parameters in the direction of steepest descent of the cost function until the minimum is reached.

Here is a step-by-step explanation of how gradient descent works:

Initialization: The first step is to initialize the model parameters with some random values. This could be a vector of zeros, or some other small random values.
Calculate the error: The next step is to evaluate the error or cost function of the model on the training data. This gives us a measure of how well the model is performing.
Calculate the gradient: The gradient of the cost function is the vector of partial derivatives with respect to each of the model parameters. This tells us the direction of steepest ascent of the cost function.
Update the parameters: We update the model parameters by subtracting a small multiple of the gradient from the current parameter values. This brings us closer to the minimum of the cost function.
Repeat: We repeat steps 2-4 until the cost function reaches a minimum or a predefined number of iterations is reached.

The size of the step taken at each iteration is called the learning rate. A high learning rate can cause the algorithm to overshoot the minimum and bounce back and forth, while a low learning rate can cause the algorithm to converge slowly.

There are variations of gradient descent, such as stochastic gradient descent and mini-batch gradient descent, which use random subsets of the training data to compute the gradient at each iteration. These methods can be more efficient for large datasets.

Python implementation

Lets use boston dataset for the demo

Use the already available dataset boston which is in sklearn

import the dataset as “from sklearn.datasets import load_boston”

The Boston housing dataset is included in the Scikit-Learn library. It can be accessed by importing the dataset from the sklearn.datasets module. The dataset contains 506 samples and 13 features. It can be used for both regression and classification tasks. It is a great dataset for practicing machine learning techniques, such as gradient boosting.

Loading...

This code uses the Gradient Boosting Regressor model from the scikit-learn library to predict the median house prices in the Boston Housing dataset. First, it imports the necessary libraries for the code. Then, it loads the Boston Housing dataset from the scikit-learn library. Next, it splits the data into train and test sets. After that, it creates the Gradient Boosting Regressor model, fits it to the training data, and uses it to make predictions on the test data. Finally, it calculates the mean squared error and R2 score on the test data and prints the results.

Conclusion

Gradient Boosting is a powerful and popular ensemble learning technique for both classification and regression tasks. It combines multiple weak learners into a single strong learner by sequentially optimizing the model performance. Commonly used gradient boosting algorithms include XGBoost, LightGBM, and CatBoost. Hyperparameter tuning and loss functions are important considerations when training gradient boosting models. Feature selection, model interpretation, and model ensembling techniques can also be used to improve the model performance. Gradient Boosting is a powerful technique and can be used to achieve excellent results on a variety of tasks.

Key takeaways

Gradient Boosting is an ensemble learning technique used for both classification and regression tasks.
It combines multiple weak learners to form a strong learner.
Commonly used gradient boosting algorithms include XGBoost, LightGBM, and CatBoost.
Hyperparameter tuning is an important step in optimizing the model performance.
Loss functions are the measure of how well the model is performing.
Feature selection is an important step in training gradient boosting models.
Model interpretation is the process of understanding the inner workings of a model.
Imbalanced data is a common problem in machine learning and can be handled using oversampling, undersampling, and synthetic data generation.
Model ensembling is the process of combining multiple models to create a more powerful model.
Automated machine learning (AutoML) is an emerging field that uses algorithms to automate the process of model development.
Gradient boosting libraries provide an easy way to train and deploy gradient boosting models.
Performance evaluation is an important step in the machine learning process and involves measuring the model performance and comparing it to a baseline.

Quiz

1.Which of the following is an important step in training gradient boosting models?

Hyperparameter Tuning
Feature Selection
Model Interpretation
Model Ensembling

Answer: B. Feature Selection

2.What metric is commonly used for evaluating the performance of a gradient boosting model?

F1 Score
Log Loss
Mean Absolute Error
Mean Squared Error

Answer: D. Mean Squared Error

3.Which of the following is a commonly used technique for handling imbalanced data with gradient boosting?

Oversampling
Undersampling
Grid Search
Random Search

Answer: A. Oversampling

4.What is the name of the popular library for training and deploying gradient boosting models?

XGBoost
LightGBM
CatBoost
All of the above

Answer: D. All of the Above

Module 5: Classification