Customer churn is a critical problem for any business as it can significantly impact revenue and growth. To overcome this challenge, companies need to identify the customers who are more likely to churn and take proactive measures to retain them. One effective way to achieve this is through customer churn prediction using machine learning.
In this article, we will explore the process of customer churn analysis and demonstrate how to predict customer churn with machine learning in Python. We will cover data preparation, feature engineering, model training, and evaluation to help you develop an accurate churn prediction model.
Requirements:
To perform customer churn prediction using machine learning in Python, there are a few requirements that need to be fulfilled.
Dataset: A dataset containing customer data is required to perform customer churn analysis and prediction. The dataset should include features such as customer demographics, transaction history, and interactions with the company.
Python: Python is the programming language used for this task. Therefore, you need to have Python installed on your system. Python can be downloaded from the official Python website and installed on your computer.
Machine learning libraries: Python has several machine learning libraries, such as scikit-learn, TensorFlow, and Keras, that can be used to build machine learning models.
Data preprocessing libraries: Before training the machine learning model, the data needs to be cleaned and transformed into a format suitable for the model. Libraries such as pandas and NumPy are used for this step.
Integrated Development Environment (IDE): To write and execute Python code, you need an Integrated Development Environment (IDE). There are several IDEs available for Python, such as PyCharm, Spyder, and Jupyter Notebook.
By fulfilling these requirements, we can perform customer churn prediction using machine learning in Python.
Required Modules:
To perform customer churn prediction using machine learning in Python, we need to import several modules that provide various tools and algorithms for data preprocessing, model training, and evaluation. Here are the required modules for this task:
Pandas: Pandas is a Python library used for data manipulation and analysis. It provides data structures for efficiently storing and manipulating large datasets.
NumPy: NumPy is a Python library used for numerical operations such as linear algebra, Fourier transforms, and random number generation. It provides efficient and optimized routines for numerical computations.
Scikit-learn: Scikit-learn is a Python library used for machine learning tasks such as classification, regression, and clustering. It provides various algorithms and tools for data preprocessing, model training, and evaluation.
Matplotlib: Matplotlib is a Python library used for data visualization. It provides tools for creating various types of plots, such as scatter plots, bar plots, and line plots.
Seaborn: Seaborn is a Python library based on matplotlib used for data visualization. It provides additional tools for creating more complex and informative plots.
Warnings: The warnings module from the Python standard library is used here to suppress warning messages that might otherwise clutter the output while running the code.
We can import these modules using the import statement in Python. For example:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
Code Implementation:
Importing required libraries: We import the required libraries such as pandas, numpy, scikit-learn, matplotlib, seaborn, and warnings. We also filter out any warning messages that might pop up while running the code.
# Importing required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
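Loading the dataset: We load the dataset containing customer data using the Pandas library. The code assumes the downloaded CSV file has been saved as customer_churn.csv in the working directory.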
Link to the dataset: https://www.kaggle.com/datasets/blastchar/telco-customer-churn
# Loading the dataset
data = pd.read_csv('customer_churn.csv')
Exploring the dataset: We print the first few rows of the dataset using the head() method to get an idea of what the data looks like.
# Exploring the dataset
print(data.head())
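Beyond head(), a few more checks are worth running before preprocessing. The sketch below uses a tiny hand-made DataFrame (hypothetical rows that only mimic a few columns of the Telco dataset, not the real data) to show how to inspect column types, missing values, and the balance of the target column.

```python
import pandas as pd

# Hypothetical sample that mimics a few columns of the Telco dataset
sample = pd.DataFrame({
    'gender': ['Female', 'Male', 'Male', 'Female'],
    'tenure': [1, 34, 2, 45],
    'TotalCharges': ['29.85', '1889.5', ' ', '1840.75'],  # stored as text
    'Churn': ['No', 'No', 'Yes', 'No'],
})

print(sample.dtypes)           # TotalCharges is a text column, not a float
print(sample.isnull().sum())   # the blank string is not counted as missing
churn_counts = sample['Churn'].value_counts()
print(churn_counts)            # how many customers fall into each class
```

Checks like these explain why the preprocessing step has to convert 'TotalCharges' explicitly: a blank string looks like valid text to pandas, so it does not show up in isnull().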

[Image: output of data.head()]
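Data preprocessing: We preprocess the data in four steps. First, we drop the customer ID and, to keep the example simple, the multi-category service, contract, and payment columns. Second, we map the remaining binary categorical columns to 0 and 1. Third, we convert the 'TotalCharges' column to a numeric type. Finally, we drop the rows that contain missing values.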
# Data preprocessing
# Dropping irrelevant columns
data = data.drop(['customerID', 'MultipleLines', 'InternetService', 'OnlineSecurity',
                  'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV',
                  'StreamingMovies', 'PaymentMethod', 'Contract'], axis=1)

# Converting categorical columns to numerical columns
data['gender'] = data['gender'].map({'Female': 1, 'Male': 0})
data['Partner'] = data['Partner'].map({'Yes': 1, 'No': 0})
data['Dependents'] = data['Dependents'].map({'Yes': 1, 'No': 0})
data['PhoneService'] = data['PhoneService'].map({'Yes': 1, 'No': 0})
data['PaperlessBilling'] = data['PaperlessBilling'].map({'Yes': 1, 'No': 0})
data['Churn'] = data['Churn'].map({'Yes': 1, 'No': 0})

# Converting 'TotalCharges' column to numeric type
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')

# Handling missing values
data = data.dropna()
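To see why errors='coerce' and dropna() work together, here is a minimal sketch on a made-up three-row DataFrame. The rows are hypothetical; the assumption is that some 'TotalCharges' cells hold blank strings rather than numbers, which is the situation this conversion guards against.

```python
import pandas as pd

# Hypothetical rows; one TotalCharges cell is a blank string
df = pd.DataFrame({
    'Partner': ['Yes', 'No', 'Yes'],
    'TotalCharges': ['29.85', ' ', '108.15'],
})

# Map the binary category to 0/1
df['Partner'] = df['Partner'].map({'Yes': 1, 'No': 0})

# Text that cannot be parsed becomes NaN instead of raising an error
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
n_missing = int(df['TotalCharges'].isna().sum())

# Drop the rows that could not be converted
clean = df.dropna()
print(n_missing, len(clean))
```

Note that map() also produces NaN for any value missing from the mapping dictionary, so dropna() would silently remove rows with unexpected category labels as well.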
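Splitting the dataset into training and testing sets: We separate the features from the target (Churn, the last column) and split the data into training and testing sets using the train_test_split() function from scikit-learn, holding out 20% of the rows for testing.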
# Splitting the dataset into training and testing sets
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
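Churn datasets are usually imbalanced, so it can help to keep the churn rate identical in both subsets. The following sketch is a variation on the split above, not the article's exact code: it selects the target by name instead of by position and passes stratify=y. The data is synthetic and the 25% churn rate is an assumption made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed data: 25% of customers churn
rng = np.random.default_rng(0)
frame = pd.DataFrame({
    'tenure': rng.integers(0, 72, size=1000),
    'MonthlyCharges': rng.uniform(20, 120, size=1000),
    'Churn': np.repeat([0, 1], [750, 250]),
})

# Select by name so the split does not depend on column order
X = frame.drop(columns='Churn')
y = frame['Churn']

# stratify=y keeps the churn rate the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print(y_train.mean(), y_test.mean())
```

Selecting the target by name also protects against a silent bug: iloc[:, -1] only returns the Churn column as long as it stays the last column of the DataFrame.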
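Feature scaling: We standardize the features using the StandardScaler class from scikit-learn. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into training. Decision trees do not depend on feature scale, so this step matters mainly if you later switch to a scale-sensitive model.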
# Feature scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
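As a quick illustration of what fit_transform() and transform() do, the sketch below scales two synthetic features with very different ranges. The value ranges are made up and only loosely resemble tenure and total charges.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales
rng = np.random.default_rng(0)
X_train = np.column_stack([rng.uniform(0, 72, 200), rng.uniform(0, 8000, 200)])
X_test = np.column_stack([rng.uniform(0, 72, 50), rng.uniform(0, 8000, 50)])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learns mean and std from training data
X_test_scaled = sc.transform(X_test)        # reuses the training statistics

print(X_train_scaled.mean(axis=0).round(3))  # approximately 0 per column
print(X_train_scaled.std(axis=0).round(3))   # approximately 1 per column
print(X_test_scaled.mean(axis=0).round(3))   # near 0, but generally not exactly 0
```

The test set ends up only approximately standardized because it is transformed with the training mean and standard deviation, which is the intended behavior.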
Training the decision tree model: We train a decision tree classifier on the training set using the DecisionTreeClassifier class from scikit-learn, with entropy as the splitting criterion.
# Training the decision tree model
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)
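A decision tree with no depth limit can memorize the training set, which often hurts accuracy on unseen data. The sketch below compares the unrestricted tree used above with a depth-limited one. It runs on synthetic data from make_classification, and max_depth=4 is an arbitrary value chosen for illustration, not a tuned setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the churn features
X, y = make_classification(n_samples=1000, n_features=9, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Unrestricted tree, as in the article, versus a depth-limited one
full_tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
shallow_tree = DecisionTreeClassifier(criterion='entropy', max_depth=4,
                                      random_state=0)
full_tree.fit(X_train, y_train)
shallow_tree.fit(X_train, y_train)

for name, model in [('full', full_tree), ('max_depth=4', shallow_tree)]:
    print(name,
          'train:', round(model.score(X_train, y_train), 3),
          'test:', round(model.score(X_test, y_test), 3))
```

A large gap between training and test accuracy for the full tree is the usual sign of overfitting; comparing both scores on the real churn data shows whether limiting the depth is worthwhile there.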
Predicting the test set results: We use the trained model to predict the customer churn for the test set.
# Predicting the test set results
y_pred = classifier.predict(X_test)
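The same predict() call works for a single new customer, provided the row is scaled with the scaler that was fitted on the training data. The sketch below is self-contained and uses synthetic data with a made-up rule (short tenure means churn); the new_customer values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic training data: columns stand in for tenure and MonthlyCharges
rng = np.random.default_rng(0)
X_train = np.column_stack([rng.uniform(0, 72, 300), rng.uniform(20, 120, 300)])
# Made-up rule for illustration only: customers with short tenure churn
y_train = (X_train[:, 0] < 12).astype(int)

sc = StandardScaler()
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(sc.fit_transform(X_train), y_train)

# A hypothetical new customer: 3 months of tenure, 80.0 in monthly charges
new_customer = np.array([[3.0, 80.0]])
new_scaled = sc.transform(new_customer)  # same scaler as during training

label = classifier.predict(new_scaled)[0]                       # 0 = stays, 1 = churns
churn_probability = classifier.predict_proba(new_scaled)[0, 1]  # column 1 is class 1
print(label, churn_probability)
```

predict_proba() is often more useful than the hard label for retention work, because customers can be ranked by estimated churn risk.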
Evaluating the model: We compute the model's accuracy with the accuracy_score() function and print a more detailed breakdown using the confusion_matrix() and classification_report() functions from scikit-learn.
# Evaluating the model
print('Accuracy:', accuracy_score(y_test, y_pred))
# Start the multi-line outputs on their own line so they stay aligned
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))
print('Classification Report:\n', classification_report(y_test, y_pred))
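To make the numbers in the report easier to read, the sketch below derives accuracy, precision, and recall for the churn class directly from a confusion matrix. The labels are a small hand-made example, not output from the model above.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Small hand-made example (1 = churned, 0 = stayed)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0, 0, 0]

# For binary labels, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of predicted churners, how many really churned
recall = tp / (tp + fn)      # of actual churners, how many were caught

print(tn, fp, fn, tp)
print(accuracy, precision, recall)
```

In this example the accuracy is 0.7 even though only half of the actual churners are caught, which is why recall on the churn class deserves attention alongside accuracy.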
Visualizing the results: Finally, we use the seaborn library to create a count plot of the Churn column, which shows how many customers in the dataset churned and how many stayed.
# Visualizing the results
sns.countplot(x='Churn', data=data)
plt.title('Customer Churn')
plt.show()
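The count plot can be complemented with the exact class shares, which also give a baseline for judging the model's accuracy. The sketch below uses ten hypothetical labels rather than the real dataset.

```python
import pandas as pd

# Hypothetical churn labels after preprocessing (1 = churned, 0 = stayed)
churn = pd.Series([0, 0, 0, 1, 0, 0, 1, 0, 0, 0], name='Churn')

# Share of each class, i.e. the proportions behind the count plot
shares = churn.value_counts(normalize=True)
print(shares)

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = shares.max()
print('Baseline accuracy:', baseline_accuracy)
```

If the trained classifier does not clearly beat this baseline, its accuracy says little about how well it identifies churners.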
Output:

[Image: model evaluation report]

[Image: customer churn chart]
Source Code:
# Importing required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Loading the dataset
data = pd.read_csv('customer_churn.csv')

# Exploring the dataset
print(data.head())

# Data preprocessing
# Dropping irrelevant columns (the remaining text columns are all binary)
data = data.drop(['customerID', 'MultipleLines', 'InternetService', 'OnlineSecurity',
                  'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV',
                  'StreamingMovies', 'PaymentMethod', 'Contract'], axis=1)

# Converting categorical columns to numerical columns
data['gender'] = data['gender'].map({'Female': 1, 'Male': 0})
data['Partner'] = data['Partner'].map({'Yes': 1, 'No': 0})
data['Dependents'] = data['Dependents'].map({'Yes': 1, 'No': 0})
data['PhoneService'] = data['PhoneService'].map({'Yes': 1, 'No': 0})
data['PaperlessBilling'] = data['PaperlessBilling'].map({'Yes': 1, 'No': 0})
data['Churn'] = data['Churn'].map({'Yes': 1, 'No': 0})

# Converting 'TotalCharges' column to numeric type
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')

# Handling missing values
data = data.dropna()

# Splitting the dataset into training and testing sets
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the decision tree model
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)

# Predicting the test set results
y_pred = classifier.predict(X_test)

# Evaluating the model
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))
print('Classification Report:\n', classification_report(y_test, y_pred))

# Visualizing the results
sns.countplot(x='Churn', data=data)
plt.title('Customer Churn')
plt.show()
Conclusion:
Customer churn prediction using machine learning is an important tool for businesses to identify customers who are likely to churn and take appropriate actions to retain them. In this article, we discussed the process of building a customer churn prediction model using machine learning in Python.
We started by exploring the Telco Customer Churn dataset and preprocessing the data. We then trained a decision tree classifier, used it to predict churn on a held-out test set, and evaluated it with accuracy, a confusion matrix, and a classification report. Following the steps outlined in this article, businesses can build a baseline churn prediction model and use it to inform their customer retention efforts.

