# Hierarchical Clustering in Machine Learning

Hierarchical clustering could be a strategy of clustering data into groups or clusters based on their similarity. It may be a type of unsupervised learning, which implies that it does not require labeled information to create expectations.

## Introduction

In hierarchical clustering, the information is at first represented as a set of individual points, and the calculation continues by iteratively consolidating the closest sets of focuses or clusters until all the focuses have a place to a single cluster or to a indicated number of clusters.

There are two fundamental sorts of hierarchical clustering: agglomerative and divisive. The yield of hierarchical clustering is usually represented as a dendrogram, which may be a tree-like diagram that shows the various leveled connections between the clusters. The dendrogram can be utilized to distinguish the ideal number of clusters based on the structure of the tree.

## Why Hierarchical Clustering?

K-means follows an iterative process. It will keep on running until the centroids of newly formed clusters do not change or the maximum number of iterations are reached.

But there are certain challenges with K-means. It always makes spherical clusters. Also, we have to decide the number of clusters at the beginning of the algorithm. Ideally, we would not know how many clusters should we have, in the beginning of the algorithm and hence it a challenge with K-means.

Hierarchical clustering takes away the problem of having to pre-define the number of clusters. So, let’s see what hierarchical clustering is and how it improves on K-means.

### Agglomerative Clustering

Agglomerative clustering may be a sort of hierarchical clustering in which each information point begins as its claim cluster, and clusters are iteratively blended until a halting model is met.

At each iteration, the two closest clusters are merged into a new, bigger cluster. The distance between clusters can be measured employing an assortment of measurements, such as Euclidean distance, cosine similarity, or correlation coefficient.

The method proceeds until all information focuses have a place to a single cluster, or until a ceasing basis, such as a greatest number of clusters or a least remove limit, is met.

The yield of agglomerative clustering can be visualized as a dendrogram, which could be a tree-like diagram that shows the various leveled connections between the clusters. The tallness of the branches within the dendrogram speaks to the separate between clusters at each emphasis.

Agglomerative clustering could be a well known strategy for clustering information since it is simple to execute and can handle huge datasets but can be computationally expensive in case of expansive datasets.

### Divisive Clustering

Divisive clustering is a type of hierarchical clustering in which all data points start in a single cluster and clusters are recursively divided until a stopping criterion is met.

At each iteration, the cluster with the highest variance or the greatest dissimilarity among its data points is split into two smaller clusters. The splitting can be performed using various algorithms such as k-means or hierarchical clustering.

The process continues until each cluster contains only one data point, or until a stopping criterion, such as a maximum number of clusters or a minimum variance threshold, is met.

The output of divisive clustering can also be visualized as a dendrogram, but in contrast to agglomerative clustering, the dendrogram is built from the bottom up, starting with individual data points and building up to the final clusters.

Divisive clustering can produce more meaningful clusters when the data has a clear structure, but it can be computationally expensive and sensitive to the choice of algorithm used for splitting the clusters.

## How Hierarchical Clustering Calculates Distance?

In hierarchical clustering, a proximity matrix is a square matrix that stores the distances between each pair of data points. It is used to determine the similarity or dissimilarity between data points or clusters and to merge or split them during the clustering process.

Here's an example of a proximity matrix for five data points:

Let’s take a sample of 5 students:

### Creating a Proximity Matrix

First, we will create a proximity matrix which will tell us the distance between each of these points. Since we are calculating the distance of each point from each of the other points, we will get a square matrix of shape n X n (where n is the number of observations).

Let’s make the 5 x 5 proximity matrix for our example:

The diagonal elements of this matrix will always be 0 as the distance of a point with itself is always 0. We will use the Euclidean distance formula to calculate the rest of the distances. So, let’s say we want to calculate the distance between point 1 and 2:

√(10−7)2=√9=3

Similarly, we can calculate all the distances and fill the proximity matrix.

### Steps for Hierarchical Clustering

1. Calculate the proximity matrix: Calculate the distance or similarity measure between each pair of data points and store the values in a proximity matrix.
2. Initialize the clusters: At the beginning of the clustering process, each data point is treated as a separate cluster.

1. Determine the closest clusters: Identify the two closest clusters based on the values in the proximity matrix. This can be done by finding the minimum value in the matrix.

1. Merge the closest clusters: Merge the two closest clusters into a single cluster. Update the proximity matrix to reflect the new distances between the merged cluster and the remaining clusters.

1. Repeat steps 3-4: Repeat steps 3-4 until all data points are in a single cluster or until a stopping criterion is met.
2. Visualize the dendrogram: The clustering hierarchy can be visualized as a dendrogram, which shows the order in which the clusters were merged.

1. Determine the number of clusters: Determine the number of clusters based on the dendrogram or by setting a threshold for the distance between clusters.

These steps apply to agglomerative clustering, which is the most common type of hierarchical clustering. Divisive clustering, on the other hand, works by recursively dividing the data points into smaller clusters until a stopping criterion is met.

## Linkages Used in Hierarchical Clustering

Linkage refers to the criterion used to determine the distance between clusters in hierarchical clustering. Here are some commonly used linkage methods:

1. Single linkage: Also known as nearest-neighbor linkage, this method calculates the distance between the closest points of the two clusters being merged.
2. Complete linkage: Also known as farthest-neighbor linkage, this method calculates the distance between the farthest points of the two clusters being merged.
3. Average linkage: This method calculates the average distance between all pairs of points in the two clusters being merged.
4. Ward's linkage: This method minimizes the variance of the distances between all pairs of points within the newly formed cluster. This results in clusters that are relatively homogeneous and compact.
5. Centroid linkage: This method calculates the distance between the centroids of the two clusters being merged.

The choice of linkage method can have a significant impact on the resulting clusters. For example, single linkage tends to produce elongated clusters, while complete linkage produces more compact clusters. Ward's linkage tends to produce well-separated and evenly sized clusters. Therefore, it is important to choose the linkage method that best fits the characteristics of the data being clustered.

## Conclusion

Hierarchical clustering is an unsupervised machine learning technique that groups data points into clusters based on their similarity. Unlike K-means, it does not require the number of clusters to be pre-defined, and it can handle large datasets. There are two main types of hierarchical clustering: agglomerative and divisive. Both methods use a proximity matrix to calculate the distances between data points or clusters.

## Key Takeaways

1. Hierarchical clustering is an unsupervised learning method that clusters data points into groups or clusters based on their similarity.
2. It is performed by merging the closest pairs of points or clusters iteratively until all points belong to a single cluster or to a specified number of clusters.
3. Agglomerative clustering is a popular method that starts with each data point as its own cluster and iteratively merges the two closest clusters until all data points belong to a single cluster.
4. Divisive clustering is a method that starts with all data points in a single cluster and recursively divides the clusters until each cluster contains only one data point.
5. The output of both methods is a dendrogram, which is a tree-like diagram that shows the hierarchical relationships between the clusters.
6. The dendrogram can be used to visualize the clustering results and identify the optimal number of clusters based on the structure of the tree.
7. Hierarchical clustering can handle large datasets and does not require pre-defining the number of clusters, unlike K-means clustering.
8. A proximity matrix is used to determine the similarity or dissimilarity between data points or clusters and to merge or split them during the clustering process.

## Quiz

1. What is hierarchical clustering?
1. A method of clustering data points into groups or clusters based on their similarity
2. A method of clustering data points based on labels
3. A supervised learning technique
4. A technique that requires pre-defined number of clusters

Answer: a. A method of clustering data points into groups or clusters based on their similarity

1. What are the two main types of hierarchical clustering?
1. K-means and agglomeration
2. Divisive and hierarchical
3. Agglomerative and divisive
4. Supervised and unsupervised

Answer: c. Agglomerative and divisive

1. What is the stopping criterion in hierarchical clustering?
1. When all data points belong to a single cluster
2. When a specified number of clusters is reached
3. A minimum distance threshold
4. All of the above

Answer: d. All of the above

1. What is the output of agglomerative clustering?
1. A tree-like diagram that shows the hierarchical relationships between the clusters
2. A square matrix that stores the distances between each pair of data points
3. Spherical clusters
4. A pre-defined number of clusters

Answer: a. A tree-like diagram that shows the hierarchical relationships between the clusters

1. What is the difference between agglomerative and divisive clustering?
1. Agglomerative clustering starts with all data points in a single cluster, whereas divisive clustering starts with each data point as its own cluster.
2. Agglomerative clustering merges the closest pairs of points or clusters, whereas divisive clustering recursively divides clusters.
3. Agglomerative clustering produces more meaningful clusters when the data has a clear structure, whereas divisive clustering is sensitive to the choice of distance metric.
4. Agglomerative clustering builds the dendrogram from the bottom up, starting with individual data points, whereas divisive clustering builds the dendrogram from the top down.

Answer: b. Agglomerative clustering merges the closest pairs of points or clusters, whereas divisive clustering recursively divides clusters.

Module 7: Unsupervised LearningHierarchical Clustering in Machine Learning

Top Tutorials

Related Articles

Made with in Bengaluru, India