Hierarchical clustering, also known as hierarchical cluster analysis, groups similar objects into clusters and arranges those clusters in a hierarchy. It is an unsupervised machine learning algorithm in which the objects within a cluster are similar to one another and each cluster is distinct from the others. The hierarchy can be built either from the top down or from the bottom up.
There are two types of hierarchical clustering:
- Divisive Clustering - In this method, all the objects start in a single cluster, which is then split repeatedly into smaller clusters that are as dissimilar from each other as possible. This method follows a top-down approach.
- Agglomerative Clustering - In this method, each object starts in a cluster of its own, and the closest clusters are merged step by step until a single cluster remains. This method follows a bottom-up approach.
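The bottom-up merging described above can be sketched with SciPy's linkage function, which performs agglomerative clustering. The five 1-D points and the choice of single linkage are assumptions made only for illustration. Divisive clustering is not shown here.

```python
import numpy as np
from scipy.cluster import hierarchy

# Five hypothetical 1-D observations, shaped (n_samples, n_features).
points = np.array([[1.0], [2.0], [6.0], [7.0], [15.0]])

# Agglomerative clustering: every point starts as its own cluster and
# the two closest clusters are merged at each step.
Z = hierarchy.linkage(points, method='single')

# Each row of Z records one merge:
# [cluster index a, cluster index b, merge distance, size of new cluster]
for step, (a, b, dist, size) in enumerate(Z, start=1):
    print(f"step {step}: merge {int(a)} and {int(b)} "
          f"at distance {dist:.1f} -> cluster of size {int(size)}")
```

With n observations there are always n - 1 merges, and the last row describes the single cluster that contains every point.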
A dendrogram is a tree-like diagram that shows the relationships between the clusters produced by hierarchical clustering. The basic rule for reading a dendrogram is that the farther along the distance axis two branches join, the greater the distance between the clusters they represent.
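To make the link between branch position and cluster distance concrete, the sketch below compares the merge distances stored in a linkage matrix with the link heights that dendrogram computes. The sample points are hypothetical, and no_plot=True is used so that nothing is drawn.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical 1-D observations, shaped (n_samples, n_features).
points = np.array([[1.0], [2.0], [6.0], [7.0], [15.0]])
Z = hierarchy.linkage(points, method='single')

# no_plot=True returns the dendrogram layout without drawing it.
layout = hierarchy.dendrogram(Z, no_plot=True)

# Each entry of 'dcoord' holds the four distance coordinates of one
# U-shaped link; the two middle values are where the branches join.
link_heights = sorted(d[1] for d in layout['dcoord'])
merge_distances = sorted(Z[:, 2])

print(link_heights)
print(merge_distances)
```

The two lists agree, so the position at which two branches join is the distance at which the corresponding clusters were merged.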
scipy.cluster.hierarchy.dendrogram(Z, p = 30, truncate_mode = None, orientation = 'top')
- Z - The linkage matrix that encodes the hierarchical clustering to be rendered as a dendrogram.
- p - The parameter used by truncate_mode.
- truncate_mode - When the linkage is derived from a large observation matrix, the dendrogram can be hard to read. This parameter condenses the dendrogram.
- orientation - The direction in which the dendrogram is plotted. For example, 'top' places the root at the top with the links descending downwards.
All of these parameters are optional except Z. The function also accepts many more optional parameters, such as color_threshold and above_threshold_color.
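As a rough illustration of how p and truncate_mode condense a dendrogram, the sketch below builds a linkage from 20 randomly generated points (hypothetical data) and compares the number of leaves with and without truncation. The 'lastp' mode and the value p=4 are choices made for this example only.

```python
import numpy as np
from scipy.cluster import hierarchy

# 20 hypothetical 2-D observations from a seeded random generator.
rng = np.random.default_rng(0)
data = rng.random((20, 2))
Z = hierarchy.linkage(data, method='complete')

# Full dendrogram layout: one leaf per observation.
full = hierarchy.dendrogram(Z, no_plot=True)

# Truncated layout: with 'lastp', only the last p merged clusters are
# kept as leaves and everything below them is collapsed.
compact = hierarchy.dendrogram(Z, p=4, truncate_mode='lastp', no_plot=True)

print(len(full['leaves']))
print(len(compact['leaves']))
print(compact['ivl'])  # collapsed leaves are labelled with their size, e.g. '(5)'
```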
Example of Hierarchical Clustering Dendrogram
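import numpy as np
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt

# A 1-D input is read by linkage() as a condensed distance matrix:
# 10 pairwise distances correspond to 5 observations.
array = np.array([30, 60, 90, 120, 150, 180, 210, 240, 270, 300])

# Build the hierarchy with complete linkage.
clus = hierarchy.linkage(array, 'complete')

# Draw the dendrogram with its root on the right-hand side.
plt.figure()
den = hierarchy.dendrogram(clus, above_threshold_color="black", color_threshold=0.8, orientation='right')
plt.show()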
Here, note that we have used the complete linkage algorithm to do the hierarchical clustering. Also, the base of the dendrogram is on the right-hand side and the links extend towards the left because the orientation parameter is set to 'right'.
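One detail of the example is worth checking: linkage interprets a 1-D input as a condensed distance matrix, not as a list of observations. The sketch below reuses the values from the example (converted to floats) and, as an assumed alternative, reshapes them into a column to cluster the ten values as individual points.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import num_obs_y, squareform

array = np.array([30, 60, 90, 120, 150, 180, 210, 240, 270, 300], dtype=float)

# Ten condensed distances correspond to 5 observations (5 * 4 / 2 = 10).
n_obs = num_obs_y(array)
print(n_obs)

# squareform expands the condensed form into the full symmetric matrix.
print(squareform(array))

# The linkage built from it therefore has 5 - 1 = 4 merges.
clus = hierarchy.linkage(array, 'complete')
print(clus.shape)

# To treat the ten values as observations instead, pass a 2-D column.
clus_points = hierarchy.linkage(array.reshape(-1, 1), 'complete')
print(clus_points.shape)
```

Passing the 1-D array directly yields a dendrogram with five leaves, while the reshaped column yields ten.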