Scipy Cluster Hierarchy Dendrogram Function

  1. Syntax of scipy.cluster.hierarchy.dendrogram
  2. Example of Hierarchical Clustering Dendrogram

Hierarchical clustering, also known as hierarchical cluster analysis, groups similar objects into clusters and arranges those clusters in a hierarchy, built either from the top down or from the bottom up. It is an unsupervised machine learning algorithm: it needs no labels, and the objects within a cluster are more similar to each other than to the objects in other clusters.

There are 2 types of hierarchical clustering:

  • Divisive Clustering - In this method, all the objects start in a single cluster, which is then split repeatedly into smaller clusters that have very few similarities with each other. This method follows a top-down approach.
  • Agglomerative Clustering - In this method, each object starts in a cluster of its own, and the closest clusters are merged step by step until a single cluster remains. This method follows a bottom-up approach.
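The bottom-up merging of agglomerative clustering can be seen directly in the linkage matrix that SciPy returns. The following is a minimal sketch; the four data points and the choice of the single linkage method are assumptions made only for illustration.

```python
import numpy as np
from scipy.cluster import hierarchy

# Four hypothetical 1-D observations, one per row (shape (4, 1)).
points = np.array([[1.0], [2.0], [10.0], [12.0]])

# Each row of Z records one merge: [cluster_a, cluster_b, distance, size].
Z = hierarchy.linkage(points, method='single')

for a, b, dist, size in Z:
    print(f"merge {int(a)} + {int(b)} at distance {dist:.1f} -> cluster of {int(size)} points")
```

With n observations there are always n - 1 merges, so Z has three rows here. The last row describes the final cluster that contains all four points.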

A dendrogram is a tree-like diagram that shows how the clusters are related to each other. The basic rule for reading a dendrogram is that the length of the lines before two branches join represents the distance between the clusters: the farther along the distance axis two branches are joined, the more dissimilar the clusters are.
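As a rough check of this reading rule, the heights of the links in a dendrogram can be compared with the merge distances stored in the linkage matrix. This is a sketch under assumed data: the four points are hypothetical, and no_plot=True is used so that only the layout is computed and nothing is drawn.

```python
import numpy as np
from scipy.cluster import hierarchy

# Four hypothetical 1-D observations, one per row.
points = np.array([[1.0], [2.0], [10.0], [12.0]])
Z = hierarchy.linkage(points, method='single')

# no_plot=True returns the layout data without drawing anything.
layout = hierarchy.dendrogram(Z, no_plot=True)

# Each entry of 'dcoord' holds the four distance-axis coordinates of one link;
# its maximum is the height at which two clusters are joined.
link_heights = sorted(max(d) for d in layout['dcoord'])

print(link_heights)
print(sorted(Z[:, 2]))
```

Both printed lists contain the same values, because every link is drawn at the distance at which its two clusters were merged.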

Syntax of scipy.cluster.hierarchy.dendrogram

scipy.cluster.hierarchy.dendrogram(Z,
                                    p = 30,
                                    truncate_mode = None,
                                    orientation = 'top')

Parameters

Z It is the linkage matrix that encodes the whole hierarchical clustering to be drawn as a dendrogram. It is typically the output of scipy.cluster.hierarchy.linkage.
p It is the p parameter used by truncate_mode. Its default value is 30.
truncate_mode When the linkage is derived from a large number of original observations, the dendrogram can be hard to read. This parameter makes the dendrogram compact. The available modes are None (no truncation, the default), 'lastp', and 'level'.
orientation It decides the direction in which the dendrogram is plotted. For example, the default 'top' orientation places the root of the dendrogram at the top with the links going downwards. The other orientations are 'bottom', 'left', and 'right'.

All these parameters are optional except the Z parameter. The function also accepts many more optional parameters, such as color_threshold, get_leaves, and distance_sort.
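The effect of p and truncate_mode can be sketched by counting the leaves of a dendrogram with and without truncation. The ten data values below are hypothetical, and no_plot=True is an additional dendrogram parameter used here so that the layout is computed without drawing a figure.

```python
import numpy as np
from scipy.cluster import hierarchy

# Ten hypothetical 1-D observations, one per row (shape (10, 1)).
values = np.array([1, 2, 4, 7, 11, 16, 22, 29, 37, 46], dtype=float).reshape(-1, 1)
Z = hierarchy.linkage(values, method='complete')

# no_plot=True computes the dendrogram layout without drawing it.
full = hierarchy.dendrogram(Z, no_plot=True)
truncated = hierarchy.dendrogram(Z, p=4, truncate_mode='lastp', no_plot=True)

print(len(full['ivl']))       # number of leaves in the full dendrogram
print(len(truncated['ivl']))  # number of leaves after truncation
print(truncated['ivl'])       # a label in parentheses is the size of a contracted cluster
```

With truncate_mode='lastp', only the last p merges are kept as branches, so the truncated dendrogram has p leaves instead of one leaf per observation.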

Example of Hierarchical Clustering Dendrogram

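import numpy as np
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt

# A 1-D input is read by linkage() as a condensed distance matrix:
# these 10 values are the pairwise distances between 5 observations.
array = np.array([30, 60, 90, 120, 150, 180, 210, 240, 270, 300])

# Build the linkage matrix with the complete linkage method.
clus = hierarchy.linkage(array, 'complete')
plt.figure()

# Links above color_threshold are drawn in above_threshold_color.
den = hierarchy.dendrogram(clus, above_threshold_color="black", color_threshold=0.8, orientation='right')
plt.show()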

Output:

[Image: scipy cluster hierarchy dendrogram]

Here, note that we have used the complete linkage method to do the hierarchical clustering. Also, the root of the dendrogram is on the right-hand side and the links extend towards the left because the orientation parameter is set to right.
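One detail of the example is easy to miss: linkage interprets a one-dimensional input as a condensed distance matrix, not as a list of observations. The sketch below compares both readings of the same ten values. The reshape to one row per value is an assumption about what a reader might want instead, not part of the original example.

```python
import numpy as np
from scipy.cluster import hierarchy

array = np.array([30, 60, 90, 120, 150, 180, 210, 240, 270, 300], dtype=float)

# 1-D input: read as condensed pairwise distances (10 = 5 * 4 / 2, so 5 observations).
Z_condensed = hierarchy.linkage(array, 'complete')

# 2-D input with one row per value: read as 10 observations with one feature each.
Z_observations = hierarchy.linkage(array.reshape(-1, 1), 'complete')

# A linkage matrix has one row per merge, i.e. (number of observations - 1) rows.
print(Z_condensed.shape[0] + 1)
print(Z_observations.shape[0] + 1)
```

The first line printed is 5 and the second is 10, so the dendrogram in the example above has five leaves. To cluster the ten values themselves, pass them as a two-dimensional array with one row per value.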
