Create a Word Cloud in Python

A word cloud is a visualization technique for plotting the words or tags in a dataset. All the words are clustered together, and the prominence of each word is conveyed through visual properties such as its size and color.

We can make word clouds based on different criteria, but the most common word clouds are based on the frequency of words.
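To make the idea of frequency-based sizing concrete, here is a minimal sketch that counts how often each word occurs in a short string. The helper name word_frequencies and the sample sentence are made up for illustration, and the splitting is deliberately naive (lowercase, split on whitespace).

```python
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    # Naive tokenization: lowercase the text and split on whitespace
    words = text.lower().split()
    return Counter(words)


sample = "python is simple and python is popular"
freqs = word_frequencies(sample)

print(freqs["python"])  # 2
print(freqs["simple"])  # 1
```

In a frequency-based word cloud, a word with a higher count such as "python" here would be drawn larger than a word that appears only once.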

A word cloud needs a reasonable number of words to be meaningful. Too many words, however, can make the cluster crowded and hard to read.

Word clouds help analyze customer feedback, trending topics, and more. This tutorial will demonstrate how to create a word cloud in Python.

Create a Word Cloud in Python

We will create a simple word cloud in Python based on the frequency of words. For the data in our example, we will scrape a Wikipedia page using the wikipedia module.

We can specify the page title in the wikipedia.page() function, and we will retrieve the data with the content attribute.

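The text is then cleaned with the re.sub() function, which removes the section headings that appear between equals signs in the page content (for example, == History ==) by replacing them with an empty string. Newlines are replaced with spaces so that words on adjacent lines do not run together.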
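As a rough sketch of this cleanup step, the pattern used in the full example can be tried on a short sample string without fetching anything. The helper name strip_headings and the sample text are hypothetical; note that this particular pattern targets section markers of the form == Title ==.

```python
import re


def strip_headings(text):
    """Remove '== Title ==' style markers and flatten the text to one line."""
    # Drop section markers such as "== History ==" or "=== Early years ==="
    cleaned = re.sub(r'==.*?==+', '', text)
    # Replace newlines with spaces so adjacent words are not glued together
    cleaned = cleaned.replace('\n', ' ')
    # Collapse the runs of whitespace left behind by the removals
    return re.sub(r'\s+', ' ', cleaned).strip()


sample = "Intro text.\n\n== History ==\nPython was created by Guido van Rossum."
print(strip_headings(sample))
# Intro text. Python was created by Guido van Rossum.
```

If punctuation should also be removed, a further substitution would be needed; that is left out here to keep the sketch close to the pattern in the example.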

The wordcloud package for Python can create simple word clouds. We can create an object using its WordCloud constructor.

This object will be plotted on a Matplotlib figure.

While creating the object, we will specify the different parameters for the word cloud. The color scheme for the words is set using the colormap parameter.

The background_color parameter sets the background color of the image. We also provide the dimensions of the image with the height and width parameters.

The text may contain stop words, that is, very common words such as "the" and "is" that add little meaning to the word cloud. We will exclude them by passing the words to be ignored to the stopwords parameter.

The wordcloud package has the STOPWORDS constant, which contains these words and is provided as the value for this parameter.
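The effect of ignoring stop words can be illustrated with plain Python. The sketch below uses a tiny hand-picked set, SAMPLE_STOPWORDS, as a stand-in for the much larger STOPWORDS set; the set contents, the helper name remove_stopwords, and the sample sentence are all assumptions made for illustration.

```python
# A tiny hand-picked set, used only as a stand-in for the STOPWORDS constant
SAMPLE_STOPWORDS = {"the", "is", "a", "and", "of"}


def remove_stopwords(words, stopwords):
    """Return the words whose lowercase form is not in the stop word set."""
    return [w for w in words if w.lower() not in stopwords]


words = "Python is a popular language and the syntax is simple".split()
print(remove_stopwords(words, SAMPLE_STOPWORDS))
# ['Python', 'popular', 'language', 'syntax', 'simple']
```

Only the remaining words would contribute to the word frequencies, so filler words do not dominate the final image.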

The generate() method takes the text and builds the word cloud from it on the WordCloud object. Finally, we will use the imshow() function from the matplotlib library to display the final image.

See the code below.

import re
import wikipedia
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Fetch the page and take its plain-text content
raw = wikipedia.page('Python (programming language)')
text = raw.content
# Strip section headings such as "== History ==", then join lines with spaces
data = re.sub(r'==.*?==+', '', text)
data = data.replace('\n', ' ')

# Build the word cloud from the cleaned text
word_cloud = WordCloud(width=3500, height=2500, random_state=1,
                       background_color='black', colormap='Set1',
                       collocations=False, stopwords=STOPWORDS).generate(data)

# Draw the image on a Matplotlib figure and hide the axes
plt.figure(figsize=(50, 30))
plt.imshow(word_cloud)
plt.axis("off")
plt.show()

Output:

[Figure: the generated word cloud]

The example above creates a simple word cloud. The plt.axis("off") call hides the axes in the final figure.

Author: Manav Narula