A word cloud is a visualization technique that plots the words or tags from a dataset. The words are clustered together, and visual properties such as size and color show how prominent each word is.
We can make word clouds based on different criteria, but the most common word clouds are based on the frequency of words.
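To make the idea of frequency concrete, here is a minimal sketch that counts words with the standard library only. The helper name word_frequencies and the sample sentence are made up for illustration, and the simple regular expression is an assumption about what counts as a word.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word occurs in the text."""
    # Treat runs of letters and apostrophes as words (a simplifying assumption).
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, used only for this illustration.
sample = "Python is simple. Python is readable, and simple code is good code."
freqs = word_frequencies(sample)

# Show the three most frequent words with their counts.
print(freqs.most_common(3))
```

In a frequency-based word cloud, the words with the highest counts are the ones drawn largest.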
A word cloud needs a reasonable number of words to be informative. However, too many words can make it cluttered and hard to read.
Word clouds help analyze customer feedback, trending topics, and more. This tutorial will demonstrate how to create a word cloud in Python.
Create a Word Cloud in Python
We will create a simple word cloud in Python based on the frequency of words. We will scrape a Wikipedia page using the wikipedia module to get the data for our example.
We can specify the page title in the wikipedia.page() function, and we will retrieve the text of the page with the content attribute of the returned object.
This data is cleaned using the re.sub() function, which removes the section headings that appear in the raw text wrapped in == markers. All the occurrences of such headings will be replaced with an empty string.
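The cleaning step can be tried on a small string before running it on a full article. This is a minimal sketch: the sample text is invented, and collapsing whitespace with split() and join() is an illustrative choice rather than part of the tutorial's code.

```python
import re

# Invented sample that imitates the heading markers found in the page content.
sample = (
    "Intro paragraph.\n\n\n"
    "== History ==\n"
    "Python was created in the late 1980s.\n\n\n"
    "=== Early versions ===\n"
    "More text."
)

# Remove anything wrapped in two or more '=' signs on each side.
cleaned = re.sub(r"==.*?==+", "", sample)

# Collapse newlines and repeated spaces into single spaces.
cleaned = " ".join(cleaned.split())

print(cleaned)
```

Because the dot in the pattern does not match newlines, each match stays on a single heading line and the body text is left untouched.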
Python’s wordcloud module can create simple word clouds. We can create an object using this module’s WordCloud() constructor. This object will be plotted on a Matplotlib figure.
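While creating the object, we will specify the different parameters for the word cloud. The color scheme for the words is set using the colormap parameter. The background_color parameter sets the background color of the image. We also provide the dimensions of the image with the width and height parameters. The random_state parameter makes the layout reproducible, and setting collocations to False stops pairs of words from being counted together.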
The text data can contain irrelevant stop words, such as "the" and "is", that would otherwise dominate the word cloud. We will remove the stop words by using the stopwords parameter to provide the words to be ignored. The wordcloud module has the STOPWORDS constant, which contains these words and is provided as the value for this parameter.
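The effect of ignoring stop words can be illustrated without the wordcloud module. In this minimal sketch, SAMPLE_STOPWORDS is a hypothetical, deliberately tiny list and count_without_stopwords is a made-up helper; the real STOPWORDS constant contains many more words.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list for illustration only.
SAMPLE_STOPWORDS = {"a", "an", "and", "is", "of", "the", "to"}


def count_without_stopwords(text, stopwords):
    """Count words in the text, skipping any word found in stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


# Invented sample sentence.
sample = "The design of Python is simple and the syntax of Python is readable."
counts = count_without_stopwords(sample, SAMPLE_STOPWORDS)

# Only the meaningful words remain in the counts.
print(counts.most_common(2))
```

Without the filter, words like "the" and "of" would tie with or outrank the words that actually describe the topic.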
The generate() function will take the text and build the word cloud on the WordCloud object. Finally, we will use the imshow() function from the matplotlib library to display the final image.
See the code below.
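import re
import matplotlib.pyplot as plt
import wikipedia
from wordcloud import WordCloud, STOPWORDS

# Download the article and take its plain-text content.
raw = wikipedia.page('Python (programming language)')
text = raw.content

# Remove section headings such as "== History ==" and join the lines with spaces.
data = re.sub(r'==.*?==+', '', text)
data = data.replace('\n', ' ')

# Build the word cloud from the cleaned text, ignoring common stop words.
word_cloud = WordCloud(width=3500, height=2500, random_state=1,
                       background_color='black', colormap='Set1',
                       collocations=False, stopwords=STOPWORDS).generate(data)

# Draw the image on a Matplotlib figure and hide the axes.
plt.figure(figsize=(50, 30))
plt.imshow(word_cloud)
plt.axis("off")
plt.show()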
We were able to create a simple word cloud in the above example. The plt.axis('off') function removes the axis from the final figure.
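To see how a word's frequency can translate into its size, the sketch below maps counts linearly onto a range of font sizes. It is a simplified illustration, not the layout algorithm the wordcloud module actually uses; the function scale_font_sizes, its default sizes, and the sample counts are all hypothetical.

```python
from collections import Counter


def scale_font_sizes(freqs, min_size=10, max_size=60):
    """Map each word's count linearly onto the range [min_size, max_size]."""
    if not freqs:
        return {}
    lowest, highest = min(freqs.values()), max(freqs.values())
    span = highest - lowest
    sizes = {}
    for word, count in freqs.items():
        if span == 0:
            # Every word is equally frequent, so all get the same size.
            sizes[word] = max_size
        else:
            fraction = (count - lowest) / span
            sizes[word] = round(min_size + fraction * (max_size - min_size))
    return sizes


# Invented counts for illustration.
freqs = Counter({"python": 5, "language": 3, "syntax": 1})
sizes = scale_font_sizes(freqs)
print(sizes)
```

The most frequent word gets the largest size and the least frequent gets the smallest, which is the visual effect a frequency-based word cloud relies on.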