Seaborn Count Plot
This article discusses the Seaborn count plot and the difference between a count plot and a bar plot. We will also look at some of the options available for Seaborn’s countplot() function.
countplot() Function in Seaborn
countplot() counts the number of observations in each category and displays those counts as bars. You can think of it as a histogram for categorical data. It is a very simple plot, but it is very useful, especially when doing exploratory data analysis in Python.
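As a minimal sketch of what the plot computes, the snippet below builds a small hypothetical DataFrame (the grade column and its values are made up for illustration), draws a count plot, and reads the bar heights back. It assumes seaborn and matplotlib are installed, uses the non-interactive Agg backend so it runs without a display, and assumes one bar per category (no hue).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

# Hypothetical toy data: 10 observations of one categorical variable
toy = pd.DataFrame({"grade": ["A", "B", "A", "C", "A", "B", "C", "A", "B", "A"]})

ax = sb.countplot(x="grade", data=toy)

# Read the bar heights back, sorted left to right along the x-axis
heights = [p.get_height() for p in sorted(ax.patches, key=lambda p: p.get_x())]
print(heights)
plt.close("all")
```

Each height is just the number of rows in that category, so the heights add up to the number of rows in the DataFrame.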
Let’s check out the countplot() function in the Seaborn library. First, we will import Seaborn and load its diamonds example dataset.
import seaborn as sb
Data_DM = sb.load_dataset('diamonds')
Data_DM.head()
Each row of this data set contains information about one particular diamond.
We will narrow it down using the clarity column, keeping only the diamonds whose clarity is SI1 or VS2, so that this category has only two options.
Data_DM = Data_DM[Data_DM.clarity.isin(['SI1', 'VS2'])]
Data_DM.shape
After narrowing things down, we have 25323 diamonds in this data set.
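The filtering step can be sketched on a small hypothetical frame (the rows are made up; only the clarity labels mirror the article). It also shows a side effect worth knowing: filtering the rows of a category column keeps the unused categories in the dtype, and pandas’ cat.remove_unused_categories() can drop them if you ever need to.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds clarity column (values made up)
df = pd.DataFrame({
    "clarity": pd.Categorical(
        ["SI1", "VS2", "VVS1", "SI1", "IF", "VS2", "SI1"],
        categories=["IF", "VVS1", "VS2", "SI1"],
    )
})

# Keep only rows whose clarity is SI1 or VS2
subset = df[df.clarity.isin(["SI1", "VS2"])]
print(subset.shape)  # (5, 1)

# Filtering rows does not drop the unused categories from the dtype...
print(list(subset.clarity.cat.categories))

# ...but they can be removed explicitly
trimmed = subset.clarity.cat.remove_unused_categories()
print(list(trimmed.cat.categories))
```

This mainly matters if you later plot the filtered category column itself, since Seaborn reads its list of bars from these categories.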
Now we are ready to create our first count plot. To do that, we reference the Seaborn library, call its countplot() function, and pass the column we would like to plot.
We will plot the color column, and the data come from our Data_DM DataFrame: sb.countplot(x='color', data=Data_DM).
This plot counts the number of observations for each category found in the color column. For example, Seaborn found about 1500 diamonds (1481, to be exact) with a color equal to J.
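If we applied value_counts() to the color column (with sort=False, so the categories stay in their original order), we would get the same numbers. These numbers are what we plot when we use countplot():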
D    3780
E    4896
F    4332
G    4323
H    3918
I    2593
J    1481
Name: color, dtype: int64
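The role of sort=False can be checked with a minimal sketch on a hypothetical categorical column (the values are made up). For a category column, sort=False keeps the categories in their defined order, while the default sorts the counts from most to least frequent.

```python
import pandas as pd

# Hypothetical categorical color column with a fixed category order
color = pd.Series(
    pd.Categorical(list("EEEEFFFDDJ"), categories=["D", "E", "F", "J"]),
    name="color",
)

# sort=False keeps the categorical order
by_category = color.value_counts(sort=False)
print(by_category)

# The default sorts from most to least frequent instead
by_frequency = color.value_counts()
print(by_frequency)
```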
One nice thing about Seaborn’s countplot() is that we can easily switch from vertical to horizontal bars. All we need to do is change the x argument to y: sb.countplot(y='color', data=Data_DM).
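A minimal sketch of the x-to-y switch on hypothetical toy data, drawing both versions side by side. It assumes one bar per category, so the count can be read from each bar’s height (vertical) or width (horizontal).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

# Hypothetical stand-in for Data_DM's color column
toy = pd.DataFrame(
    {"color": pd.Categorical(list("EEEEFFFDDJ"), categories=["D", "E", "F", "J"])}
)

fig, (ax_v, ax_h) = plt.subplots(1, 2)
sb.countplot(x="color", data=toy, ax=ax_v)  # vertical bars
sb.countplot(y="color", data=toy, ax=ax_h)  # horizontal bars

# Vertical bars: count is the height; horizontal bars: count is the width
v_heights = [p.get_height() for p in sorted(ax_v.patches, key=lambda p: p.get_x())]
h_widths = [p.get_width() for p in sorted(ax_h.patches, key=lambda p: p.get_y())]
print(v_heights, h_widths)
plt.close(fig)
```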
Seaborn Barplot vs. Countplot
At this point, you may think that the Seaborn countplot looks very similar to the barplot. But there is one really big difference: with countplot, we are just counting the number of observations per category.
With barplot, we get an estimate of a summary statistic of a numeric variable for each category. For example, by default each bar shows the mean per category, with error bars showing a confidence interval around that estimate.
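To see the difference in numbers, here is a minimal sketch on a hypothetical frame with a category (cut) and a numeric column (price); the values are made up. The count plot’s bars are row counts, while the bar plot’s bars are the mean price per category (the error bars are drawn as lines, so they do not affect the bar heights read here).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

# Hypothetical data: a category plus a numeric value
toy = pd.DataFrame({
    "cut": ["Good", "Good", "Ideal", "Ideal", "Ideal"],
    "price": [100, 300, 200, 400, 600],
})

fig, (ax_count, ax_bar) = plt.subplots(1, 2)
sb.countplot(x="cut", data=toy, ax=ax_count)         # rows per category
sb.barplot(x="cut", y="price", data=toy, ax=ax_bar)  # mean price per category

def heights(ax):
    # Bar heights ordered left to right along the x-axis
    return [p.get_height() for p in sorted(ax.patches, key=lambda p: p.get_x())]

print(heights(ax_count))  # expected counts: 2 (Good), 3 (Ideal)
print(heights(ax_bar))    # expected means: 200 (Good), 400 (Ideal)
plt.close(fig)
```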
The Order Argument
The two plots are used for different things; however, they share many of the same coding options. Let’s check out some of those options in the Seaborn code.
For the first option, let’s talk about the order of the bars in the plots above. If we look at our countplot for the color of those diamonds, we will see that the bars are not sorted from most popular to least popular.
Instead, they are lined up alphabetically, from D to J.
But if we look at another column, cut, we will see that its bars are not arranged alphabetically: sb.countplot(x='cut', data=Data_DM).
It is not obvious at first how Seaborn is arranging these bars, so let’s walk through it. If we look at the data types of the diamonds columns with Data_DM.dtypes, we see several float64 columns, one int64 column, and three category columns: cut, color, and clarity.
carat       float64
cut        category
color      category
clarity    category
depth       float64
table       float64
price         int64
x           float64
y           float64
z           float64
dtype: object
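To list just the category columns programmatically, pandas’ select_dtypes() can filter columns by dtype. A minimal sketch on a hypothetical two-row frame that mixes numeric and category columns, like diamonds does:

```python
import pandas as pd

# Hypothetical frame mixing numeric and categorical columns
df = pd.DataFrame({
    "carat": [0.3, 0.5],
    "cut": pd.Categorical(["Ideal", "Fair"]),
    "color": pd.Categorical(["E", "J"]),
    "price": [500, 900],
})

print(df.dtypes)

# Keep only the columns stored with the pandas 'category' dtype
cat_cols = df.select_dtypes(include="category").columns.tolist()
print(cat_cols)  # ['cut', 'color']
```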
Let’s see what that means. For the color column, we can check the categories property through the .cat accessor with Data_DM.color.cat.categories. This is what Seaborn uses to line up the bars:
Index(['D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='object')
Every category column comes with this categories property, and Seaborn uses it to decide how to line up the bars. For the cut column, Data_DM.cut.cat.categories returns:
Index(['Ideal', 'Premium', 'Very Good', 'Good', 'Fair'], dtype='object')
For color, the categories happen to be in alphabetical order, but for cut, they run from the best cut (Ideal) down to the worst (Fair).
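A minimal sketch of this behavior, assuming a hypothetical cut column whose categories are deliberately listed best to worst rather than alphabetically or in order of appearance. Without an order argument, the bars follow the column’s categories.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

# Hypothetical cut column; categories run best to worst
cut = pd.Categorical(
    ["Good", "Ideal", "Ideal", "Fair", "Ideal", "Good"],
    categories=["Ideal", "Good", "Fair"],
)
toy = pd.DataFrame({"cut": cut})
print(list(toy.cut.cat.categories))

ax = sb.countplot(x="cut", data=toy)
ax.figure.canvas.draw()  # make sure tick labels are populated

labels = [t.get_text() for t in ax.get_xticklabels()]
heights = [p.get_height() for p in sorted(ax.patches, key=lambda p: p.get_x())]
print(labels, heights)
plt.close("all")
```

In order of appearance, the bars would have been Good, Ideal, Fair; here they follow the categories instead.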
But what if the category order is not how we would like the bars to appear? Seaborn’s countplot() function has an argument called order, to which we can pass a list giving the order we want for the bars.
ord_of_c = ['J', 'I', 'H', 'G', 'F', 'E', 'D']
sb.countplot(x='color', data=Data_DM, order=ord_of_c)
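A minimal sketch of the order argument on hypothetical toy data (only four colors, made-up counts), confirming that the bars follow the list we pass rather than the column’s categories:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

# Hypothetical stand-in for Data_DM's color column
toy = pd.DataFrame(
    {"color": pd.Categorical(list("EEEEFFFDDJ"), categories=["D", "E", "F", "J"])}
)

# Reverse of the categorical order, passed explicitly
ord_of_c = ["J", "F", "E", "D"]
ax = sb.countplot(x="color", data=toy, order=ord_of_c)

# Bar heights left to right now follow ord_of_c: J, F, E, D
heights = [p.get_height() for p in sorted(ax.patches, key=lambda p: p.get_x())]
print(heights)
plt.close("all")
```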
We can also sort the bars by frequency. Since Data_DM is a pandas DataFrame, the easiest way is the value_counts() method, which sorts the categories from most popular to least popular.
If we grab the index of the result with Data_DM.color.value_counts().index, we see that the most popular category is E, going down to the least popular category, J:
CategoricalIndex(['E', 'F', 'G', 'H', 'D', 'I', 'J'], categories=['D', 'E', 'F', 'G', 'H', 'I', 'J'], ordered=False, dtype='category')
We can pass this index as the order argument when we create the plot, and the bars will be sorted in descending order: sb.countplot(x='color', data=Data_DM, order=Data_DM.color.value_counts().index).
If we would prefer them sorted in ascending order, all we need to do is reverse this index, which we can do with the slice [::-1] (two colons and a negative one).
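Both frequency orders can be sketched on hypothetical toy data: the value_counts() index gives descending bars, and the same index reversed with [::-1] gives ascending bars. The toy counts are chosen without ties, since tied categories could come out in either order.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

# Hypothetical stand-in for Data_DM's color column (no tied counts)
toy = pd.DataFrame(
    {"color": pd.Categorical(list("EEEEFFFDDJ"), categories=["D", "E", "F", "J"])}
)

def heights(ax):
    # Bar heights ordered left to right along the x-axis
    return [p.get_height() for p in sorted(ax.patches, key=lambda p: p.get_x())]

desc_order = toy.color.value_counts().index  # most to least popular

fig, (ax_desc, ax_asc) = plt.subplots(1, 2)
sb.countplot(x="color", data=toy, order=desc_order, ax=ax_desc)
sb.countplot(x="color", data=toy, order=desc_order[::-1], ax=ax_asc)  # reversed

print(list(desc_order))
print(heights(ax_desc), heights(ax_asc))
plt.close(fig)
```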
You can find more options in the official Seaborn documentation for countplot(). The full code used in this article is:
# In:
import seaborn as sb
Data_DM = sb.load_dataset('diamonds')
Data_DM.head()

# In:
Data_DM = Data_DM[Data_DM.clarity.isin(['SI1', 'VS2'])]
Data_DM.shape

# In:
sb.countplot(x='color', data=Data_DM)

# In:
Data_DM.color.value_counts(sort=False)

# In:
sb.countplot(y='color', data=Data_DM)

# In: order argument
sb.countplot(x='cut', data=Data_DM)

# In:
Data_DM.dtypes

# In:
Data_DM.color.cat.categories

# In:
Data_DM.cut.cat.categories

# In:
ord_of_c = ['J', 'I', 'H', 'G', 'F', 'E', 'D']
sb.countplot(x='color', data=Data_DM, order=ord_of_c)

# In:
Data_DM.color.value_counts().index

# In:
sb.countplot(x='color', data=Data_DM, order=Data_DM.color.value_counts().index[::-1])