How to Count the Frequency a Value Occurs in Pandas Dataframe

Ahmed Waheed Feb 02, 2024

When working with a DataFrame, you may want to count how many times each value occurs in a column, in other words, to calculate its frequency. Three methods are commonly used for this purpose, two of which are built on DataFrame.groupby(). Let us look at them one by one.

  1. df.groupby().count()
  2. Series.value_counts()
  3. df.groupby().size()

We will use the following DataFrame in all of the sections below.

import pandas as pd

df = pd.DataFrame(
    {
        "A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
        "B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
    }
)

df.groupby().count() Method

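This method groups the rows by the values of one column and counts the non-null entries in each of the remaining columns, so it works well when you want the frequency of the values in a single column.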

import pandas as pd

df = pd.DataFrame(
    {
        "A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
        "B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
    }
)

freq = df.groupby(["A"]).count()
print(freq)

freq = df.groupby(["B"]).count()
print(freq)

This produces the following output, first grouped by column A and then by column B.

     B
A     
jim  4
sal  3
tom  2
   A
B   
a  4
b  5
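
One detail worth checking is how count() treats missing values. The sketch below uses a hypothetical DataFrame (df_missing, which is not part of the example above) with one missing entry in column B. It suggests that count() reports only the non-null entries per group, while size() reports the number of rows.

```python
import pandas as pd

# Hypothetical data for illustration: one value in column "B" is missing.
df_missing = pd.DataFrame(
    {
        "A": ["jim", "jim", "sal", "sal", "sal"],
        "B": ["a", None, "b", "a", "b"],
    }
)

# count() tallies the non-null entries of column "B" within each group of "A".
non_null_counts = df_missing.groupby("A")["B"].count()

# size() tallies the rows in each group, missing values included.
row_counts = df_missing.groupby("A").size()

print(non_null_counts)
print(row_counts)
```

If the column being counted can contain missing values, the two results differ, so size() is the safer choice when you want the number of rows per value.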

Series.value_counts() Method

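Every column of a DataFrame is a pandas.Series object, so this method can be applied directly to a single column. It returns the count of each distinct value, sorted from the most frequent to the least frequent.

Now use the Series.value_counts() function.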

import pandas as pd

df = pd.DataFrame(
    {
        "A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
        "B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
    }
)

freq = df["A"].value_counts()
print(freq)

freq = df["B"].value_counts()
print(freq)

This produces the following output.

jim    4
sal    3
tom    2
Name: A, dtype: int64
b    5
a    4
Name: B, dtype: int64
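
Since the goal is often the frequency of one particular value, the result of value_counts() can be indexed by that value. The following is a minimal sketch using the same DataFrame. The variable names (counts, jim_count, bob_count, shares) and the absent value "bob" are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
        "B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
    }
)

counts = df["A"].value_counts()

# Look up the count of one value by its label.
jim_count = counts["jim"]

# get() avoids a KeyError for a value that does not occur in the column.
bob_count = counts.get("bob", 0)

# normalize=True returns relative frequencies instead of raw counts.
shares = df["A"].value_counts(normalize=True)

print(jim_count, bob_count)
print(shares)
```

Here jim_count is 4 and bob_count falls back to 0, while shares holds the fraction of rows taken by each value.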

df.groupby().size() Method

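Series.value_counts() works on one column at a time, and groupby().count() needs at least one column left over to count. To count how often each combination of values occurs across several columns, we can use df.groupby().size(), which returns the number of rows in each group.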

import pandas as pd

df = pd.DataFrame(
    {
        "A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
        "B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
    }
)

freq = df.groupby(["A", "B"]).size()
print(freq)

This produces the following output.

A    B
jim  a    2
     b    2
sal  a    1
     b    2
tom  a    1
     b    1
dtype: int64
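
The result above is a Series with a two-level index, which can be awkward to work with further. As a sketch of some possible follow-up steps (the names flat, table, and sal_b, and the column name "count", are chosen here for illustration), it can be flattened into a regular DataFrame, pivoted into a table, or queried for one combination.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["jim", "jim", "jim", "jim", "sal", "tom", "tom", "sal", "sal"],
        "B": ["a", "b", "a", "b", "b", "b", "a", "a", "b"],
    }
)

freq = df.groupby(["A", "B"]).size()

# Flatten the two-level index into ordinary columns plus a "count" column.
flat = freq.reset_index(name="count")

# Pivot the inner level "B" into columns; absent combinations would become 0.
table = freq.unstack(fill_value=0)

# Look up the frequency of a single (A, B) combination.
sal_b = freq.loc[("sal", "b")]

print(flat)
print(table)
print(sal_b)
```

The flat form is convenient for sorting or filtering by count, and the pivoted form reads like a cross-tabulation of A against B.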
