Count Unique Values Per Group(s) in Pandas

  1. df.groupby().nunique() Method
  2. df.groupby().agg() Method
  3. df.groupby().unique() Method

When we are working with large data sets, sometimes we have to apply some function to a specific group of data. For example, we have a data set of countries and the private code they use for private matters. We want to count the number of codes a country uses. Listed below are the different ways count unique values.

  1. df.groupby().nunique() Method
  2. df.groupby().agg() Method
  3. df.groupby().unique() Method

We will use the same DataFrame in the next sections as follows,

import pandas as pd
data = [
    [999,
     'Switzerland'],
    [113,
     'Switzerland'],
    [112,
     'Japan'],
    [112,
     'Switzerland'],
    [113,
     'Canada'],
    [114,
     'Japan'],
    [100,
     'Germany'],
    [114,
     'Japan'],
    [115,
     'Germany']
]
df = pd.DataFrame(data, columns=["code","Countries"])
print(df)

The following will be the output.

   code    Countries
0   999  Switzerland
1   113  Switzerland
2   112        Japan
3   112  Switzerland
4   113       Canada
5   114        Japan
6   100      Germany
7   114        Japan
8   115      Germany

df.groupby().nunique() Method

Let’s see how df.groupby().nunique() function will groupby our countries.

import pandas as pd
data = [
    [999,
     'Switzerland'],
    [113,
     'Switzerland'],
    [112,
     'Japan'],
    [112,
     'Switzerland'],
    [113,
     'Canada'],
    [114,
     'Japan'],
    [100,
     'Germany'],
    [114,
     'Japan'],
    [115,
     'Germany']
]

df = pd.DataFrame(data, columns=["code","Countries"])
result = df.groupby('Countries')['code'].nunique()
print(result)

The following will be output.

Countries
Canada         1
Germany        2
Japan          2
Switzerland    3
Name: code, dtype: int64

This shows that Canada is using one code, Germany is using two codes, and so on.

df.groupby().agg() Method

This method works same as df.groupby().nunique(). We need pass nunique() function to agg() function.

import pandas as pd
data = [
    [999,
     'Switzerland'],
    [113,
     'Switzerland'],
    [112,
     'Japan'],
    [112,
     'Switzerland'],
    [113,
     'Canada'],
    [114,
     'Japan'],
    [100,
     'Germany'],
    [114,
     'Japan'],
    [115,
     'Germany']
]
df = pd.DataFrame(data, columns=["code","Countries"])
result = df.groupby(by='Countries', as_index=False).agg({'code': pd.Series.nunique})
print(result)

Output:

     Countries  code
0       Canada     1
1      Germany     2
2        Japan     2
3  Switzerland     3
.agg({'code': pd.Series.nunique})

It gggregates using function pd.Series.nunique over the column code.

df.groupby().unique() Method

This method is useful when you want to see which country is using which codes.

import pandas as pd
data = [
    [999,
     'Switzerland'],
    [113,
     'Switzerland'],
    [112,
     'Japan'],
    [112,
     'Switzerland'],
    [113,
     'Canada'],
    [114,
     'Japan'],
    [100,
     'Germany'],
    [114,
     'Japan'],
    [115,
     'Germany']
]
result = df.groupby('Countries')["code"].unique()
print(result)

The following will be output.

Countries
Canada                   [113]
Germany             [100, 115]
Japan               [112, 114]
Switzerland    [999, 113, 112]
Name: code, dtype: object
comments powered by Disqus