How to Count Unique Values Per Group(s) in Pandas

Ahmed Waheed Feb 02, 2024

When we are working with large data sets, sometimes we have to apply some function to a specific group of data. For example, we have a data set of `countries` and the private `code` they use for private matters. We want to count the number of codes a country uses. Listed below are the different methods from `groupby()` to count unique values.

We will use the same `DataFrame` in the next sections as follows,

``````import pandas as pd

data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])
print(df)
``````

The following will be the output.

``````   code    Countries
0   999  Switzerland
1   113  Switzerland
2   112        Japan
3   112  Switzerland
5   114        Japan
6   100      Germany
7   114        Japan
8   115      Germany
``````

`df.groupby().nunique()` Method

Let’s see how `df.groupby().nunique()` function will `groupby` our countries.

``````import pandas as pd

data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]

df = pd.DataFrame(data, columns=["code", "Countries"])
result = df.groupby("Countries")["code"].nunique()
print(result)
``````

The following will be output.

``````Countries
Germany        2
Japan          2
Switzerland    3
Name: code, dtype: int64
``````

This shows that Canada is using one code, Germany is using two codes, and so on.

`df.groupby().agg()` Method

This method works same as `df.groupby().nunique()`. We need pass `nunique()` function to `agg()` function.

``````import pandas as pd

data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])
result = df.groupby(by="Countries", as_index=False).agg({"code": pd.Series.nunique})
print(result)
``````

Output:

``````     Countries  code
1      Germany     2
2        Japan     2
3  Switzerland     3
``````
``````.agg({'code': pd.Series.nunique})
``````

It gggregates using function `pd.Series.nunique` over the column `code`.

`df.groupby().unique()` Method

This method is useful when you want to see which country is using which codes.

``````import pandas as pd

data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
result = df.groupby("Countries")["code"].unique()
print(result)
``````

The following will be output.

``````Countries