Count Unique Values in a Pandas DataFrame

Suraj Joshi, January 30, 2023
  1. Count Unique Values in a DataFrame Using Series.value_counts()
  2. Count Unique Values in a DataFrame Using DataFrame.nunique()

This tutorial explains how to count the unique values in a DataFrame using the Series.value_counts() and DataFrame.nunique() methods.

import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

print(patients_df)

Output:

       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

We will use the DataFrame patients_df, which contains each patient's name, appointment date, and age, to show how to count the unique values in a DataFrame.
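The sections below count the values of a single column. As a minimal sketch, assuming you want counts across every cell of patients_df at once, one option is to stack all columns into one long Series and call value_counts() on it. The per_column and all_cells names are illustrative only.

```python
import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": ["2020-12-01", "2020-12-01", "2020-12-02",
                 "2020-12-02", "2020-12-02", "2020-12-03"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

# Per-column counts: each column is counted independently.
per_column = {col: patients_df[col].value_counts() for col in patients_df.columns}
print(per_column["Age"])

# Whole-frame counts: stack() turns every cell into one row of a single Series,
# so value_counts() tallies each value no matter which column it came from.
all_cells = patients_df.stack().value_counts()
print(all_cells)
```

With mixed column types the stacked Series has object dtype, so a string like "17" and the integer 17 would be counted as different values.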

Count Unique Values in a DataFrame Using Series.value_counts()

import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

print("The DataFrame is:")
print(patients_df, "\n")

print("No of appointments for each date:")
print(patients_df["Date"].value_counts())

Output:

The DataFrame is:
       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

No of appointments for each date:
2020-12-02    3
2020-12-01    2
2020-12-03    1
Name: Date, dtype: int64

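It shows how many times each unique value appears in the Date column of the DataFrame, sorted from most to least frequent. In pandas 2.0 and later the result Series is named count and its index is named Date, so the printed footer looks slightly different from the output above.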
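A short sketch of two related uses of value_counts(), assuming a standalone Series with the same values as patients_df["Date"]: the length of the result is the number of unique values (the same number Series.nunique() returns), and normalize=True gives relative frequencies instead of raw counts.

```python
import pandas as pd

# Same values as patients_df["Date"]
dates = pd.Series(
    ["2020-12-01", "2020-12-01", "2020-12-02",
     "2020-12-02", "2020-12-02", "2020-12-03"],
    name="Date",
)

# One entry per distinct value, sorted by frequency (highest first) by default.
counts = dates.value_counts()

# The length of that result is the number of unique values,
# which Series.nunique() returns directly.
print(len(counts), dates.nunique())  # 3 3

# normalize=True returns each value's share of the rows instead of a raw count.
shares = dates.value_counts(normalize=True)
print(shares)
```

Both value_counts() and nunique() skip missing values by default; pass dropna=False to count NaN as its own value.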

Count Unique Values in a DataFrame Using DataFrame.nunique()

import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

print(patients_df, "\n")

print(patients_df.groupby("Date").Name.nunique())

Output:

       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

Date
2020-12-01    2
2020-12-02    3
2020-12-03    1
Name: Name, dtype: int64

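This splits the DataFrame by the values of the Date column, putting rows with the same Date into the same group, and then counts how many distinct Name values each group contains. Because every patient name in patients_df is unique, the result equals the number of appointments on each date. Strictly speaking, this calls nunique() on the grouped Name column (a SeriesGroupBy) rather than DataFrame.nunique() itself.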
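For comparison, here is a sketch that calls DataFrame.nunique() directly, which returns one distinct-value count per column. It then adds a hypothetical second appointment for Emma on 2020-12-02 (the extra and visits_df names are illustrative, not part of the original example) to show that grouped nunique() counts distinct names while size() counts rows.

```python
import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": ["2020-12-01", "2020-12-01", "2020-12-02",
                 "2020-12-02", "2020-12-02", "2020-12-03"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

# DataFrame.nunique() counts distinct values in every column (axis=0 by default).
per_column = patients_df.nunique()
print(per_column)

# Hypothetical extra row: Emma books a second appointment on 2020-12-02.
extra = pd.DataFrame({"Name": ["Emma"], "Date": ["2020-12-02"], "Age": [16]})
visits_df = pd.concat([patients_df, extra], ignore_index=True)

by_date = visits_df.groupby("Date")["Name"]
print(by_date.nunique())  # distinct patients per date
print(by_date.size())     # appointments (rows) per date
```

On 2020-12-02 the two results now differ: three distinct patients, but four appointments.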

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
