Pandas loc vs iloc

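Suraj Joshi January 30, 2023
  1. Select a Specific Value from a DataFrame by Index and Column Label Using .loc
  2. Select Specific Columns from a DataFrame Using .loc
  3. Filter Rows by Applying a Condition to a Column Using .loc
  4. Filter Rows by Integer Position Using iloc
  5. Filter Specific Rows and Columns from a DataFrame
  6. Filter a Range of Rows and Columns from a DataFrame Using iloc
  7. Comparison of Pandas loc and iloc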

This tutorial explains how to filter data from a Pandas DataFrame using loc and iloc in Python. With iloc, we select elements by the integer positions of the rows and columns; with loc, we select them by row and column labels.

To demonstrate data filtering, we will use the DataFrame defined in the example below.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print(student_df)

Output:

        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

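Select a Specific Value from a DataFrame by Index and Column Label Using .loc

We can pass an index label and a column label to the .loc indexer to extract the value at that row and column. Note that .loc is accessed with square brackets rather than called like a function.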

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)
print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The Grade of student with Roll No. 504 is:")
value = student_df.loc[504, "Grade"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The Grade of student with Roll No. 504 is:
A-

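This selects the value in the DataFrame whose index label is 504 and whose column label is Grade. The first argument to .loc is the index label, and the second is the column label.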
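
Because .loc looks up labels rather than positions, the integer 504 is matched against the index values and is not treated as a row number. The following is a minimal sketch of that behavior on a cut-down copy of the tutorial's DataFrame; the variable names grade_by_label and lookup_failed are chosen only for illustration.

```python
import pandas as pd

# Cut-down copy of the tutorial's DataFrame (only two columns kept).
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# 504 is an index label, so this returns the grade stored in that row.
grade_by_label = student_df.loc[504, "Grade"]
print(grade_by_label)

# 3 is a valid row position but not an index label, so .loc raises KeyError.
try:
    student_df.loc[3, "Grade"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```

If positional access is what you need, iloc is the indexer to reach for instead.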

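Select Specific Columns from a DataFrame Using .loc

We can also use .loc to select only the columns we need from a DataFrame. To do so, we pass a list of the required column names as the second argument to .loc.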

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The name and age of students in the DataFrame are:")
value = student_df.loc[:, ["Name", "Age"]]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The name and age of students in the DataFrame are:
        Name  Age
501    Alice   17
502   Steven   20
503  Neesham   18
504    Chris   21
505    Alice   15

The first argument to .loc is :, which selects all rows of the DataFrame. The second argument, ["Name", "Age"], selects only the Name and Age columns.
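
One detail worth checking is the type of the result: a list of column labels keeps the result as a DataFrame, while a single label without a list yields a Series. The sketch below illustrates this on a reduced copy of the tutorial's data; the names name_age and ages are illustrative only.

```python
import pandas as pd

# Reduced copy of the tutorial's DataFrame.
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
    },
    index=[501, 502, 503, 504, 505],
)

# A list of labels keeps the result two-dimensional (a DataFrame).
name_age = student_df.loc[:, ["Name", "Age"]]
print(type(name_age).__name__, name_age.shape)

# A single label (no list) returns a one-dimensional Series instead.
ages = student_df.loc[:, "Age"]
print(type(ages).__name__, ages.shape)
```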

Filter Rows by Applying a Condition to a Column Using .loc

We can also use .loc to keep only the rows whose column values satisfy a given condition.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Students with Grade A are:")
value = student_df.loc[student_df.Grade == "A"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Students with Grade A are:
      Name  Age      City Grade
501  Alice   17  New York     A
505  Alice   15    Austin     A

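This selects all students in the DataFrame whose grade is A. The comparison student_df.Grade == "A" produces a Boolean Series, and .loc keeps the rows where it is True.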
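
Conditions can also be combined, and a column list can be supplied in the same call. The sketch below is one possible extension of the example, not part of the original tutorial: it assumes we want the name and city of students who have grade A and are at least 17 years old. Each condition is wrapped in parentheses because & binds more tightly than the comparison operators.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Boolean mask: True only where both conditions hold.
mask = (student_df["Grade"] == "A") & (student_df["Age"] >= 17)

# Rows come from the mask, columns from the label list.
grade_a_adults = student_df.loc[mask, ["Name", "City"]]
print(grade_a_adults)
```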

Filter Rows by Integer Position Using iloc

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("2nd and 3rd rows in the DataFrame:")
filtered_rows = student_df.iloc[[1, 2]]
print(filtered_rows)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

2nd and 3rd rows in the DataFrame:
        Name  Age      City Grade
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+

This selects the 2nd and 3rd rows of the DataFrame.

We pass the integer positions of the rows to iloc to select them. Here the second and third rows have positions 1 and 2, because positions start at 0.
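
As a small illustration of positional indexing, the sketch below contrasts a single position, a one-element list, and a negative position. It uses a reduced copy of the tutorial's DataFrame, and the variable names are illustrative.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# A bare integer returns that row as a Series.
second_row = student_df.iloc[1]
print(second_row["Name"])

# A one-element list returns a DataFrame with a single row.
second_row_df = student_df.iloc[[1]]
print(second_row_df.shape)

# Negative positions count from the end, as with Python lists.
last_row = student_df.iloc[-1]
print(last_row.name)
```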

Filter Specific Rows and Columns from a DataFrame

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

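This selects the first and last columns, Name and Grade, of the 2nd, 3rd, and 4th rows. We pass a list of integer row positions as the first argument to iloc and a list of integer column positions as the second.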
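
Hard-coded column positions such as [0, 3] are easy to get wrong when columns are added or reordered. One possible approach, shown here as a sketch rather than as the tutorial's method, is to look the positions up from the column names with Index.get_indexer and pass the result to iloc.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Translate column labels into integer positions.
col_positions = student_df.columns.get_indexer(["Name", "Grade"])
print(list(col_positions))

# Use the looked-up positions with iloc, together with row positions.
filtered_values = student_df.iloc[[1, 2, 3], col_positions]
print(filtered_values)
```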

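Filter a Range of Rows and Columns from a DataFrame Using iloc

To select a range of rows and columns, we can use slice notation and pass one slice for the rows and one for the columns to iloc.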

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[1:4, 0:2]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

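This selects the 2nd, 3rd, and 4th rows and the 1st and 2nd columns of the DataFrame. 1:4 covers the rows at positions 1 to 3; the end position 4 is excluded. Likewise, 0:2 covers the columns at positions 0 to 1.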
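
Slices behave differently with loc: label-based slices include both endpoints, whereas iloc slices exclude the end position. The sketch below compares the two on the tutorial's DataFrame; the names by_position and by_label are illustrative.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Positional slices: end positions 4 and 2 are excluded.
by_position = student_df.iloc[1:4, 0:2]

# Label slices: both 504 and "Age" are included.
by_label = student_df.loc[502:504, "Name":"Age"]

# Both selections cover rows 502 to 504 and columns Name and Age.
print(by_position.equals(by_label))
```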

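Comparison of Pandas loc and iloc

To select rows and columns with loc, we pass the labels of the rows and columns we want. Similarly, to select values with iloc, we pass the integer positions of the rows and columns we want.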

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame using loc:")
loc_filtered_values = student_df.loc[[502, 503, 504], ["Name", "Age"]]
print(loc_filtered_values)
print("")
print("Filtered values from the DataFrame using iloc:")
iloc_filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(iloc_filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame using loc:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

Filtered values from the DataFrame using iloc:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

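This shows how loc and iloc select the same rows, 502 to 504, by label and by position respectively. Note that the two calls pick different columns here: the loc call selects Name and Age, while the iloc call selects positions 0 and 3, that is, Name and Grade. Passing [0, 1] as the column positions to iloc would reproduce the loc result exactly.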
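
To check that a label-based and a position-based selection really do match, the two results can be compared with DataFrame.equals. The sketch below assumes the column positions [0, 1], which correspond to Name and Age in this DataFrame.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Select by labels.
loc_result = student_df.loc[[502, 503, 504], ["Name", "Age"]]

# Select the same rows and columns by integer position.
iloc_result = student_df.iloc[[1, 2, 3], [0, 1]]

# equals() compares shape, labels, and values.
print(loc_result.equals(iloc_result))
```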

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
