Pandas loc vs. iloc

Suraj Joshi, January 30, 2023

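Select a Specific Value From a DataFrame by Index and Column Label Using .loc
Select Specific Columns From a DataFrame Using .loc
Filter Rows by Applying a Condition to a Column Using .loc
Filter Rows by Integer Position Using iloc
Filter Specific Rows and Columns From a DataFrame
Filter a Range of Rows and Columns From a DataFrame Using iloc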

This tutorial explains how to filter data from a Pandas DataFrame using loc and iloc. iloc selects entries by the integer positions of rows and columns, whereas loc selects entries by row and column labels.

To demonstrate filtering with loc and iloc, the examples below use the following DataFrame.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print(student_df)

Output:

        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A
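
As a quick orientation before the individual sections, the following minimal sketch selects the same cell twice, once by label and once by position. It uses a trimmed two-column copy of the DataFrame above; the variable names by_label and by_position are illustrative only.

```python
import pandas as pd

# Trimmed copy of the tutorial's DataFrame (two columns only).
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
    },
    index=[501, 502, 503, 504, 505],
)

# Label-based: the row whose index label is 503, the column named "Name".
by_label = student_df.loc[503, "Name"]

# Position-based: the third row (position 2), the first column (position 0).
by_position = student_df.iloc[2, 0]

print(by_label, by_position)
```

Both expressions address the same cell; only the way the row and column are named differs.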

Select a Specific Value From a DataFrame by Index and Column Label Using `.loc`

Passing an index label and a column label to .loc, inside square brackets, returns the value stored at that row and column.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)
print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The Grade of student with Roll No. 504 is:")
value = student_df.loc[504, "Grade"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The Grade of student with Roll No. 504 is:
A-

This selects the value in the DataFrame at index label 504 and column label Grade. Inside the brackets of .loc, the first entry is the index label and the second is the column label.
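
Because .loc works on labels, 504 here is the roll number, not the 504th row. The sketch below, which assumes a one-column copy of the DataFrame, shows what happens when a row position such as 3 is passed where a label is expected: the label is not in the index, so pandas raises a KeyError. The variable missing_label_error is only there to record the outcome.

```python
import pandas as pd

# One-column copy of the tutorial's DataFrame.
student_df = pd.DataFrame(
    {"Grade": ["A", "B-", "B+", "A-", "A"]},
    index=[501, 502, 503, 504, 505],
)

# 504 is an index label here, not a row position.
grade = student_df.loc[504, "Grade"]

# 3 is a valid row position but not a label in this index,
# so label-based lookup fails with KeyError.
try:
    student_df.loc[3, "Grade"]
    missing_label_error = None
except KeyError as exc:
    missing_label_error = type(exc).__name__

print(grade, missing_label_error)
```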

Select Specific Columns From a DataFrame Using `.loc`

We can also use .loc to select only the columns we need from a DataFrame. Passing a list of column names as the second entry to .loc returns just those columns.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The name and age of students in the DataFrame are:")
value = student_df.loc[:, ["Name", "Age"]]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The name and age of students in the DataFrame are:
        Name  Age
501    Alice   17
502   Steven   20
503  Neesham   18
504    Chris   21
505    Alice   15

The first entry passed to .loc is :, which stands for all rows in the DataFrame. The second entry, ["Name", "Age"], selects only the Name and Age columns.
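
One detail worth knowing about the column entry: a single label and a one-element list are not equivalent. The sketch below, using a three-column copy of the DataFrame, illustrates that a bare label yields a Series while a list of labels yields a DataFrame. The names name_series and name_frame are illustrative.

```python
import pandas as pd

# Three-column copy of the tutorial's DataFrame.
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
    },
    index=[501, 502, 503, 504, 505],
)

# A single column label returns a one-dimensional Series.
name_series = student_df.loc[:, "Name"]

# A list of labels returns a DataFrame, even with one label in the list.
name_frame = student_df.loc[:, ["Name"]]

print(type(name_series).__name__)
print(type(name_frame).__name__, name_frame.shape)
```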

Filter Rows by Applying a Condition to a Column Using `.loc`

We can also use .loc to filter the rows whose column values satisfy a given condition.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Students with Grade A are:")
value = student_df.loc[student_df.Grade == "A"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Students with Grade A are:
      Name  Age      City Grade
501  Alice   17  New York     A
505  Alice   15    Austin     A

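This selects, from all students in the DataFrame, only those whose grade is exactly A. The comparison is an exact match, so the student with grade A- is not included.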
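
The same pattern extends to several conditions and to a column selection in one step. The following sketch is an illustration rather than part of the original example: the age threshold of 16 and the variable names mask and result are assumptions chosen for the demonstration. Each condition is wrapped in parentheses and combined with &.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Boolean mask: grade is exactly "A" AND age is at least 16.
# The parentheses are required because & binds tighter than == and >=.
mask = (student_df["Grade"] == "A") & (student_df["Age"] >= 16)

# Rows come from the mask, columns from the list of labels.
result = student_df.loc[mask, ["Name", "City"]]
print(result)
```

Only the student with roll number 501 satisfies both conditions, since the other grade-A student is 15.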

Filter Rows by Integer Position Using `iloc`

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("2nd and 3rd rows in the DataFrame:")
filtered_rows = student_df.iloc[[1, 2]]
print(filtered_rows)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

2nd and 3rd rows in the DataFrame:
        Name  Age      City Grade
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+

This filters the 2nd and 3rd rows from the DataFrame.

We pass the integer positions of the rows to iloc to select them. Positions start at 0, so the 2nd and 3rd rows have positions 1 and 2, respectively.
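
Two related behaviors of positional row selection are sketched below, assuming a two-column copy of the DataFrame: a bare integer returns a single row as a Series, whereas a list returns a DataFrame, and negative positions count from the end as they do for Python lists. The variable names are illustrative.

```python
import pandas as pd

# Two-column copy of the tutorial's DataFrame.
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
    },
    index=[501, 502, 503, 504, 505],
)

# A bare integer returns the row at that position as a Series.
second_row = student_df.iloc[1]

# A one-element list returns a DataFrame with a single row.
second_row_frame = student_df.iloc[[1]]

# Negative positions count from the end: -1 is the last row.
last_row = student_df.iloc[-1]

print(second_row["Name"], second_row_frame.shape, last_row.name)
```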

Filter Specific Rows and Columns From a DataFrame

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

This filters the first and last columns, Name and Grade, of the 2nd, 3rd, and 4th rows of the DataFrame. We pass a list of integer row positions as the first entry to iloc and a list of integer column positions as the second.
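
Hard-coded column positions such as 0 and 3 break silently if the column order changes. One possible way around this, shown here as a sketch rather than as the article's method, is to look the positions up from the column labels with Index.get_loc. The names name_pos and grade_pos are illustrative.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Translate column labels into integer positions.
name_pos = student_df.columns.get_loc("Name")
grade_pos = student_df.columns.get_loc("Grade")

# Rows by position, columns by the looked-up positions.
filtered = student_df.iloc[[1, 2, 3], [name_pos, grade_pos]]

# Equivalent here: -1 addresses the last column.
same = student_df.iloc[[1, 2, 3], [0, -1]]

print(filtered)
```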

Filter a Range of Rows and Columns From a DataFrame Using `iloc`

To filter a range of rows and columns, we can use slice notation and pass one slice for the rows and one for the columns to iloc.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[1:4, 0:2]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

This selects the 2nd, 3rd, and 4th rows and the 1st and 2nd columns from the DataFrame. 1:4 covers the rows at positions 1 through 3; the end position 4 is exclusive. Likewise, 0:2 covers the columns at positions 0 and 1.
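
Slicing is where loc and iloc differ most visibly: an iloc slice excludes its end position, while a loc slice includes its end label. The sketch below reproduces the selection above with both indexers to make the difference concrete; by_position and by_label are illustrative names.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Positional slice: end positions 4 and 2 are excluded.
by_position = student_df.iloc[1:4, 0:2]

# Label slice: end labels 504 and "Age" are included.
by_label = student_df.loc[502:504, "Name":"Age"]

print(by_position.equals(by_label))
```

Both selections return the rows with roll numbers 502 to 504 and the Name and Age columns.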