The Pandas insert() Method in Python

Suraj Joshi, January 30, 2023
  1. The pandas.DataFrame.insert() Method
  2. Set allow_duplicates=True in insert() to Add a Column That Already Exists
This tutorial explains how to insert a column into a Pandas DataFrame with the insert() method.

import pandas as pd

countries_df = pd.DataFrame(
    {
        "Country": ["Nepal", "Switzerland", "Germany", "Canada"],
        "Continent": ["Asia", "Europe", "Europe", "North America"],
        "Primary Language": ["Nepali", "French", "German", "English"],
    }
)
print("Countries DataFrame:")
print(countries_df, "\n")

Output:

Countries DataFrame:
       Country      Continent Primary Language
0        Nepal           Asia           Nepali
1  Switzerland         Europe           French
2      Germany         Europe           German
3       Canada  North America          English

The examples that follow all use the countries_df DataFrame shown above.

The pandas.DataFrame.insert() Method

Syntax

DataFrame.insert(loc, column, value, allow_duplicates=False)

It inserts a column labeled column into the DataFrame at position loc, with its contents given by value.
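One detail the signature does not make obvious is that insert() modifies the DataFrame in place and returns None. The short sketch below illustrates this with a small two-row DataFrame; the names df and result are made up for the illustration and are not part of the tutorial's running example.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, not the tutorial's countries_df).
df = pd.DataFrame(
    {"Country": ["Nepal", "Canada"], "Continent": ["Asia", "North America"]}
)

# insert() changes df itself and returns None, so its result should not be reassigned.
result = df.insert(1, "Capital", ["Kathmandu", "Ottawa"])

print(result)
print(list(df.columns))
```

Because the return value is None, writing df = df.insert(...) would discard the DataFrame.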

Insert a Column With the Same Value for All Rows Using insert()

import pandas as pd

countries_df = pd.DataFrame(
    {
        "Country": ["Nepal", "Switzerland", "Germany", "Canada"],
        "Continent": ["Asia", "Europe", "Europe", "North America"],
        "Primary Language": ["Nepali", "French", "German", "English"],
    }
)
print("Countries DataFrame:")
print(countries_df, "\n")

countries_df.insert(3, "Capital", "Unknown")

print("Countries DataFrame after inserting Capital column:")
print(countries_df)

Output:

Countries DataFrame:
       Country      Continent Primary Language
0        Nepal           Asia           Nepali
1  Switzerland         Europe           French
2      Germany         Europe           German
3       Canada  North America          English

Countries DataFrame after inserting Capital column:
       Country      Continent Primary Language  Capital
0        Nepal           Asia           Nepali  Unknown
1  Switzerland         Europe           French  Unknown
2      Germany         Europe           German  Unknown
3       Canada  North America          English  Unknown

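It inserts the Capital column at position 3 of the countries_df DataFrame and sets the Capital value of every row to Unknown.

Positions are counted from 0, so position 3 refers to the fourth column of the DataFrame.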
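As a rough illustration of zero-based positions, the sketch below inserts one column at the very front (loc=0) and another at the very end (loc equal to the current number of columns). The DataFrame df and the Id column are invented for this example only.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data).
df = pd.DataFrame(
    {"Country": ["Nepal", "Canada"], "Continent": ["Asia", "North America"]}
)

# loc=0 places the new column before all existing columns.
df.insert(0, "Id", [1, 2])

# loc=len(df.columns) places the new column after the last existing column.
df.insert(len(df.columns), "Capital", "Unknown")

print(list(df.columns))
```

Using len(df.columns) rather than a hard-coded number keeps the "append at the end" call correct even if the DataFrame gains or loses columns earlier in the script.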

Insert a Column Into a DataFrame With a Value for Each Row

To give each row of the new column its own value, we can pass a list of values as the value argument of insert().

import pandas as pd

countries_df = pd.DataFrame(
    {
        "Country": ["Nepal", "Switzerland", "Germany", "Canada"],
        "Continent": ["Asia", "Europe", "Europe", "North America"],
        "Primary Language": ["Nepali", "French", "German", "English"],
    }
)
print("Countries DataFrame:")
print(countries_df, "\n")

capitals = ["Kathmandu", "Zurich", "Berlin", "Ottawa"]

countries_df.insert(2, "Capital", capitals)

print("Countries DataFrame after inserting Capital column:")
print(countries_df)

Output:

Countries DataFrame:
       Country      Continent Primary Language
0        Nepal           Asia           Nepali
1  Switzerland         Europe           French
2      Germany         Europe           German
3       Canada  North America          English

Countries DataFrame after inserting Capital column:
       Country      Continent    Capital Primary Language
0        Nepal           Asia  Kathmandu           Nepali
1  Switzerland         Europe     Zurich           French
2      Germany         Europe     Berlin           German
3       Canada  North America     Ottawa          English

It inserts the Capital column at position 2 of the countries_df DataFrame and assigns each row the corresponding value from the capitals list.
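Two related behaviors are worth keeping in mind when passing per-row values. A list must have exactly one item per row, and a pandas Series is matched to rows by index label rather than by order. The sketch below shows both on a small invented DataFrame; df and capitals here are illustrative names, separate from the example above.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data).
df = pd.DataFrame({"Country": ["Nepal", "Canada"]})

# A list whose length differs from the number of rows is rejected with a ValueError.
try:
    df.insert(1, "Capital", ["Kathmandu"])
except ValueError as err:
    print("ValueError:", err)

# A Series is aligned on its index labels, so the order of its items does not matter.
capitals = pd.Series(["Ottawa", "Kathmandu"], index=[1, 0])
df.insert(1, "Capital", capitals)

print(df)
```

After the second call, row 0 (Nepal) holds Kathmandu and row 1 (Canada) holds Ottawa, even though the Series listed them in the opposite order.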

Set allow_duplicates=True in insert() to Add a Column That Already Exists

import pandas as pd

countries_df = pd.DataFrame(
    {
        "Country": ["Nepal", "Switzerland", "Germany", "Canada"],
        "Continent": ["Asia", "Europe", "Europe", "North America"],
        "Primary Language": ["Nepali", "French", "German", "English"],
        "Capital": ["Kathmandu", "Zurich", "Berlin", "Ottawa"],
    }
)
print("Countries DataFrame:")
print(countries_df, "\n")

capitals = ["Kathmandu", "Zurich", "Berlin", "Ottawa"]

countries_df.insert(4, "Capital", capitals, allow_duplicates=True)

print("Countries DataFrame after inserting Capital column:")
print(countries_df)

Output:

Countries DataFrame:
       Country      Continent Primary Language    Capital
0        Nepal           Asia           Nepali  Kathmandu
1  Switzerland         Europe           French     Zurich
2      Germany         Europe           German     Berlin
3       Canada  North America          English     Ottawa

Countries DataFrame after inserting Capital column:
       Country      Continent Primary Language    Capital    Capital
0        Nepal           Asia           Nepali  Kathmandu  Kathmandu
1  Switzerland         Europe           French     Zurich     Zurich
2      Germany         Europe           German     Berlin     Berlin
3       Canada  North America          English     Ottawa     Ottawa

It adds a second Capital column to the countries_df DataFrame even though a Capital column already exists.

If we try to insert a column whose label already exists in the DataFrame without setting allow_duplicates=True, insert() raises an error such as ValueError: cannot insert Capital, already exists.
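The difference between the two settings can be sketched side by side. The snippet below first attempts the insert with the default allow_duplicates=False and catches the resulting ValueError, then repeats it with allow_duplicates=True. The DataFrame df and the variable message are illustrative names only.

```python
import pandas as pd

# Small illustrative DataFrame that already has a Capital column (hypothetical data).
df = pd.DataFrame(
    {"Country": ["Nepal", "Canada"], "Capital": ["Kathmandu", "Ottawa"]}
)

# With the default allow_duplicates=False, a repeated label raises ValueError.
message = ""
try:
    df.insert(2, "Capital", ["Kathmandu", "Ottawa"])
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# With allow_duplicates=True, the same call succeeds.
df.insert(2, "Capital", ["Kathmandu", "Ottawa"], allow_duplicates=True)
print(list(df.columns))

# Selecting a duplicated label returns a DataFrame holding every matching column.
print(df["Capital"].shape)
```

Duplicate labels make later selection ambiguous, since df["Capital"] now returns two columns instead of one, so giving the new column a distinct name is often the simpler choice.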

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
