How to Select Multiple Columns in Pandas Dataframe

Manav Narula Feb 02, 2024
  1. Use __getitem__ Syntax ([]) to Select Multiple Columns
  2. Use the iloc and loc Indexers to Select Multiple Columns in Pandas

Extracting several columns from a Pandas DataFrame can cause problems, mainly when the DataFrame is treated like a 2-dimensional array. To select multiple columns, we can either pass a list of column names to the indexing operator ([]), which calls __getitem__, or use the iloc and loc indexers provided by the Pandas library. Note that iloc and loc are used with square brackets, not called like methods. This tutorial selects multiple columns from the following DataFrame. It is filled with random numbers, so the values printed on your machine will differ from the outputs shown here.

Example DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])

print(df)

Output:

          a         b         c         d
0  0.255086  0.282203  0.342223  0.263599
1  0.744271  0.591687  0.861554  0.871859
2  0.420066  0.713664  0.770193  0.207427
3  0.014447  0.352515  0.535801  0.119759
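
Because np.random.rand draws fresh values on every run, the table above cannot be reproduced exactly. If repeatable output matters, one option is a seeded generator. The following is a minimal sketch, not part of the original example; the seed value 0 and the name rng are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed yields the same values on every run.
# The seed 0 is an arbitrary, illustrative choice.
rng = np.random.default_rng(0)

df = pd.DataFrame(rng.random((4, 4)), columns=["a", "b", "c", "d"])
print(df)

# Rebuilding with the same seed gives an identical DataFrame.
df_again = pd.DataFrame(
    np.random.default_rng(0).random((4, 4)), columns=["a", "b", "c", "d"]
)
print(df.equals(df_again))
```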

Use __getitem__ Syntax ([]) to Select Multiple Columns

Passing a list of column names to [] returns a new DataFrame that contains only those columns. The following code selects columns a and c from the DataFrame shown above.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])

print(df[["a", "c"]])

Output:

          a         c
0  0.255086  0.342223
1  0.744271  0.861554
2  0.420066  0.770193
3  0.014447  0.535801
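
The double brackets matter: the outer pair is the indexing operator and the inner pair is the list of column names. The sketch below illustrates the difference under the assumption of the same four column names; the seeded generator and the names single and multiple are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for repeatable values only
df = pd.DataFrame(rng.random((4, 4)), columns=["a", "b", "c", "d"])

single = df["a"]            # one label gives a Series
multiple = df[["a", "c"]]   # a list of labels gives a DataFrame

print(type(single).__name__)
print(type(multiple).__name__)

# Without the inner list, "a", "c" is read as the single tuple key ("a", "c"),
# which is not a column of this DataFrame, so pandas raises KeyError.
try:
    df["a", "c"]
    raised = False
except KeyError:
    raised = True
print(raised)
```

This is the usual way the "2-dimensional array" habit goes wrong: df["a", "c"] looks like array indexing but is interpreted as a lookup of one column label.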

Use the iloc and loc Indexers to Select Multiple Columns in Pandas

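We can also use the iloc and loc indexers to select multiple columns. Both take a row selector and a column selector inside square brackets, and a bare colon (:) as the row selector keeps every row.

To select columns by their integer positions, use iloc as shown in the example below: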

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])
print(df.iloc[:, [0, 2]])

Output:

          a         c
0  0.255086  0.342223
1  0.744271  0.861554
2  0.420066  0.770193
3  0.014447  0.535801
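
iloc also accepts position slices, which is convenient for contiguous columns. The sketch below assumes the same column layout as above; the seeded generator and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for repeatable values only
df = pd.DataFrame(rng.random((4, 4)), columns=["a", "b", "c", "d"])

first_two = df.iloc[:, 0:2]   # positions 0 and 1; the end position 2 is excluded
last_two = df.iloc[:, -2:]    # the last two columns

print(list(first_two.columns))
print(list(last_two.columns))

# Column names can be translated to positions with Index.get_indexer.
positions = df.columns.get_indexer(["a", "c"])
by_position = df.iloc[:, positions]
print(list(by_position.columns))
```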

Similarly, use loc to select columns by their names (labels), as shown below:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])

print(df.loc[:, ["a", "c"]])

Output:

          a         c
0  0.255086  0.342223
1  0.744271  0.861554
2  0.420066  0.770193
3  0.014447  0.535801
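
loc accepts label slices as well. Unlike position slices in iloc, a label slice includes its end label. The sketch below assumes the same column names as above; the seeded generator and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for repeatable values only
df = pd.DataFrame(rng.random((4, 4)), columns=["a", "b", "c", "d"])

# Label slice: both "a" and "c" are included, so this selects a, b and c.
a_to_c = df.loc[:, "a":"c"]
print(list(a_to_c.columns))

# Rows and columns can be restricted together; the row slice 0:1 is also
# label-based here, so it keeps the rows labelled 0 and 1.
subset = df.loc[0:1, ["a", "c"]]
print(subset.shape)
```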