Select Multiple Columns in Pandas DataFrame

  1. Use __getitem__ Syntax ([]) to Select Multiple Columns
  2. Use iloc and loc to Select Multiple Columns in Pandas

Selecting several columns from a Pandas DataFrame can trip people up. The main reason is that it is tempting to index a DataFrame like a 2-dimensional NumPy array (for example, df[:, [0, 2]]), which plain [] indexing does not support. To select multiple columns, we can pass a list of column names to the [] indexing operator (the __getitem__ syntax). Alternatively, we can use the iloc and loc indexers provided by Pandas. For this tutorial, we will select multiple columns from the following DataFrame.

Example DataFrame:

import pandas as pd
import numpy as np

# 4x4 DataFrame of random floats in [0, 1); the values change on every run
df = pd.DataFrame(np.random.rand(4, 4), columns=['a', 'b', 'c', 'd'])

print(df)

Output:

          a         b         c         d
0  0.255086  0.282203  0.342223  0.263599
1  0.744271  0.591687  0.861554  0.871859
2  0.420066  0.713664  0.770193  0.207427
3  0.014447  0.352515  0.535801  0.119759
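np.random.rand draws fresh values on every call. As a result, running the code above prints different numbers from the output shown. If you want repeatable numbers while following along, you can seed NumPy's random Generator (np.random.default_rng). The seed 0 is an arbitrary choice, and seeding will not reproduce the exact values printed above. It only makes your own runs consistent with each other.

```python
import numpy as np
import pandas as pd

# Seed a NumPy Generator so the "random" values are identical on every run.
# The seed value 0 is arbitrary; any fixed integer works.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 4)), columns=['a', 'b', 'c', 'd'])
print(df)

# Re-creating the frame with the same seed reproduces exactly the same values.
df_again = pd.DataFrame(np.random.default_rng(0).random((4, 4)),
                        columns=['a', 'b', 'c', 'd'])
print(df.equals(df_again))  # True
```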

Use __getitem__ Syntax ([]) to Select Multiple Columns

To select multiple columns from the DataFrame, store the names of the columns to extract in a list and pass that list to the [] operator. The following code selects columns a and c. Because the DataFrame is re-created with random values, your numbers will differ from the output shown.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(4, 4), columns=['a', 'b', 'c', 'd'])

# Pass a list of column names inside the [] operator (note the double brackets)
print(df[['a', 'c']])

Output:

          a         c
0  0.255086  0.342223
1  0.744271  0.861554
2  0.420066  0.770193
3  0.014447  0.535801
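The double brackets matter here: the outer brackets are the indexing operator and the inner brackets are the list of names. The sketch below uses a seeded version of the example DataFrame so that it is repeatable. It compares three forms of the key:

- a list of names, which can also be held in a variable and whose order controls the column order;
- a single name;
- a bare comma-separated pair.

```python
import numpy as np
import pandas as pd

# Seeded only so the example is repeatable; the values themselves don't matter.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 4)), columns=['a', 'b', 'c', 'd'])

# A list of names (double brackets) returns a DataFrame.
cols = ['a', 'c']
subset = df[cols]
print(type(subset).__name__)     # DataFrame

# Columns come back in the order listed, so the list can also reorder them.
reordered = df[['c', 'a']]
print(list(reordered.columns))   # ['c', 'a']

# A single name (single brackets) returns a Series, not a DataFrame.
single = df['a']
print(type(single).__name__)     # Series

# Without the inner list, 'a', 'c' becomes one tuple key ('a', 'c'),
# which is not a column name here, so pandas raises a KeyError.
raised = False
try:
    df['a', 'c']
except KeyError as err:
    raised = True
    print('KeyError:', err)
```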

Use iloc and loc to Select Multiple Columns in Pandas

We can also use the iloc and loc indexers to select multiple columns. Both take a [rows, columns] pair, so passing : as the row selector keeps every row.

To select columns by their integer positions, we can use iloc, as shown in the example below:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(4, 4), columns=['a', 'b', 'c', 'd'])

# ':' keeps all rows; [0, 2] are the positions of columns a and c
print(df.iloc[:, [0, 2]])

Output:

          a         c
0  0.255086  0.342223
1  0.744271  0.861554
2  0.420066  0.770193
3  0.014447  0.535801
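A few more iloc patterns follow the same idea. This sketch again uses a seeded copy of the example DataFrame. It shows three things:

- Positional slices exclude their end point, as in ordinary Python slicing.
- Negative positions count from the right.
- Index.get_indexer translates column names into the positions that iloc expects.

```python
import numpy as np
import pandas as pd

# Seeded only so the example is repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 4)), columns=['a', 'b', 'c', 'd'])

# A positional slice excludes its end, like Python slicing: 1:3 -> positions 1 and 2.
middle = df.iloc[:, 1:3]
print(list(middle.columns))       # ['b', 'c']

# Negative positions count from the right: -1 is the last column.
last_two = df.iloc[:, [-2, -1]]
print(list(last_two.columns))     # ['c', 'd']

# If you only know the names, Index.get_indexer converts them to positions.
positions = df.columns.get_indexer(['a', 'c'])
print(positions)                  # [0 2]
print(df.iloc[:, positions].equals(df[['a', 'c']]))  # True
```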

Similarly, we can use loc when we want to select columns by their names (labels), as shown below:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(4, 4), columns=['a', 'b', 'c', 'd'])

# ':' keeps all rows; ['a', 'c'] are the column labels to select
print(df.loc[:, ['a', 'c']])

Output:

          a         c
0  0.255086  0.342223
1  0.744271  0.861554
2  0.420066  0.770193
3  0.014447  0.535801
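Unlike a positional slice in iloc, a label slice in loc includes its end label. The first slot of loc selects rows, so the same call can also filter rows while picking columns. The sketch below assumes a seeded copy of the example DataFrame, and the 0.5 threshold is only an illustrative value. It also checks that the [] syntax, iloc, and loc all return the same columns.

```python
import numpy as np
import pandas as pd

# Seeded only so the example is repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 4)), columns=['a', 'b', 'c', 'd'])

# A label slice with loc INCLUDES its end label: 'b':'d' -> b, c and d.
b_to_d = df.loc[:, 'b':'d']
print(list(b_to_d.columns))   # ['b', 'c', 'd']

# Filter rows and select columns in one call:
# keep only rows where column 'a' is greater than 0.5, and only columns a and c.
filtered = df.loc[df['a'] > 0.5, ['a', 'c']]
print(filtered)

# The [] syntax, iloc and loc give the same result for columns a and c.
print(df[['a', 'c']].equals(df.loc[:, ['a', 'c']]))      # True
print(df.iloc[:, [0, 2]].equals(df.loc[:, ['a', 'c']]))  # True
```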