How to Iterate Through Columns of a Pandas DataFrame

Manav Narula Feb 02, 2024
  1. Use the getitem ([]) Syntax to Iterate Over Columns in Pandas DataFrame
  2. Use dataframe.items() to Iterate Over Columns in Pandas DataFrame
  3. Use enumerate() to Iterate Over Columns in Pandas DataFrame

DataFrames can be very large, with hundreds of rows and columns. We often need to iterate over the columns of a DataFrame and operate on each one individually, for example to run a regression per column.

We can use the for loop to iterate over columns of a DataFrame. The basic syntax of the for loop is given below:

for value in sequence:
    # Body of Loop

We can run the for loop over a DataFrame in several ways, for example with the getitem syntax ([]), the dataframe.items() function (called dataframe.iteritems() before pandas 2.0), the enumerate() function, or the DataFrame's column index (dataframe.columns).
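All of these approaches rest on one behavior: iterating over a DataFrame directly yields its column labels, not its rows. A minimal sketch, assuming a small two-column frame made up for illustration:

```python
import pandas as pd

# Illustrative data (an assumption for this sketch); any DataFrame behaves the same way.
df = pd.DataFrame({"a": [10, 1, 5], "b": [6, 9, 8]})

# Iterating a DataFrame yields its column labels, in order.
labels = [label for label in df]
print(labels)  # ['a', 'b']

# df.columns holds the same labels as an Index object.
print(list(df.columns) == labels)  # True
```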

Use the getitem ([]) Syntax to Iterate Over Columns in Pandas DataFrame

Iterating over a DataFrame directly yields its column labels, which we can pass to the getitem syntax ([]) to retrieve each column. For example:

import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

print(df)
print("------------------")
for column in df:
    print(df[column].values)

Output:

    a  b   c   d
0  10  6   7   8
1   1  9  12  14
2   5  8  10   6
------------------
[10  1  5]
[6 9 8]
[ 7 12 10]
[ 8 14  6]

The values attribute (it is not a function, so it takes no parentheses) returns the column's elements as a NumPy array.
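If a plain Python list is needed rather than a NumPy array, the column can be converted explicitly. The sketch below uses to_numpy(), which the pandas documentation recommends over values, and tolist(); the small frame is illustrative only:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch).
df = pd.DataFrame({"a": [10, 1, 5], "b": [6, 9, 8]})

col_array = df["a"].to_numpy()  # NumPy array, same data as df["a"].values
col_list = df["a"].tolist()     # plain Python list

print(type(col_array).__name__)  # ndarray
print(col_list)                  # [10, 1, 5]
```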

Use dataframe.items() to Iterate Over Columns in Pandas DataFrame

Pandas provides the dataframe.items() function, which iterates over a DataFrame and yields each column name together with its content as a Series. Older tutorials use dataframe.iteritems() for this; that name was deprecated in pandas 1.5 and removed in pandas 2.0.

import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

# items() yields (column name, column Series) pairs
for colname, colval in df.items():
    print(colname, colval.values)

Output:

a [10  1  5]
b [6 9 8]
c [ 7 12 10]
d [ 8 14  6]
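Because each iteration hands back the column as a Series, per-column summaries are easy to collect. A minimal sketch using df.items(), the name used in current pandas versions; the column_max dictionary is a hypothetical name chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

# Each iteration gives the column label and the column as a Series.
column_max = {}
for colname, colval in df.items():
    column_max[colname] = int(colval.max())

print(column_max)  # {'a': 10, 'b': 9, 'c': 12, 'd': 14}
```

For a simple aggregate like this, df.max() gives the same numbers without a loop; the explicit loop is mainly useful when each column needs custom handling.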

Use enumerate() to Iterate Over Columns in Pandas DataFrame

Wrapping a DataFrame in enumerate() yields pairs of a positional index and a column label, so we can track each column's position while iterating.

import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

for (index, colname) in enumerate(df):
    print(index, df[colname].values)

Output:

0 [10  1  5]
1 [6 9 8]
2 [ 7 12 10]
3 [ 8 14  6]
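The positional index from enumerate() can also be used to select a column by position with iloc. A short sketch under that assumption; the positions dictionary is a hypothetical helper, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

# Map each column label to its integer position.
positions = {colname: index for index, colname in enumerate(df)}
print(positions)  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}

# Selecting by position with iloc gives the same column as selecting by label.
matches = [df.iloc[:, index].equals(df[colname]) for colname, index in positions.items()]
print(all(matches))  # True
```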

Any of the above methods can be used to iterate over a DataFrame. We can also run operations such as regressions on each column individually. For example, we can set the last column as the independent variable and run OLS regressions with each of the other columns as the dependent variable. Note that sm.OLS() takes the dependent variable first and the independent variables second, and that the constant (intercept) term is added to the independent side:

import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

# Independent variable: the last column plus an intercept term
X = sm.add_constant(df["d"])

for column in df.columns.drop("d"):
    Y = df[column]  # dependent variable
    model = sm.OLS(Y, X)  # sm.OLS(endog, exog)
    results = model.fit()
    print(results.params)

Output:

const    12.153846
d        -0.730769
dtype: float64
const    5.692308
d        0.211538
dtype: float64
const    6.076923
d        0.384615
dtype: float64
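When statsmodels is not available, a straight-line fit with an intercept for each column can be sketched with NumPy alone. This swaps in np.polyfit, which is a different tool from the statsmodels OLS used above and reports only the coefficients; the fits dictionary is a hypothetical name for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

x = df["d"].to_numpy(dtype=float)  # independent variable

fits = {}
for column in df.columns.drop("d"):
    y = df[column].to_numpy(dtype=float)  # dependent variable
    # polyfit returns coefficients from the highest power down: slope, then intercept.
    slope, intercept = np.polyfit(x, y, deg=1)
    fits[column] = (intercept, slope)

for column, (intercept, slope) in fits.items():
    print(column, round(intercept, 6), round(slope, 6))
```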
Author: Manav Narula

Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.
