How to Use rolling().apply() on Pandas DataFrame and Series

Mehvish Ashiq Feb 02, 2024
  1. Use rolling().apply() on a Pandas DataFrame
  2. rolling.apply With Lambda
  3. Use rolling().apply() on a Pandas Series
The pandas library offers many useful functions. One of them is rolling(), which performs calculations over a sliding window of the data. The apply() method then runs a custom function on each of those windows across the complete data.

We can use rolling().apply() with pandas Series and DataFrames. This tutorial introduces the rolling() and apply() methods and demonstrates how to use rolling().apply() on a pandas DataFrame and Series.
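
As a minimal sketch of the idea, the snippet below uses hypothetical toy data (the names values and window_sums are illustrative only) to show what a window of size 2 looks like and where each result is stored:

```python
import pandas as pd

# Hypothetical toy data, chosen only so the arithmetic is easy to follow.
values = pd.Series([1, 2, 3, 4])

# Each window holds the current value and the one before it.
# apply() calls the function once per full window and stores the result
# at the position of the window's last element.
window_sums = values.rolling(2).apply(lambda window: window.sum())

print(window_sums.tolist())  # [nan, 3.0, 5.0, 7.0]
```

The first result is NaN because the first value has no predecessor to complete a window of two.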

Use rolling().apply() on a Pandas DataFrame

Let’s go step by step through using rolling().apply() on a DataFrame.

  • Import libraries.
    import pandas as pd
    import numpy as np
    

    First, we import the necessary libraries: pandas for working with data frames, and numpy for its numpy.median() function.

  • Create a dataframe.
    points_df = pd.DataFrame(
        {
            "Team_A": [12, 23, 34, 45, 32, 45, 32, 21, 33],
            "Team_B": [13, 24, 35, 46, 33, 46, 33, 22, 34],
            "Team_C": [14, 25, 36, 47, 34, 47, 34, 23, 35],
            "Team_D": [15, 26, 37, 48, 35, 48, 35, 24, 36],
        }
    )
    print(points_df)
    

    Output:

       Team_A  Team_B  Team_C  Team_D
    0      12      13      14      15
    1      23      24      25      26
    2      34      35      36      37
    3      45      46      47      48
    4      32      33      34      35
    5      45      46      47      48
    6      32      33      34      35
    7      21      22      23      24
    8      33      34      35      36
    

    Next, create a dataframe named points_df, which contains different points for Team_A, Team_B, Team_C, and Team_D. We can see that the default index has no header (heading).

    Let’s create a heading for that in the following step.

  • Give the default index the name index.
    points_df.index.names = ["index"]
    print(points_df)
    

    Output:

           Team_A  Team_B  Team_C  Team_D
    index
    0          12      13      14      15
    1          23      24      25      26
    2          34      35      36      37
    3          45      46      47      48
    4          32      33      34      35
    5          45      46      47      48
    6          32      33      34      35
    7          21      22      23      24
    8          33      34      35      36
    

    As we can see, the heading index sits on its own row instead of lining up with Team_A, Team_B, Team_C, and Team_D. Moving the name from the index to the columns puts it on the header row. Let’s do it in the following step.

  • Align all headings for points_df dataframe.
    points_df.columns.name = points_df.index.name
    points_df.index.name = None
    print(points_df)
    

    Output:

    index  Team_A  Team_B  Team_C  Team_D
    0          12      13      14      15
    1          23      24      25      26
    2          34      35      36      37
    3          45      46      47      48
    4          32      33      34      35
    5          45      46      47      48
    6          32      33      34      35
    7          21      22      23      24
    8          33      34      35      36
    
  • Create the calculate_median() function.
    def calculate_median(n):
        return np.median(n)
    

    This function takes one window of values (a Series by default, or a NumPy array when raw=True is passed to apply()) and returns the median of that window.

  • Use rolling().apply() on the points_df dataframe.
    points_df = points_df.rolling(2).apply(calculate_median)
    print(points_df)
    

    Output:

    index  Team_A  Team_B  Team_C  Team_D
    0         NaN     NaN     NaN     NaN
    1        17.5    18.5    19.5    20.5
    2        28.5    29.5    30.5    31.5
    3        39.5    40.5    41.5    42.5
    4        38.5    39.5    40.5    41.5
    5        38.5    39.5    40.5    41.5
    6        38.5    39.5    40.5    41.5
    7        26.5    27.5    28.5    29.5
    8        27.0    28.0    29.0    30.0
    

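    Here, rolling() provides the rolling window computation: it slides a fixed-size window (two rows in our case) down each column. The first row has no preceding row to complete a window, so its result is NaN. Rolling windows are widely used in signal processing and time-series analysis.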

    We have already written an article about rolling(), its syntax, the rolling window feature, and its working process by demonstrating various rolling functions. You can read that here.

    We use the apply() function to run a custom function (calculate_median() in our case) on each window of the specified data.

  • Here is the complete source code.
    import pandas as pd
    import numpy as np
    
    points_df = pd.DataFrame(
        {
            "Team_A": [12, 23, 34, 45, 32, 45, 32, 21, 33],
            "Team_B": [13, 24, 35, 46, 33, 46, 33, 22, 34],
            "Team_C": [14, 25, 36, 47, 34, 47, 34, 23, 35],
            "Team_D": [15, 26, 37, 48, 35, 48, 35, 24, 36],
        }
    )
    
    points_df.index.names = ["index"]
    points_df.columns.name = points_df.index.name
    points_df.index.name = None
    
    print("Before rolling().apply():\n\n")
    print(points_df)
    
    
    def calculate_median(n):
        return np.median(n)
    
    
    points_df = points_df.rolling(2).apply(calculate_median)
    print("\n\nAfter rolling().apply():\n\n")
    print(points_df)
    

    Output:

    Before rolling().apply():
    
    
    index  Team_A  Team_B  Team_C  Team_D
    0          12      13      14      15
    1          23      24      25      26
    2          34      35      36      37
    3          45      46      47      48
    4          32      33      34      35
    5          45      46      47      48
    6          32      33      34      35
    7          21      22      23      24
    8          33      34      35      36
    
    
    After rolling().apply():
    
    
    index  Team_A  Team_B  Team_C  Team_D
    0         NaN     NaN     NaN     NaN
    1        17.5    18.5    19.5    20.5
    2        28.5    29.5    30.5    31.5
    3        39.5    40.5    41.5    42.5
    4        38.5    39.5    40.5    41.5
    5        38.5    39.5    40.5    41.5
    6        38.5    39.5    40.5    41.5
    7        26.5    27.5    28.5    29.5
    8        27.0    28.0    29.0    30.0
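
For a plain median, the same numbers can be obtained in other ways as well. The sketch below assumes a shortened, hypothetical version of the points table (the names points, via_apply, and via_builtin are illustrative) and compares apply() called with raw=True against the built-in rolling median of pandas:

```python
import numpy as np
import pandas as pd

# A shortened, hypothetical version of the points table.
points = pd.DataFrame(
    {
        "Team_A": [12, 23, 34, 45, 32],
        "Team_B": [13, 24, 35, 46, 33],
    }
)

# raw=True hands each window to the function as a NumPy array
# instead of a Series, which avoids building a Series per window.
via_apply = points.rolling(2).apply(np.median, raw=True)

# pandas also ships a built-in rolling median for this common case.
via_builtin = points.rolling(2).median()

print(via_apply)
print(via_builtin)
```

Both frames should hold the same values. apply() is most useful when no built-in rolling method covers the calculation we need.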
    

rolling.apply With Lambda

Consider the following code:

import pandas as pd
import numpy as np


def test(df):
    # Receives one rolling window and returns its mean.
    return np.mean(df)


# 2000 days of small random values in two columns.
tmp = pd.DataFrame(
    np.random.randn(2000, 2) / 10000,
    index=pd.date_range("2001-01-01", periods=2000),
    columns=["A", "B"],
)

print("Test 1: ")
print(tmp.rolling(window=5, center=False).apply(lambda x: test(x)))

# Rolling z-score: how far the newest value in each window lies from
# the window mean, measured in sample standard deviations.
print("SC_Fit: ")
print(
    tmp.rolling(window=5, center=False).apply(
        lambda x: (x.iloc[-1] - x.mean()) / x.std(ddof=1)
    )
)

Output (the exact numbers differ on every run because the data is random):

Test 1:
                   A         B
2001-01-01       NaN       NaN
2001-01-02       NaN       NaN
2001-01-03       NaN       NaN
2001-01-04       NaN       NaN
2001-01-05 -0.000039  0.000053
...              ...       ...
2006-06-19  0.000022 -0.000021
2006-06-20  0.000005 -0.000027
2006-06-21  0.000024 -0.000060
2006-06-22  0.000023 -0.000038
2006-06-23  0.000014 -0.000017
[2000 rows x 2 columns]

SC_Fit:

                   A         B
2001-01-01       NaN       NaN
2001-01-02       NaN       NaN
2001-01-03       NaN       NaN
2001-01-04       NaN       NaN
2001-01-05 -0.201991  0.349646
...              ...       ...
2006-06-19  1.035835 -0.688231
2006-06-20 -0.595888  1.057016
2006-06-21 -0.640150 -1.399535
2006-06-22 -0.535689  1.244345
2006-06-23  0.510958  0.614429

[2000 rows x 2 columns]

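The x in the lambda function is one rolling window: a Series by default, or an ndarray when raw=True is passed to apply(). The last element of the window is the current data point, which is x.iloc[-1] for a Series and x[-1] for an ndarray. Recent pandas versions deprecate positional x[-1] on a Series whose index is not made of integers, so x.iloc[-1] is the safer spelling:

lambda x: (x.iloc[-1] - x.mean()) / x.std(ddof=1)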
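
To check the formula by hand, here is a small sketch on hypothetical, evenly spaced data (the names prices and z_scores are illustrative). It assumes raw=True, so each window reaches the lambda as a NumPy array:

```python
import pandas as pd

# Hypothetical evenly spaced data so every full window has the same shape.
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# With raw=True each window arrives as a NumPy array, so plain x[-1]
# is positional and always means "the newest value in the window".
z_scores = prices.rolling(3).apply(
    lambda x: (x[-1] - x.mean()) / x.std(ddof=1), raw=True
)

print(z_scores.tolist())  # [nan, nan, 1.0, 1.0, 1.0]
```

For the window [1, 2, 3], the mean is 2 and the sample standard deviation is 1, so the newest value 3 scores (3 - 2) / 1 = 1. The same holds for every later window here.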

Use rolling().apply() on a Pandas Series

Similarly, we can use rolling().apply() on a pandas Series. The following code is the same as the code we wrote for the DataFrame, with one difference: it operates on a Series.

The complete source code is given below, but you can read about the series in detail here.

Example Code:

import pandas as pd
import numpy as np

points_series = pd.Series(
    [12, 23, 34, 45], index=["Team_A", "Team_B", "Team_C", "Team_D"]
)


print("Before rolling().apply():\n\n")
print(points_series)


def calculate_median(n):
    return np.median(n)


points_series = points_series.rolling(2).apply(calculate_median)
print("\n\nAfter rolling().apply():\n\n")
print(points_series)

Output:

Before rolling().apply():


Team_A    12
Team_B    23
Team_C    34
Team_D    45
dtype: int64


After rolling().apply():


Team_A     NaN
Team_B    17.5
Team_C    28.5
Team_D    39.5
dtype: float64
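
The NaN for Team_A appears because a window of two values cannot be filled at the first position. If a partial window is acceptable, the min_periods parameter of rolling() can relax that requirement. The following is a sketch of this option on the same Series (the name partial is illustrative), not a change to the example above:

```python
import numpy as np
import pandas as pd

points_series = pd.Series(
    [12, 23, 34, 45], index=["Team_A", "Team_B", "Team_C", "Team_D"]
)

# min_periods=1 lets apply() run on windows that are not yet full,
# so the first row is computed from a single value instead of being NaN.
partial = points_series.rolling(2, min_periods=1).apply(np.median)

print(partial.tolist())  # [12.0, 17.5, 28.5, 39.5]
```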