Introduction to Useful Rolling Functions for GroupBy Object in Pandas

Mehvish Ashiq Jan 30, 2023
  1. Pandas Rolling vs. Rolling Window
  2. Rolling Window Feature
  3. Syntax and Work Process of the dataframe.rolling() Function
  4. Useful Rolling Functions for GroupBy Object in Pandas
  5. Use the rolling().sum() Function for GroupBy Object in Pandas
  6. Use the rolling().mean() Function for GroupBy Object in Pandas
  7. Use the rolling().agg() Function on Multiple Columns for GroupBy Object in Pandas
  8. Use the DataFrame.cumsum() Function to Get the Rolling Sum for GroupBy Object in Pandas

Today, we will explore the difference between Pandas rolling and the rolling window feature. We will learn about the rolling window feature, its syntax, and how it works. This leads us to various code examples that demonstrate different rolling functions for a GroupBy object in Pandas.

Pandas Rolling vs. Rolling Window

Python has different data-centric libraries/packages, and Pandas is one of them. We use various useful functions of this library, and one of them is known as the rolling() function.

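The dataframe.rolling() function provides rolling window calculations on the provided data. It returns a Rolling object (or a Window object when win_type is set). Calling a method such as sum() or mean() on that object computes the result for each window. We will look at the rolling window feature itself, with an example, in a moment.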

The rolling window feature is mostly used with time series and signal processing data. There, we perform calculations over a sliding subset of the input values in a Series or DataFrame.

For instance, let w be the window size and t be the time. The window at time t holds the w consecutive values ending at t, and we apply the desired mathematical operation to those values. The window then moves one step forward. By default (win_type=None), all w values in a window are weighted equally.
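
The following small sketch makes the w and t description concrete. The values and w = 3 are made up for illustration. It computes a trailing window mean by hand and then with pandas' rolling(). It assumes the default win_type=None, where every value in the window gets the same weight.

```python
import pandas as pd

values = [2, 4, 6, 8, 10]
w = 3  # window size: each result uses the w most recent values

# Manual version: for each position t, average the w values ending at t.
# Positions with fewer than w values available get None.
manual = [
    sum(values[t - w + 1 : t + 1]) / w if t >= w - 1 else None
    for t in range(len(values))
]

# pandas version: rolling(w) builds the same trailing windows and weights
# every value in a window equally before taking the mean.
rolled = pd.Series(values).rolling(w).mean()

print(manual)
print(rolled.tolist())
```

Both versions leave the first w - 1 positions empty (None or NaN) because those windows are not yet full.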

Rolling Window Feature

A rolling window performs calculations on the provided data starting from a specified date and shifts forward by a fixed amount each time. For instance, suppose every staff member is on a 1-month rolling window. This means each of them receives a salary on the 1st of every month.

The first salary is on January 1st, the second is on February 1st, and the third is on March 1st. This continues until every staff member receives their 12th salary on December 1st, and the cycle repeats every year.

So, we can say that the rolling window feature is relative to the first specified date and automatically moves forward with a given rolling window time. In our scenario, it is a 1-month rolling window.

Note that a rolling window has a fixed size. An expanding window only has a fixed starting point and grows to include each new value as it becomes available.
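
As a rough illustration of that difference, the following sketch uses made-up values. It compares a rolling sum over a fixed window of 3 with an expanding sum on the same Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Rolling window: always the last 3 values (fixed size). min_periods=1 lets
# the first two positions use whatever values are available so far.
rolling_sum = s.rolling(3, min_periods=1).sum()

# Expanding window: fixed start at the first value, grows by one each step.
expanding_sum = s.expanding().sum()

print(pd.DataFrame({"value": s, "rolling(3)": rolling_sum, "expanding": expanding_sum}))
```

The two columns agree until the window is full. After that, the rolling sum drops the oldest value while the expanding sum keeps growing.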

Syntax and Work Process of the dataframe.rolling() Function

The dataframe.rolling() function provides rolling window calculations. It follows the same concept as a general rolling window: the user specifies the window size (w) only once, and the chosen operation is then applied to every window.
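
Here is a minimal sketch of this two-step process, using a small made-up Series. rolling() only records the window size, and the operations called afterwards do the actual work.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Step 1: specify the window once. This returns a Rolling object;
# no calculation has happened yet.
window = s.rolling(2)
print(type(window).__name__)  # Rolling

# Step 2: apply one or more operations to every window of size 2.
print(window.sum().tolist())
print(window.max().tolist())
```

The same Rolling object can be reused for several operations, so the window size only has to be written once.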

The syntax of dataframe.rolling() is as follows:

DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)

It can take the following parameters. Older pandas versions also accepted a freq parameter, which conformed the data to a frequency before computing the statistic. That parameter has since been removed, so resample the data before calling rolling() instead.

| Parameter   | Description |
| ----------- | ----------- |
| window      | Size of the moving window, that is, the number of observations used for each calculation. For datetime-like data, it can also be a time offset such as "2D". |
| min_periods | Minimum number of observations a window must contain to produce a value; otherwise the result is NaN. For an integer window, it defaults to the window size. |
| center      | If True, each label is set at the center of its window instead of at the right edge. |
| win_type    | Window type used to weight the observations. None (the default) weights all of them equally. |
| on          | For a data frame, a column (typically datetime-like) on which to calculate the rolling window instead of the index. |
| closed      | Makes the interval closed on the right, left, both, or neither endpoints. |
| axis        | The axis to roll over. The default is 0 (rows); it can be given as an int or a string. |
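
The following sketch shows how a few of these parameters change the result. The Series, the time column, and the "2D" offset are illustrative assumptions, not data from the examples in this article.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Default: labels at the right edge; min_periods defaults to the window
# size, so the first two positions are NaN.
default = s.rolling(3).sum()

# min_periods=1: partial windows at the start still produce a value.
partial = s.rolling(3, min_periods=1).sum()

# center=True: each label sits in the middle of its window, so the window
# for position t covers t-1, t and t+1 (incomplete at both ends).
centered = s.rolling(3, center=True).sum()

# on=: roll over a datetime column instead of the index, using a time-based
# window ("2D" = two days). Offset-based windows default to min_periods=1.
df = pd.DataFrame(
    {
        "time": pd.date_range("2023-01-01", periods=5, freq="D"),
        "value": [1, 2, 3, 4, 5],
    }
)
by_time = df.rolling("2D", on="time").sum()

print(pd.DataFrame({"default": default, "min_periods=1": partial, "center": centered}))
print(by_time)
```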

Useful Rolling Functions for GroupBy Object in Pandas

To learn how the rolling() function works, we need a data frame containing some sample data. We will use the following one; you may use the same.

Example Code:

import pandas as pd

n = range(0, 6)
id = ["a", "a", "a", "b", "b", "b"]
df = pd.DataFrame(zip(id, n), columns=["id", "n"])
df.set_index("id", inplace=True)
df

OUTPUT:

| id   | n    |
| ---- | ---- |
| a    | 0    |
| a    | 1    |
| a    | 2    |
| b    | 3    |
| b    | 4    |
| b    | 5    |

We imported the pandas library to work with the data frame in the above code snippet. Then, we used the range() function to get a sequence of numbers; by default, it starts from 0, ends before the specified number, and increments by 1.

If needed, we can also pass different start, stop, and step values to range(). Next, we have a list named id, which zip() pairs with n. pd.DataFrame() then builds a data frame from these pairs and a list of column names.

Here, the zip() function takes zero or more iterables, pairs up their elements position by position, and returns a single iterator of tuples. The set_index() method makes the id column the index. Setting its inplace parameter to True modifies df directly instead of returning a new data frame.
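
For comparison, here is a sketch of the same data frame built from a dictionary of columns. The list is renamed ids here (a hypothetical name) so that it does not shadow Python's built-in id() function. set_index() is used without inplace, so its result is assigned back.

```python
import pandas as pd

ids = ["a", "a", "a", "b", "b", "b"]  # hypothetical rename of `id` to avoid shadowing id()
n = range(0, 6)

# zip() pairs elements position by position: ("a", 0), ("a", 1), ...
print(list(zip(ids, n)))

# Equivalent construction from a dict of columns. Without inplace,
# set_index() returns a new DataFrame, which we assign to df.
df = pd.DataFrame({"id": ids, "n": n}).set_index("id")
print(df)
```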

Use the rolling().sum() Function for GroupBy Object in Pandas

Example Code:

df_rolling_sum = df.groupby("id")["n"].rolling(2, min_periods=1).sum()
df_rolling_sum

OUTPUT:

| id   | id   |      |
| ---- | ---- | ---- |
| a    | a    | 0.0  |
|      | a    | 1.0  |
|      | a    | 3.0  |
| b    | b    | 3.0  |
|      | b    | 7.0  |
|      | b    | 9.0  |

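Here, we used the groupby() function to group the rows by their id value. It follows a split-apply-combine process: it splits the object into groups, applies the desired operation to each group separately, and combines the results. This is why the output has two index levels named id: the group key added by groupby() and the original index.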

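The above code uses a window length of 2 with min_periods=1 to sum column n within each group. Each output value is the sum of the current row and the previous row in the same group, for example, 3 + 4 = 7 for the second row of group b. The first row of each group has only one value in its window, which min_periods=1 allows. See the following screenshot to understand the output of rolling().sum().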

[Screenshot: rolling sum]
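
The two id columns in the output are two index levels. The following sketch rebuilds the same sample data. It shows how to inspect these levels and how to drop the outer group-key level with droplevel(0), so the result lines up with the original index.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "n": range(0, 6)}).set_index("id")

rolling_sum = df.groupby("id")["n"].rolling(2, min_periods=1).sum()

# Two index levels: the group key added by groupby() and the original "id" index.
print(rolling_sum.index.nlevels)  # 2

# Drop the outer (group key) level to get one "id" level again.
flat = rolling_sum.droplevel(0)
print(flat)
```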

Use the rolling().mean() Function for GroupBy Object in Pandas

Example Code:

df_rolling_mean = df.groupby("id")["n"].rolling(2, min_periods=1).mean()
df_rolling_mean

OUTPUT:

| id   | id   |      |
| ---- | ---- | ---- |
| a    | a    | 0.0  |
|      | a    | 0.5  |
|      | a    | 1.5  |
| b    | b    | 3.0  |
|      | b    | 3.5  |
|      | b    | 4.5  |

This example is the same as the previous one, except that it calls mean() instead of sum(). Here, we calculate the mean of the (up to) two values in each window, that is, their sum divided by the number of values. For example, the third row of group a gives (1 + 2) / 2 = 1.5.

See the following image to understand.

[Screenshot: rolling mean]
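
The following sketch checks the sum-divided-by-count description on the same sample data, rebuilt here. It computes the rolling sum and the rolling count() of the same windows and compares their ratio with mean().

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "n": range(0, 6)}).set_index("id")

windows = df.groupby("id")["n"].rolling(2, min_periods=1)

# Mean of each window = sum of the values in the window / how many values it holds.
sums = windows.sum()
counts = windows.count()
means = windows.mean()

print((sums / counts).tolist())
print(means.tolist())
```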

Remember, the first value of each group appears unchanged in the output because there is no earlier value in its window. If we omit the min_periods parameter, it defaults to the window size (2 here). The first row of each group then does not have enough values, so we get NaN instead. See the following example.

Example Code:

df_rolling_mean = df.groupby("id")["n"].rolling(2).mean()
df_rolling_mean

OUTPUT:

| id   | id   |      |
| ---- | ---- | ---- |
| a    | a    | NaN  |
|      | a    | 0.5  |
|      | a    | 1.5  |
| b    | b    | NaN  |
|      | b    | 3.5  |
|      | b    | 4.5  |
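
The NaN rows come from the default value of min_periods. Here is a quick check, on the same sample data, that leaving it out behaves like setting it to the window size.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "n": range(0, 6)}).set_index("id")

without = df.groupby("id")["n"].rolling(2).mean()
explicit = df.groupby("id")["n"].rolling(2, min_periods=2).mean()

# With an integer window, omitting min_periods is the same as min_periods=window.
print(without.equals(explicit))  # True

# The NaN rows are exactly the first row of each group, whose window holds one value.
print(without.isna().tolist())
```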

Use the rolling().agg() Function on Multiple Columns for GroupBy Object in Pandas

Example Code:

import pandas as pd

n1 = range(0, 6)
n2 = range(0, 6)
id = ["a", "a", "a", "b", "b", "b"]

df = pd.DataFrame(zip(id, n1, n2), columns=["id", "n1", "n2"])
df.set_index("id", inplace=True)

df_rolling_mean_sum = (
    df.groupby("id").rolling(2, min_periods=1).agg({"n1": "sum", "n2": "mean"})
)

print(df_rolling_mean_sum)

OUTPUT:

| id   | id   | n1   | n2   |
| ---- | ---- | ---- | ---- |
| a    | a    | 0.0  | 0.0  |
|      | a    | 1.0  | 0.5  |
|      | a    | 3.0  | 1.5  |
| b    | b    | 3.0  | 3.0  |
|      | b    | 7.0  | 3.5  |
|      | b    | 9.0  | 4.5  |

Here, we have two columns, n1 and n2. The agg() method takes a dictionary that maps each column to an aggregation, so it applies sum to n1 and mean to n2 in a single call (as shown in the above output).
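
As a sanity check, the following sketch rebuilds the sample data with a hypothetical ids list. It confirms that the dictionary passed to agg() gives the same numbers as computing each column's rolling statistic separately.

```python
import pandas as pd

ids = ["a", "a", "a", "b", "b", "b"]  # hypothetical name for the id list
df = pd.DataFrame({"id": ids, "n1": range(0, 6), "n2": range(0, 6)}).set_index("id")

# One call, a different aggregation per column.
combined = df.groupby("id").rolling(2, min_periods=1).agg({"n1": "sum", "n2": "mean"})

# The same numbers computed one column at a time.
n1_sum = df.groupby("id")["n1"].rolling(2, min_periods=1).sum()
n2_mean = df.groupby("id")["n2"].rolling(2, min_periods=1).mean()

print(combined["n1"].tolist() == n1_sum.tolist())  # True
print(combined["n2"].tolist() == n2_mean.tolist())  # True
```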

Use the DataFrame.cumsum() Function to Get the Rolling Sum for GroupBy Object in Pandas

Now, assume that we want a running (cumulative) sum within each group, where every row adds all the earlier values of its group, as follows:

| id   | n    | sum  |
| ---- | ---- | ---- |
| a    | 0    | 0    |
| a    | 1    | 1    |
| a    | 2    | 3    |
| b    | 3    | 3    |
| b    | 4    | 7    |
| b    | 5    | 12   |

Instead of the following:

| id   | n    | sum  |
| ---- | ---- | ---- |
| a    | 0    | 0.0  |
| a    | 1    | 1.0  |
| a    | 2    | 3.0  |
| b    | 3    | 3.0  |
| b    | 4    | 7.0  |
| b    | 5    | 9.0  |

How can we do that? We can call cumsum() on the grouped column as follows:

Example Code:

import pandas as pd

n = range(0, 6)
id = ["a", "a", "a", "b", "b", "b"]
df = pd.DataFrame(zip(id, n), columns=["id", "n"])
df.set_index("id", inplace=True)

df_cumsum = df.groupby("id").n.cumsum()
df_cumsum

OUTPUT:

| id   |      |
| ---- | ---- |
| a    | 0    |
| a    | 1    |
| a    | 3    |
| b    | 3    |
| b    | 7    |
| b    | 12   |

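We get the above output because cumsum() returns a cumulative summation, and when applied after groupby(), it restarts the sum for each group. Unlike rolling(2), which only adds the last two values, it adds every value seen so far in the group. So the last row of group b is 3 + 4 + 5 = 12 instead of 4 + 5 = 9. It also keeps the integer dtype of column n, whereas the rolling functions return floats.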
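
A cumulative sum is the same as a sum over an expanding window, as described earlier. The following sketch computes both on the same sample data so you can compare them.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "n": range(0, 6)}).set_index("id")

cumulative = df.groupby("id")["n"].cumsum()
expanding = df.groupby("id")["n"].expanding().sum()

# cumsum() keeps the integer dtype. expanding().sum() returns floats and adds
# the group key as an extra index level, which droplevel(0) removes.
print(cumulative.tolist())
print(expanding.droplevel(0).tolist())
```

The two results differ only in dtype and index shape, so cumsum() is usually the simpler choice for a running total.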


Mehvish Ashiq is a former Java Programmer and a Data Science enthusiast who leverages her expertise to help others to learn and grow by creating interesting, useful, and reader-friendly content in Computer Programming, Data Science, and Technology.

