How to Get the Aggregate of Pandas Group-By and Sum
-
Cumulative Sum With
groupby -
pivot()to Rearrange the Data in a Nice Table -
Apply
functiontogroupbyin Pandas -
agg()to Get Aggregate Sum of the Column
We will demonstrate how to get the aggregate in Pandas by using groupby and sum. We will also look at the pivot functionality to arrange the data in a nice table and define our custom function and run it on the DataFrame. We will also get the aggregate sum by using agg().
Cumulative Sum With groupby
We can get the cumulative sum by using groupby method. Consider the following Dataframe with Date, Fruit and Sale columns:
import pandas as pd
df = pd.DataFrame(
{
"Date": ["08/09/2018", "10/09/2018", "08/09/2018", "10/09/2018"],
"Fruit": [
"Apple",
"Apple",
"Banana",
"Banana",
],
"Sale": [34, 12, 22, 27],
}
)
If we want to calculate the cumulative sum of Sale per Fruit and for every date, we can do:
import pandas as pd
df = pd.DataFrame(
{
"Date": ["08/09/2018", "10/09/2018", "08/09/2018", "10/09/2018"],
"Fruit": [
"Apple",
"Apple",
"Banana",
"Banana",
],
"Sale": [34, 12, 22, 27],
}
)
print(df.groupby(by=["Fruit", "Date"]).sum().groupby(level=[0]).cumsum())
Output:
Fruit Date Sale
Apple 08/09/2018 34
10/09/2018 46
Banana 08/09/2018 22
10/09/2018 49
pivot() to Rearrange the Data in a Nice Table
pivot() method could set row and column attribute of the table. Let’s change the above code and apply pivot() method to rearrange the data in a nice table:
import pandas as pd
df = pd.DataFrame(
{
"Date": ["08/09/2018", "10/09/2018", "08/09/2018", "10/09/2018"],
"Fruit": [
"Apple",
"Apple",
"Banana",
"Banana",
],
"Sale": [34, 12, 22, 27],
}
)
print(
df.groupby(["Fruit", "Date"], as_index=False).sum().pivot("Fruit", "Date").fillna(0)
)
Output:
Sale
Date 08/09/2018 10/09/2018
Fruit
Apple 34 12
Banana 22 27
Apply function to groupby in Pandas
We will create a simple method to get count of values in series or 1d array and use groupby to get aggregate count of each value:
from pandas import *
d = {"series": Series(["1", "2", "1", "1", "4", "4", "5"])}
df = DataFrame(d)
def get_count(values):
return len(values)
grouped_count = df.groupby("series").series.agg(get_count)
print(grouped_count)
After running the code, we will get the following output, which provides each value’s occurrence in the series.
Output:
series
1 3
2 1
4 2
5 1
Name: series, dtype: int64
agg() to Get Aggregate Sum of the Column
We can use agg() to apply sum operation on column. Example Codes:
import pandas as pd
df = pd.DataFrame(
{
"Date": ["08/09/2018", "10/09/2018", "08/09/2018", "10/09/2018"],
"Fruit": [
"Apple",
"Apple",
"Banana",
"Banana",
],
"Sale": [34, 12, 22, 27],
}
)
print(df.groupby(["Fruit"])["Sale"].agg("sum"))
Output:
Fruit
Apple 46
Banana 49
Name: Sale, dtype: int64