Pandas DataFrame.sum() Function

  1. Syntax of pandas.DataFrame.sum():
  2. Return
  3. Example Codes: DataFrame.sum() Method to Calculate Sum Along Column Axis
  4. Example Codes: DataFrame.sum() Method to Find Sum Along Row Axis
  5. Example Codes: DataFrame.sum() Method to Find the Sum Ignoring NaN Values
  6. Example Codes: Set min_count in DataFrame.sum() Method

The Python Pandas DataFrame.sum() function calculates the sum of the values of a DataFrame object over the specified axis.

Syntax of pandas.DataFrame.sum():

DataFrame.sum(axis=None, 
              skipna=None, 
              level=None, 
              numeric_only=None,
              min_count=0, 
              **kwargs)

Parameters

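axis Axis to sum over. axis=0 adds the values down the rows and gives one sum per column (this is what df.sum() does when no axis is passed); axis=1 adds the values across the columns and gives one sum per row
skipna Boolean. Exclude NaN values (skipna=True) or include NaN values (skipna=False)
level Sum along a particular level if the axis is a MultiIndex. This parameter was deprecated and has been removed in pandas 2.0; grouping with df.groupby(level=...) and then calling sum() is the usual replacement
numeric_only Boolean. For numeric_only=True, include only float, int, and boolean columns
min_count Integer. Minimum number of non-NaN values required to calculate the sum. If this condition is not satisfied, the sum will be NaN
**kwargs Additional keyword arguments to the function.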
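
The numeric_only parameter is not used in the examples that follow, so here is a minimal sketch of its effect. The DataFrame and its column names (name, X, Y) are made up for illustration only.

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns
df = pd.DataFrame({'name': ['a', 'b', 'c'],
                   'X': [1, 2, 3],
                   'Y': [1.5, 2.5, 3.0]})

# numeric_only=True leaves the text column out of the result
numeric_sums = df.sum(numeric_only=True)
print(numeric_sums)
```

The result should contain entries for X and Y only.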

Return

If level is not specified, it returns a Series containing the sum of the values along the requested axis; otherwise, it returns a DataFrame of sums.
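
The two return types can be seen in a short sketch. The MultiIndex data and the level names group and item are invented for illustration, and groupby(level=...) is used in place of the level argument on the assumption that the installed pandas version no longer accepts level in sum().

```python
import pandas as pd

# Hypothetical frame with a two-level row index
df = pd.DataFrame({'X': [1, 2, 3, 4],
                   'Y': [10, 20, 30, 40]},
                  index=pd.MultiIndex.from_tuples(
                      [('a', 1), ('a', 2), ('b', 1), ('b', 2)],
                      names=['group', 'item']))

# Summing the whole frame gives a Series indexed by column name
total = df.sum()
print(type(total).__name__)

# Summing within each value of the 'group' level gives a DataFrame
per_group = df.groupby(level='group').sum()
print(type(per_group).__name__)
print(per_group)
```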

Example Codes: DataFrame.sum() Method to Calculate Sum Along Column Axis

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5],
                   'Y': [1, 2, 3, 4, 5],
                   'Z': [3, 4, 5, 6, 3]})
print("DataFrame:")
print(df)

# With no arguments, sum() adds up the values in each column
sums = df.sum()
print("Column-wise Sum:")
print(sums)

Output:

DataFrame:
   X  Y  Z
0  1  1  3
1  2  2  4
2  3  3  5
3  4  4  6
4  5  5  3
Column-wise Sum:
X    15
Y    15
Z    21
dtype: int64

It calculates the sum for all the columns X, Y, and Z and finally returns a Series object with the sum of each column.

To find the sum of a particular column of a DataFrame in Pandas, select that column first and call the sum() function on it.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5],
                   'Y': [1, 2, 3, 4, 5],
                   'Z': [3, 4, 5, 6, 3]})
print("DataFrame:")
print(df)

# Select column Z, then sum its values
sums = df["Z"].sum()
print("Sum of values of Z-column:")
print(sums)

Output:

DataFrame:
   X  Y  Z
0  1  1  3
1  2  2  4
2  3  3  5
3  4  4  6
4  5  5  3
Sum of values of Z-column:
21

It gives only the sum of the values in column Z, returned as a single value rather than a Series.
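
The same idea extends to several columns at once. As a minimal sketch reusing the DataFrame from above, passing a list of column names selects a smaller DataFrame, and sum() then returns one value per selected column. The variable name subset_sums is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5],
                   'Y': [1, 2, 3, 4, 5],
                   'Z': [3, 4, 5, 6, 3]})

# Select two columns with a list of names, then sum each of them
subset_sums = df[['X', 'Z']].sum()
print(subset_sums)
```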

Example Codes: DataFrame.sum() Method to Find Sum Along Row Axis

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5],
                   'Y': [1, 2, 3, 4, 5],
                   'Z': [3, 4, 5, 6, 3]})
print("DataFrame:")
print(df)

# axis=1 adds the values across the columns of each row
sums = df.sum(axis=1)
print("Row-wise sum:")
print(sums)

Output:

DataFrame:
   X  Y  Z
0  1  1  3
1  2  2  4
2  3  3  5
3  4  4  6
4  5  5  3
Row-wise sum:
0     5
1     8
2    11
3    14
4    13
dtype: int64

It calculates the sum for all the rows and finally returns a Series object with the sum of each row.

To find the sum of a particular row of a DataFrame in Pandas, select that row first and call the sum() function on it.


import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5],
                   'Y': [1, 2, 3, 4, 5],
                   'Z': [3, 4, 5, 6, 3]})
print("DataFrame:")
print(df)

# iloc[[2]] keeps the 3rd row as a one-row DataFrame
sum_3 = df.iloc[[2]].sum(axis=1)
print("Sum of values of 3rd Row:")
print(sum_3)

Output:

DataFrame:
   X  Y  Z
0  1  1  3
1  2  2  4
2  3  3  5
3  4  4  6
4  5  5  3
Sum of values of 3rd Row:
2    11
dtype: int64

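It gives only the sum of the values in the 3rd row of the DataFrame. Because iloc[[2]] selects the row with a list, the result is a one-element Series rather than a single number.

iloc selects rows by integer position, so position 2 refers to the 3rd row.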
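
If a single number is preferred over a one-element Series, one option is to select the row without the inner list. The sketch below reuses the DataFrame from above; the names row_total and row_total_by_label are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5],
                   'Y': [1, 2, 3, 4, 5],
                   'Z': [3, 4, 5, 6, 3]})

# iloc[2] (no inner list) returns the 3rd row as a Series,
# so sum() collapses it to a single number
row_total = df.iloc[2].sum()
print(row_total)

# loc selects by index label; with the default index the
# labels happen to equal the positions
row_total_by_label = df.loc[2].sum()
print(row_total_by_label)
```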

Example Codes: DataFrame.sum() Method to Find the Sum Ignoring NaN Values

With the default setting skipna=True, sum() ignores NaN values when it calculates the sum along the specified axis.

import pandas as pd

df = pd.DataFrame({'X': [1, None, 3, 4, 5],
                   'Y': [1, None, 3, None, 5],
                   'Z': [3, 4, 5, 6, 3]})
print("DataFrame:")
print(df)

# NaN values are skipped by default
sums = df.sum()
print("Column-wise Sum:")
print(sums)

Output:

DataFrame:
     X    Y  Z
0  1.0  1.0  3
1  NaN  NaN  4
2  3.0  3.0  5
3  4.0  NaN  6
4  5.0  5.0  3
Column-wise Sum:
X    13.0
Y     9.0
Z    21.0
dtype: float64

If you set skipna=False, the sum of any column that contains a NaN value will itself be NaN.


import pandas as pd

df = pd.DataFrame({'X': [1, None, 3, 4, 5],
                   'Y': [1, None, 3, None, 5],
                   'Z': [3, 4, 5, 6, 3]})
print("DataFrame:")
print(df)

# skipna=False lets NaN values propagate into the sums
sums = df.sum(skipna=False)
print("Column-wise Sum:")
print(sums)

Output:

DataFrame:
     X    Y  Z
0  1.0  1.0  3
1  NaN  NaN  4
2  3.0  3.0  5
3  4.0  NaN  6
4  5.0  5.0  3
Column-wise Sum:
X     NaN
Y     NaN
Z    21.0
dtype: float64

Here, you get NaN for the sums of columns X and Y because both of them contain NaN values.
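
The skipna parameter works the same way along the row axis. The following sketch, which reuses the DataFrame from above with illustrative variable names, compares the two settings with axis=1.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, None, 3, 4, 5],
                   'Y': [1, None, 3, None, 5],
                   'Z': [3, 4, 5, 6, 3]})

# Default: NaN values in a row are skipped
row_sums_skip = df.sum(axis=1)
print(row_sums_skip)

# skipna=False: any row containing NaN gets a NaN sum
row_sums_strict = df.sum(axis=1, skipna=False)
print(row_sums_strict)
```

Rows 1 and 3 contain NaN values, so only their sums differ between the two results.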

Example Codes: Set min_count in DataFrame.sum() Method

import pandas as pd

df = pd.DataFrame({'X': [1, None, 3, 4, 5],
                   'Y': [1, None, 3, None, 5],
                   'Z': [3, 4, 5, 6, 3]})
print("DataFrame:")
print(df)

# Require at least 4 non-NaN values per column
sums = df.sum(min_count=4)
print("Column-wise Sum:")
print(sums)

Output:

DataFrame:
     X    Y  Z
0  1.0  1.0  3
1  NaN  NaN  4
2  3.0  3.0  5
3  4.0  NaN  6
4  5.0  5.0  3
Column-wise Sum:
X    13.0
Y     NaN
Z    21.0
dtype: float64

Here, you get the NaN value for the sum of column Y as column Y has only 3 non-NaN values, which is less than the value of the min_count parameter.
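
A common use of min_count is to tell apart a column that is entirely NaN from one that genuinely sums to zero. The sketch below uses a made-up DataFrame with columns A and B to show the difference between the default min_count=0 and min_count=1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column A has no valid values at all
df = pd.DataFrame({'A': [np.nan, np.nan],
                   'B': [1.0, 2.0]})

# Default min_count=0: an all-NaN column sums to 0.0
default_sums = df.sum()
print(default_sums)

# min_count=1: at least one non-NaN value is required,
# so the all-NaN column yields NaN instead
strict_sums = df.sum(min_count=1)
print(strict_sums)
```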
