Pandas DataFrame DataFrame.sort_values() Function

  1. Syntax of pandas.DataFrame.sort_values():
  2. Example Codes: Sort DataFrame With Pandas pandas.DataFrame.sort_values() Based on a Single Column
  3. Example Codes: Sort DataFrame With Pandas DataFrame.sort_values() Based on Multiple Columns
  4. Example Codes: Sort DataFrame in Descending Order With Pandas DataFrame.sort_values()
  5. Example Codes: Sort DataFrame by Putting NaN First With Pandas DataFrame.sort_values()

The pandas DataFrame.sort_values() method sorts the caller DataFrame in ascending or descending order by the values in the specified column(s) or row(s), along either axis.

Syntax of pandas.DataFrame.sort_values():

DataFrame.sort_values(by, 
                      axis=0, 
                      ascending=True, 
                      inplace=False, 
                      kind='quicksort', 
                      na_position='last', 
                      ignore_index=False)

Parameters

by            Name or list of names to sort by
axis          Axis to sort along: axis=0 reorders the rows by column values, axis=1 reorders the columns by row values
ascending     Sort in ascending order (ascending=True) or descending order (ascending=False)
inplace       Boolean. If True, modify the caller DataFrame in place
kind          Sorting algorithm to use, for example 'quicksort' (default), 'mergesort' or 'heapsort'
na_position   Put NaN values at the beginning (na_position='first') or at the end (na_position='last')
ignore_index  Boolean. If True, the index of the original DataFrame is ignored and the result is labeled 0, 1, ..., n - 1. The default value is False, which means the original index labels are kept. New in version 1.0.0
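
As a rough illustration of the axis parameter, the sketch below sorts columns instead of rows. The small numeric DataFrame and its column names are made up for this example; with axis=1, by refers to a row label rather than a column name.

```python
import pandas as pd

# Hypothetical numeric data, used only to illustrate axis=1
df = pd.DataFrame({"a": [3, 1],
                   "b": [1, 2],
                   "c": [2, 3]})

# Reorder the columns by the values found in the row labeled 0
by_row = df.sort_values(by=0, axis=1)
print(by_row)
```

All values in the chosen row must be comparable with each other, so this is mostly useful for DataFrames whose columns share one data type.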

Return

If inplace is False (the default), it returns the sorted DataFrame; if inplace is True, it sorts the caller in place and returns None.
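
The following minimal sketch shows both return behaviors side by side. The DataFrame contents and the variable names (sorted_df, target, returned) are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Sales": [200, 300, 400],
                   "Price": [3, 1, 2]})

# Default (inplace=False): df is left untouched and a new DataFrame is returned
sorted_df = df.sort_values(by="Price")

# inplace=True: the caller itself is reordered and the call returns None
target = df.copy()
returned = target.sort_values(by="Price", inplace=True)

print(sorted_df["Price"].tolist())
print(df["Price"].tolist())
print(returned)
```

Because the in-place call returns None, assigning its result back to the same variable would discard the DataFrame.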

Example Codes: Sort DataFrame With Pandas pandas.DataFrame.sort_values() Based on a Single Column

import pandas as pd

dates=['April-10', 
       'April-11', 
       'April-12', 
       'April-13',
       'April-14',
       'April-16']
sales=[200,300,400,200,300,300]
prices=[3, 1, 2, 4,3,2]

df = pd.DataFrame({'Date':dates ,
                   'Sales':sales ,
                   'Price': prices})
print("Before Sorting:")
print(df)
sorted_df=df.sort_values(by=['Price'])
print("After Sorting:")
print(sorted_df)

Output:

Before Sorting:
       Date  Sales  Price
0  April-10    200      3
1  April-11    300      1
2  April-12    400      2
3  April-13    200      4
4  April-14    300      3
5  April-16    300      2
After Sorting:
       Date  Sales  Price
1  April-11    300      1
2  April-12    400      2
5  April-16    300      2
0  April-10    200      3
4  April-14    300      3
3  April-13    200      4

It sorts the DataFrame df in the ascending order (default) by values in the column Price.

Each row in the sorted DataFrame keeps the index label it had in the original DataFrame.

If you prefer a fresh index running from 0 in the sorted DataFrame, set ignore_index (introduced in version 1.0.0) to True.

import pandas as pd

dates=['April-10', 
       'April-11', 
       'April-12', 
       'April-13',
       'April-14',
       'April-16']
sales=[200,300,400,200,300,300]
prices=[3, 1, 2, 4,3,2]

df = pd.DataFrame({'Date':dates ,
                   'Sales':sales ,
                   'Price': prices})
print("Before Sorting:")
print(df)
sorted_df=df.sort_values(by=['Price'],
                        ignore_index=True)
print("After Sorting:")
print(sorted_df)

Output:

Before Sorting:
       Date  Sales  Price
0  April-10    200      3
1  April-11    300      1
2  April-12    400      2
3  April-13    200      4
4  April-14    300      3
5  April-16    300      2
After Sorting:
       Date  Sales  Price
0  April-11    300      1
1  April-12    400      2
2  April-16    300      2
3  April-10    200      3
4  April-14    300      3
5  April-13    200      4

Here, we use ignore_index=True to assign new indexes to rows and ignore the index of the original DataFrame.
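
One way to think about ignore_index=True is as a sort followed by reset_index(drop=True). The sketch below compares the two on a shortened version of the data; the variable names a and b are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["April-10", "April-11", "April-12"],
                   "Price": [3, 1, 2]})

# Sort and relabel the rows in a single call
a = df.sort_values(by="Price", ignore_index=True)

# Sort first, then drop the old index labels explicitly
b = df.sort_values(by="Price").reset_index(drop=True)

print(a.equals(b))
```

The second form also works on pandas versions older than 1.0.0, where ignore_index is not available.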

Example Codes: Sort DataFrame With Pandas DataFrame.sort_values() Based on Multiple Columns

import pandas as pd

dates=['April-10', 
       'April-11', 
       'April-12', 
       'April-13',
       'April-14',
       'April-16']
sales=[200,300,400,200,300,300]
prices=[3, 1, 2, 4,3,2]

df = pd.DataFrame({'Date':dates ,
                   'Sales':sales ,
                   'Price': prices})
print("Before Sorting:")
print(df)
df.sort_values(by=['Sales','Price'],
               ignore_index=True,
               inplace=True)
print("After Sorting:")
print(df)

Output:

Before Sorting:
       Date  Sales  Price
0  April-10    200      3
1  April-11    300      1
2  April-12    400      2
3  April-13    200      4
4  April-14    300      3
5  April-16    300      2
After Sorting:
       Date  Sales  Price
0  April-10    200      3
1  April-13    200      4
2  April-11    300      1
3  April-16    300      2
4  April-14    300      3
5  April-12    400      2

Here, the rows are first sorted by Sales in ascending order; rows that share the same Sales value are then sorted by Price, also in ascending order.

In the df, 200 is the smallest value of the Sales column and 3 is the smallest value of the Price column for Sales value of 200.

So, the row with 200 in the Sales column and 3 in the Price goes to the top.

Because of inplace=True, the original DataFrame itself is modified by the call to sort_values().
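
The sort direction can also differ per column, because ascending accepts a list with one Boolean for each name in by. The sketch below reuses the example data and, as an illustrative choice, sorts Sales in ascending order and Price in descending order.

```python
import pandas as pd

dates = ['April-10', 'April-11', 'April-12',
         'April-13', 'April-14', 'April-16']
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({'Date': dates,
                   'Sales': sales,
                   'Price': prices})

# Sales ascending, then Price descending within each Sales value
mixed_df = df.sort_values(by=['Sales', 'Price'],
                          ascending=[True, False],
                          ignore_index=True)
print(mixed_df)
```

The list passed to ascending must have the same length as the list passed to by.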

Example Codes: Sort DataFrame in Descending Order With Pandas DataFrame.sort_values()

import pandas as pd

dates=['April-10', 
       'April-11', 
       'April-12', 
       'April-13',
       'April-14',
       'April-16']
sales=[200,300,400,200,300,300]
prices=[3, 1, 2, 4,3,2]

df = pd.DataFrame({'Date':dates ,
                   'Sales':sales ,
                   'Price': prices})
print("Before Sorting:")
print(df)
sorted_df=df.sort_values(by=['Sales'],
                         ignore_index=True,
                         ascending=False)
print("After Sorting:")
print(sorted_df)

Output:


Before Sorting:
       Date  Sales  Price
0  April-10    200      3
1  April-11    300      1
2  April-12    400      2
3  April-13    200      4
4  April-14    300      3
5  April-16    300      2
After Sorting:
       Date  Sales  Price
0  April-12    400      2
1  April-11    300      1
2  April-14    300      3
3  April-16    300      2
4  April-10    200      3
5  April-13    200      4

It sorts the DataFrame df in the descending order of values of column Sales.

400 is the largest value in the Sales column; hence the entry goes to the top, and other rows are sorted accordingly.
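
Several rows above share the same Sales value. If the relative order of such tied rows matters, a stable algorithm can be requested through kind. The sketch below uses kind='mergesort' as one such choice; it keeps the original index so the order of the tied rows is visible.

```python
import pandas as pd

dates = ['April-10', 'April-11', 'April-12',
         'April-13', 'April-14', 'April-16']
sales = [200, 300, 400, 200, 300, 300]

df = pd.DataFrame({'Date': dates, 'Sales': sales})

# A stable sort keeps tied rows in their original relative order
stable_df = df.sort_values(by='Sales',
                           ascending=False,
                           kind='mergesort')
print(stable_df)
```

According to the pandas documentation, kind is only applied when sorting on a single column or label.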

Example Codes: Sort DataFrame by Putting NaN First With Pandas DataFrame.sort_values()

import numpy as np
import pandas as pd

dates=['April-10', 
       'April-11', 
       'April-12', 
       'April-13',
       'April-14',
       'April-16']
sales=[200,300,400,200,300,300]
# Two prices are missing, matching the NaN rows in the output below
prices=[np.nan, 1, 2, 4, 3, np.nan]

df = pd.DataFrame({'Date':dates ,
                   'Sales':sales ,
                   'Price': prices})
print("Before Sorting:")
print(df)
sorted_df=df.sort_values(by=['Price'],ignore_index=True,na_position='first')
print("After Sorting:")
print(sorted_df)

Output:

Before Sorting:
       Date  Sales  Price
0  April-10    200    NaN
1  April-11    300    1.0
2  April-12    400    2.0
3  April-13    200    4.0
4  April-14    300    3.0
5  April-16    300    NaN
After Sorting:
       Date  Sales  Price
0  April-10    200    NaN
1  April-16    300    NaN
2  April-11    300    1.0
3  April-12    400    2.0
4  April-14    300    3.0
5  April-13    200    4.0

By default, NaN values are placed at the end of DataFrame after sorting.

But by setting na_position='first', we can place the NaN values at the beginning of the DataFrame.
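
For comparison, the sketch below shows the default na_position='last' in both sort directions. The data mirrors the example above with fewer columns; the variable names asc_df and desc_df are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['April-10', 'April-11', 'April-12',
                            'April-13', 'April-14', 'April-16'],
                   'Price': [np.nan, 1, 2, 4, 3, np.nan]})

# Default na_position='last': NaN rows go to the end
asc_df = df.sort_values(by='Price', ignore_index=True)

# NaN rows also stay at the end when the order is descending
desc_df = df.sort_values(by='Price', ascending=False, ignore_index=True)

print(asc_df)
print(desc_df)
```

In other words, na_position is applied independently of ascending, so NaN values are not treated as the largest or smallest value.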
