How to Group by and Sort in Pandas

Preet Sanghavi Feb 02, 2024
  1. Group by and Sort DataFrame in Pandas
  2. Use the groupby Function to Group by and Sort DataFrame in Pandas
This tutorial shows how to group the data in a Pandas data frame and sort the values within each group.

Group by and Sort DataFrame in Pandas

Pandas is an open-source data analysis library for Python. It is widely used by companies and organizations that do their data analysis in Python.

This tutorial explains how and why to group and sort data from a data frame in Pandas. Businesses that use Pandas for data analysis often need to summarize their data by category to gather insights for planning.

Pandas helps analysts gather such insights with the groupby function. Consider, for example, a product-based company.

This company might need to group its sales records by product and sort the products by sales. Grouping and sorting therefore come up often in data analysis and interpretation.

Before we begin, we create a dummy data frame named df to work with.

We add three columns, name, count_1, and count_2, with a few rows of data. We can do this using the following code.

import pandas as pd

# Sample data: a product name and two numeric columns
df = pd.DataFrame(
    {
        "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
        "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
        "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
    }
)

The above code creates a data frame along with a few entries. To view the entries in the data, we use the following code.

print(df)

The above code gives the following output.

   name  count_1  count_2
0   Foo        5      100
1   Foo       10      150
2  Baar       12      100
3   Foo       15       25
4  Baar       20      250
5   Foo       25      300
6  Baar       30      400
7  Baar       35      500

As we can see, the data frame df has three columns (name, count_1, and count_2) and eight rows indexed from 0 to 7. Looking at the name column, we see that the names Foo and Baar are repeated.

Now that our data frame is set up, let us group the data within it and then sort the values within those groups.

Use the groupby Function to Group by and Sort DataFrame in Pandas

Now that the data is in place, let us group it. We can group the rows so that entries with the same value in the name column are brought together, which makes further analysis easier.

We can do this in Pandas using the groupby function. This function groups the rows that share the same value in the specified columns.

We can then perform further operations on the grouped data. The grouping operation can be performed in Pandas, as illustrated below.

df.groupby(["name"])

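As we can see, we call the groupby function on our data frame df and pass the column name as an argument. On its own, this call returns a DataFrameGroupBy object; it does not display the grouped rows until we apply an operation to it.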
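To see what the grouped object holds, the following sketch counts the rows in each group and then sorts the groups by their total count_1. It rebuilds the same df so that it runs on its own; the variable names grouped, sizes, and totals are our own choices for illustration and are not part of the original example.

```python
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame(
    {
        "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
        "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
        "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
    }
)

grouped = df.groupby("name")

# Number of rows that fall into each group
sizes = grouped.size()

# Sum of count_1 per group, with the largest total first
totals = grouped["count_1"].sum().sort_values(ascending=False)

print(sizes)
print(totals)
```

Here Baar comes first in totals because its count_1 total (97) is larger than that of Foo (55).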

Now let us combine sorting with the groupby function so that we get not only the groups but also the values within each group in sorted order.

We want the three largest count_1 values in each group, listed from largest to smallest.

The nlargest function does this: applied after groupby, it returns the three largest values of each group in descending order. We can perform this operation using the following code.

print(df.groupby(["name"])["count_1"].nlargest(3))

The code fetches the following results.

name
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
Name: count_1, dtype: int64

As we can see, the result keeps only the three rows with the highest values in the count_1 column for each name, sorted from highest to lowest.

Thus, for the name Baar, we have three entries with counts of 35, 30, and 20, and for Foo we also have three entries, with counts of 25, 15, and 10. The middle column of the output shows the original row index of each entry.

The last line of the output also shows the name of the selected column and its data type. In our case, the column is count_1 and its data type is int64.
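The name, data type, and index of this result can also be read directly from the returned Series. The sketch below is a minimal illustration that rebuilds the same df; the variable names top3 and flat are our own, and dropping the inner index level is just one possible way to tidy the result.

```python
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame(
    {
        "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
        "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
        "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
    }
)

top3 = df.groupby("name")["count_1"].nlargest(3)

print(top3.name)   # name of the selected column
print(top3.dtype)  # data type of the values

# The result has a two-level index: the group name and the original row index.
# Dropping the inner level leaves only the group name as the index.
flat = top3.reset_index(level=1, drop=True)
print(flat)
```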

Thus, using the groupby function and the nlargest() function, we have grouped the rows by name, sorted the values within each group, and fetched the top three records per group from our data frame.
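The nlargest() result carries only the count_1 values. If we also need the other columns of the top rows, one possible variation is to sort the whole data frame first and then take the first three rows of each group. This is an alternative sketch and not the method used above; the variable name top_rows is our own.

```python
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame(
    {
        "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
        "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
        "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
    }
)

# Sort by name (ascending) and count_1 (descending),
# then keep the first three rows of each name group.
top_rows = (
    df.sort_values(["name", "count_1"], ascending=[True, False])
    .groupby("name")
    .head(3)
)

print(top_rows)
```

The rows come back in the same order as the nlargest() output, but each one keeps its name, count_1, and count_2 values.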
