How to Group by and Sort in Pandas

Preet Sanghavi Feb 02, 2024

This tutorial explores the concept of grouping data of a data frame and sorting it in Pandas.

Group by and Sort DataFrame in Pandas

As we have learned, Pandas is an advanced data analysis tool or a package extension in Python. Most companies and organizations that use Python and require high-quality data analysis use this tool on a large scale.

This tutorial lets us understand how and why to group and sort certain data from a data frame in Pandas. Most businesses and organizations that use Python and Pandas for data analysis need to gather insights from their data to better plan their businesses.

Pandas help analysts with the `groupby` function to gather such insights. Consider, for example, a product-based company.

This company might need to group certain products and sort them in their sales order. Thus, grouping and sorting have many advantages in data analysis and interpretation.

Before we begin, we create a dummy data frame to work with. Here we create one data frame, namely `df`.

We add a few columns and certain data within this `df` data frame. We can do this operation using the following code.

``````import pandas as pd

df = pd.DataFrame({"dat1": [9, 5]})
df = pd.DataFrame(
{
"name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
"count_1": [5, 10, 12, 15, 20, 25, 30, 35],
"count_2": [100, 150, 100, 25, 250, 300, 400, 500],
}
)
``````

The above code creates a data frame along with a few entries. To view the entries in the data, we use the following code.

``````print(df)
``````

The above code gives the following output.

``````	name	count_1	count_2
0	Foo		5		100
1	Foo		10		150
2	Baar	12		100
3	Foo		15		25
4	Baar	20		250
5	Foo		25		300
6	Baar	30		400
7	Baar	35		500
``````

As we can see, we have four columns and 8 rows indexed from value 0 to value 7. If we look into our data frame, we see certain names repeated, named `df`.

Since we have our data frame set up, let us group data within this data frame and then sort the values within those groupings.

Use the `groupby` Function to Group by and Sort DataFrame in Pandas

Let us group this data as we have set it up in place. We can group this data such that we have the names of similar products under the column `name` grouped up with each other to perform better data analysis.

We can do this operation in Pandas using the `groupby` function. This function ensures that the products or the values under the specified columns are brought together or grouped.

We can perform any extra operations on this grouped data. This grouping operation can be performed in Pandas, as illustrated below.

``````df.groupby(["name"])
``````

As we can see, we use the `groupby` function on our data frame named `df` with the column `name` passed as an argument.

Now let us sort our data with this `groupby` function such that we have not only the groupings but also the data sorted in a particular format.

We want to sort the data to have the three biggest values in our grouping after performing the `groupby` operation.

It means that we wish to fetch the three largest values after sorting the grouped data frame from our `df`. We can perform this operation using the following code.

``````print(df.groupby(["name"])["count_1"].nlargest(3))
``````

The code fetches the following results.

``````name
Baar  7    35
6    30
4    20
Foo   5    25
3    15
1    10
Name: count_1, dtype: int64
``````

As we can see, we have our groupings sorted in such a fashion that we have only the top three names with the highest counts as indicated within the `count_1` column.

Thus, for the name `Baar`, we can see that we have three entries for the count listed as `35`, `30`, and `20` and two entries for `Foo` with counts listed as `25`, `15`, and `10`.

In Pandas, we can also visualize the data type and the column’s name associated with that data type that has been grouped. In our case, we have the grouped column named `count_1` with the data type `int64` listed in our output at the bottom.

Thus, using the `groupby` function and the `nlargest()` function, we have grouped columns, sorted, and fetched certain records in our data frame.

Preet writes his thoughts about programming in a simplified manner to help others learn better. With thorough research, his articles offer descriptive and easy to understand solutions.

LinkedIn GitHub