How to Get Pandas Unique Values in Column and Sort Them

Manav Narula Feb 02, 2024
  1. Get Unique Values in Pandas DataFrame Column With unique Method
  2. Get Unique Values in Pandas DataFrame Column With drop_duplicates Method
  3. Sort a Column in Pandas DataFrame
This article will introduce how to get unique values in the Pandas DataFrame column.

For example, suppose we have a DataFrame consisting of individuals and their professions, and we want to know how many distinct professions it contains. We cannot simply use the row count, because many people can have the same job. For such situations, we can use the unique() and drop_duplicates() methods provided by the Pandas library.
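As a rough sketch of the profession scenario described above, the snippet below counts distinct professions with unique() and nunique(). The DataFrame, its column names, and its values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical data: the names and professions are made up for this sketch.
people = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev", "Eli"],
        "profession": ["Engineer", "Doctor", "Engineer", "Teacher", "Doctor"],
    }
)

total_rows = len(people)  # counts every person, including repeated professions
unique_professions = people["profession"].unique()  # distinct values, in order of first appearance
n_professions = people["profession"].nunique()  # number of distinct values

print(total_rows)
print(list(unique_professions))
print(n_professions)
```

Here the row count is 5, while only 3 distinct professions are present, which is the gap that unique() and nunique() expose.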

It’s also important to know how to sort your DataFrame, since sorted data is easier to visualize and understand. Python's built-in sorted() function and the Pandas sort_values() method can help achieve this.

We will sort the following DataFrame and remove its duplicate values in this tutorial.

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

print(df)

Output:

   A  B
0  7  1
1  1  2
2  5  8
3  4  5
4  2  3
5  1  4
6  4  2
7  4  6
8  8  8

Get Unique Values in Pandas DataFrame Column With unique Method

The Pandas Series unique() method is used when we deal with a single column of a DataFrame. It returns all unique elements of the column in the order in which they first appear, without sorting them. For a numeric column like the one below, the result is a NumPy array.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

print(df["A"].unique())
print(type(df["A"].unique()))

Output:

[7 1 5 4 2 8]
<class 'numpy.ndarray'>
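Because unique() does not sort its result, one possible way to get the unique values in order is to pass the array to NumPy's np.sort(). The sketch below uses the same DataFrame as above; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

unique_a = df["A"].unique()  # values in order of first appearance
sorted_unique_a = np.sort(unique_a)  # sorted copy, ascending

print(sorted_unique_a)  # [1 2 4 5 7 8]
print(sorted_unique_a[::-1])  # reversed view for descending order: [8 7 5 4 2 1]
```

The result stays a NumPy array, which can be convenient if the values feed into further numeric work.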

Get Unique Values in Pandas DataFrame Column With drop_duplicates Method

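drop_duplicates() can be applied to a whole DataFrame or to a subset of its columns, and it returns an object of the same type, so the other columns of each retained row are kept. By default, the first occurrence of each value is retained and later duplicates are dropped. It is sometimes described as a faster option for removing duplicates from huge data sets, but this depends on the data and is worth measuring for your own case.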

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

print(df.drop_duplicates(subset="A"))
print(type(df.drop_duplicates(subset="A")))

Output:

   A  B
0  7  1
1  1  2
2  5  8
3  4  5
4  2  3
8  8  8
<class 'pandas.core.frame.DataFrame'>
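As a small variation, drop_duplicates() can also be called on a single column (a Series), and its keep parameter controls which occurrence is retained. The following sketch uses the same DataFrame; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

# Calling it on one column returns a Series and keeps the original index labels.
unique_series = df["A"].drop_duplicates()

# keep="last" retains the last occurrence of each value in "A" instead of the first.
last_kept = df.drop_duplicates(subset="A", keep="last")

print(unique_series)
print(last_kept)
```

With keep="last", the rows at index 0, 2, 4, 5, 7, and 8 remain, so the values in column B differ from the default keep="first" result.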

Sort a Column in Pandas DataFrame

We can use Python's built-in sorted() function to sort a column, but it returns the result as a list rather than a Pandas object. We can also sort the column values in descending order by setting the reverse parameter to True.

The following example sorts the column in ascending order and removes the duplicate values:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

df_new = df.drop_duplicates(subset="A")

print(sorted(df_new["A"]))
print(type(sorted(df_new["A"])))

Output:

[1, 2, 4, 5, 7, 8]
<class 'list'>
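For descending order, the built-in sorted() accepts the keyword argument reverse=True. A minimal sketch with the same DataFrame follows; the variable name unique_desc is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

# Remove duplicates first, then sort from largest to smallest.
unique_desc = sorted(df["A"].drop_duplicates(), reverse=True)

print(unique_desc)  # [8, 7, 5, 4, 2, 1]
print(type(unique_desc))  # <class 'list'>
```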

sort_values() is another flexible option to sort a DataFrame. Here we can specify the column to be sorted using the by parameter and whether the order is ascending or descending using the ascending parameter. It preserves the object type as Pandas DataFrame.

The following example sorts the column in descending order and removes the duplicate values:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

df_new = df.drop_duplicates(subset="A")

print(df_new.sort_values(by="A", ascending=False))
print(type(df_new.sort_values(by="A")))

Output:

   A  B
8  8  8
0  7  1
2  5  8
3  4  5
4  2  3
1  1  2
<class 'pandas.core.frame.DataFrame'>
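The two steps can also be chained. The sketch below adds reset_index(drop=True), which is not part of the example above, to replace the leftover index labels with a fresh 0-based index. Whether to reset the index depends on whether the original row labels still matter.

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 1, 5, 4, 2, 1, 4, 4, 8], "B": [1, 2, 8, 5, 3, 4, 2, 6, 8]})

result = (
    df.drop_duplicates(subset="A")  # keep the first row for each value in "A"
    .sort_values(by="A", ascending=False)  # largest "A" first
    .reset_index(drop=True)  # discard the old index labels
)

print(result)
```

The rows come out in the same order as in the previous output, but the index now runs from 0 to 5.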