How to Deduplicate a List in Python

  1. Deduplicate a Python List Without Preserving Order
  2. Deduplicate a Python List With Preserving Order

Sometimes in Python, we have a list of values, some of which are duplicates. Removing the duplicates, so that every remaining value in the list is unique, is an everyday task.

We can achieve this using different methods, some of which do preserve the original order of elements, while others do not.

Deduplicate a Python List Without Preserving Order

If it’s not a requirement to preserve the original order, we can deduplicate a list using the built-in set data structure.

A set is a data structure that, by design, contains only unique elements.

When we construct a set from our initial list, all duplicate elements are discarded. We can then convert the set back into a list to get a list of unique elements.

Unfortunately, the order of the elements may change, because the set data structure is implemented as a hash table, which does not remember the order in which elements were inserted.

>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> unique_set = set(names)
>>> unique_list = list(unique_set)
>>> unique_list
['Stacy', 'Sarah', 'Jim', 'Bob']
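
The set-based approach above can be wrapped in a small helper. This is a minimal sketch; the function name dedupe_unordered is hypothetical and chosen only for illustration. Because the resulting order is arbitrary, the example sorts the result before printing it so that the output is reproducible.

```python
def dedupe_unordered(items):
    """Return the distinct values of `items`; the result order is arbitrary."""
    # Building a set drops duplicates; list() turns it back into a list.
    return list(set(items))


names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
unique_list = dedupe_unordered(names)

# The order of unique_list is not guaranteed, so sort it for a stable display.
print(sorted(unique_list))  # ['Bob', 'Jim', 'Sarah', 'Stacy']
```

Note that this only works when every element is hashable, as strings and numbers are.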

If you use the NumPy package for scientific computing in Python, you can also use the numpy.unique() function.

>>> import numpy
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> numpy.unique(names).tolist()
['Bob', 'Jim', 'Sarah', 'Stacy']

Note that the above method doesn’t preserve the original element order either: numpy.unique() returns the unique elements in sorted order. The order-preserving NumPy way is more involved, and you can find it below.

Deduplicate a Python List With Preserving Order

A simple solution that preserves the initial order is to use two nested loops.

The first loop traverses all elements of the original list. The second loop, which is hidden inside the not in membership test, checks whether we have already seen an element with the same value.

If we haven’t, we add it to the unique list, which, in the end, will contain the unique elements in their original order. Because the membership test scans the unique list on every iteration, this approach takes quadratic time in the worst case.

>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> unique = []
>>> for name in names:          # 1st loop: walk the original list
...     if name not in unique:  # 2nd loop: the membership test scans unique
...         unique.append(name)
...
>>> unique
['Bob', 'Stacy', 'Sarah', 'Jim']
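
A common variation on the loop above, not shown in the original example, keeps an auxiliary set of values already seen, so that each membership check takes roughly constant time instead of scanning a list. The sketch below assumes all elements are hashable; the names dedupe_ordered and seen are hypothetical.

```python
def dedupe_ordered(items):
    """Return the distinct values of `items` in order of first appearance."""
    seen = set()   # values encountered so far (fast membership test)
    unique = []    # result list, kept in the original order
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
unique = dedupe_ordered(names)
print(unique)  # ['Bob', 'Stacy', 'Sarah', 'Jim']
```

The trade-off is a little extra memory for the set in exchange for a single pass over the input.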

Another way to deduplicate a list while preserving the original order is to use the collections.OrderedDict data structure. OrderedDict is a special kind of dictionary in Python that remembers the order in which keys were inserted.

>>> from collections import OrderedDict
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> unique = list(OrderedDict.fromkeys(names))
>>> unique
['Bob', 'Stacy', 'Sarah', 'Jim']
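
Since Python 3.7, the built-in dict also guarantees that keys keep their insertion order, so the same idea works without importing anything. The following is a minimal sketch assuming Python 3.7 or newer.

```python
names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']

# dict.fromkeys() creates one key per distinct value, in first-seen order;
# list() then collects those keys back into a list.
unique = list(dict.fromkeys(names))
print(unique)  # ['Bob', 'Stacy', 'Sarah', 'Jim']
```

On older interpreters, where dict ordering is not guaranteed, OrderedDict remains the safer choice.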

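If you use the pandas data analysis library, pandas.unique may be helpful as well. This function is order-preserving. The list is wrapped in a pandas.Series here because recent pandas releases deprecate passing a plain list.

>>> import pandas
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> pandas.unique(pandas.Series(names)).tolist()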
['Bob', 'Stacy', 'Sarah', 'Jim']

The NumPy way to deduplicate a list while preserving the order is a little more complicated. You have to record the index of the first occurrence of each distinct element, sort those indexes, and then rebuild the unique list from the original one using them.

>>> import numpy
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> _, indexes = numpy.unique(names, return_index=True)
>>> unique = [names[i] for i in numpy.sort(indexes)]
>>> unique
['Bob', 'Stacy', 'Sarah', 'Jim']
