# Deduplicate a List in Python

Sometimes in Python, we have a list of values, among which some are duplicates. It’s an everyday use case to remove all duplicates from the list, so that all remaining values in the list are unique.

We can achieve this using different methods, some of which do preserve the original order of elements, while others do not.

## Deduplicate a Python List Without Preserving Order

If it’s not a requirement to preserve the original order, we can deduplicate a list using the built-in `set` data structure.

A `set` is a data structure which may only contain unique elements by design. By constructing such a set from our initial list, all duplicate elements are discarded. Then we can convert the set back into a list and get a list of unique elements.

Unfortunately, the order of the elements changes, since the deduplicating functionality of the `set` data structure is implemented using hash tables, which do not remember which elements were inserted first. The exact order you see may even differ between runs, because Python randomizes string hashing by default.

```
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> unique_set = set(names)
>>> unique_list = list(unique_set)
>>> unique_list
['Stacy', 'Sarah', 'Jim', 'Bob']
```
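If a deterministic order is acceptable even though it isn’t the original one, a small variation on the approach above is to sort the set after deduplication:

```python
names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']

# set() removes duplicates; sorted() turns the set into an
# alphabetically ordered list, so the result is reproducible
unique_sorted = sorted(set(names))
print(unique_sorted)  # ['Bob', 'Jim', 'Sarah', 'Stacy']
```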

If you use the NumPy package for scientific computing in Python, you can also use the `numpy.unique()` function.

```
>>> import numpy
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> numpy.unique(names).tolist()
['Bob', 'Jim', 'Sarah', 'Stacy']
```

Note that the above method doesn’t preserve the original element order either: `numpy.unique()` returns the unique values sorted. The order-preserving NumPy way is more involved, and you can find it below.

## Deduplicate a Python List While Preserving Order

A simple solution which preserves the initial order is to use a double loop.

The first loop traverses all elements of the original list. The second, implicit loop is the `in` membership test, which scans the result list to check if we have already seen an element with the same value.

If we haven’t, we add it to the `unique` list, which, in the end, will contain the unique elements in their original order.

```
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> unique = []
>>> for name in names:  # 1st loop
...     if name not in unique:  # 2nd (implicit) loop
...         unique.append(name)
...
>>> unique
['Bob', 'Stacy', 'Sarah', 'Jim']
```
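Because the membership test scans the whole result list, the double loop takes quadratic time. For long lists, a common refinement (a sketch, not part of the original recipe) keeps a separate `set` of seen values, so the membership test is constant-time while the output list still preserves order:

```python
names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']

seen = set()   # fast O(1) membership checks
unique = []    # preserves the original order
for name in names:
    if name not in seen:
        seen.add(name)
        unique.append(name)

print(unique)  # ['Bob', 'Stacy', 'Sarah', 'Jim']
```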

Another way to deduplicate a list while preserving the original order is to use the `collections.OrderedDict` data structure. `OrderedDict` is a special kind of dictionary in Python that remembers the order of key insertion. Since dictionary keys are unique, `OrderedDict.fromkeys()` keeps only the first occurrence of each value.

```
>>> from collections import OrderedDict
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> unique = list(OrderedDict.fromkeys(names))
>>> unique
['Bob', 'Stacy', 'Sarah', 'Jim']
```
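Since Python 3.7, plain dictionaries also guarantee insertion order, so the same trick works with `dict.fromkeys()` without importing anything:

```python
names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']

# dict keys are unique and (since Python 3.7) keep insertion order
unique = list(dict.fromkeys(names))
print(unique)  # ['Bob', 'Stacy', 'Sarah', 'Jim']
```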

If you use the pandas data analysis library for Python, `pandas.unique()` may be helpful as well. This function is order-preserving.

```
>>> import pandas
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> pandas.unique(names).tolist()
['Bob', 'Stacy', 'Sarah', 'Jim']
```

The NumPy way to deduplicate a list while preserving the order is a little more complicated. You have to ask `numpy.unique()` for the index of the first occurrence of each distinct element, and then recreate the unique list from the original one using those indexes, sorted back into their original order.

```
>>> import numpy
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> _, indexes = numpy.unique(names, return_index=True)
>>> unique = [names[i] for i in numpy.sort(indexes)]
>>> unique
['Bob', 'Stacy', 'Sarah', 'Jim']
```