How to Find Duplicates in a List in C#

Saad Aslam Feb 02, 2024
  1. Use the GroupBy and Where LINQ Methods to Find Duplicates in a List in C#
  2. Use HashSet to Find Duplicates in a List in C#
  3. Use Dictionary to Find Duplicates in a List in C#
  4. Find Duplicates in a List Using the List.Contains Method in C#
  5. Find Duplicates in a List Using the FindAll Method in C#
  6. Conclusion

Identifying duplicate entries within a list is a common task in C# programming. In this article, we will explore various methods to achieve this, showcasing different approaches and their implementations.

We will cover the GroupBy and Where LINQ methods, HashSet, a Dictionary that tracks occurrences, the List.Contains method, and the FindAll method. Each takes a different approach to the same problem, with different trade-offs in brevity and performance.

Use the GroupBy and Where LINQ Methods to Find Duplicates in a List in C#

LINQ (Language Integrated Query) provides a powerful set of tools to streamline the process of identifying duplicate entries in a list. One effective approach involves using the GroupBy and Where LINQ methods.

The GroupBy method is used to group elements in a collection based on a specified key. In the context of finding duplicates, we can use it to group elements in a list based on their values.

Following that, the Where method allows us to filter these groups based on a specified condition. By applying a condition that retains only those groups where the count is greater than one, we effectively isolate the duplicate entries.

Let’s explore a practical example using C#:

using System;
using System.Collections.Generic;
using System.Linq;

public class FindDuplicatesExample {
  public static void Main() {
    List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

    var duplicates =
        dataList.GroupBy(x => x).Where(group => group.Count() > 1).Select(group => group.Key);

    if (duplicates.Any()) {
      Console.WriteLine("The duplicate elements in the list are: " + string.Join(", ", duplicates));
    } else {
      Console.WriteLine("No duplicate elements in the list");
    }
  }
}

In this example, first, the necessary namespaces are imported, including System and System.Collections.Generic, providing access to fundamental functionalities and generic collections in C#. The System.Linq namespace is also included, enabling the use of LINQ methods for querying collections.

using System;
using System.Collections.Generic;
using System.Linq;

Next, we define a class named FindDuplicatesExample, encapsulating the functionality of our program. The Main method serves as the entry point of the program.

Inside the Main method, a List<string> named dataList is initialized and populated with string elements. This list includes some duplicate entries, which we aim to identify.

List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

The crucial part of the code lies in the LINQ query that follows. We use the GroupBy method to group elements in the dataList based on their values (x => x).

Each group contains elements with the same value. The subsequent Where clause filters out groups that have a count less than or equal to one, meaning it keeps only those groups that represent duplicate elements.

var duplicates =
    dataList.GroupBy(x => x).Where(group => group.Count() > 1).Select(group => group.Key);

The Select statement extracts the key of each group, which is the duplicate element itself. The result is a collection of duplicate elements stored in the duplicates variable.

Moving on, we have a conditional statement checking if there are any duplicates in the duplicates collection. If duplicates exist, the program prints a message indicating the duplicate elements, using string.Join to concatenate them with commas for a clean display.

if (duplicates.Any()) {
  Console.WriteLine("The duplicate elements in the list are: " + string.Join(", ", duplicates));
}

If no duplicates are found, the program outputs a message stating that there are no duplicate elements in the list.

When this program is executed with the provided list, the output will be:

The duplicate elements in the list are: Saad

This output signifies that the string Saad is a duplicate entry within the given list.
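
The same query can also report how often each duplicate occurs. The following is a minimal sketch rather than part of the original example: the class name GroupByDuplicateCounts, the method CountDuplicates, and the optional comparer parameter are illustrative assumptions. Passing StringComparer.OrdinalIgnoreCase treats "Saad" and "saad" as the same value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByDuplicateCounts {
  // Hypothetical helper: maps each value that occurs more than once to its number of occurrences.
  // The comparer parameter is an assumption for illustration; null selects the default comparer.
  public static Dictionary<string, int> CountDuplicates(IEnumerable<string> source,
                                                        IEqualityComparer<string> comparer = null) {
    return source.GroupBy(x => x, comparer)
        .Where(group => group.Count() > 1)
        .ToDictionary(group => group.Key, group => group.Count(), comparer);
  }

  public static void Main() {
    var dataList = new List<string> { "Saad", "John", "saad", "Miller", "Saad", "John" };

    // With a case-insensitive comparer, "saad" is counted together with "Saad".
    foreach (var pair in CountDuplicates(dataList, StringComparer.OrdinalIgnoreCase)) {
      Console.WriteLine(pair.Key + " appears " + pair.Value + " times");
    }
  }
}
```

The key of each group is the first spelling encountered, so the case-insensitive run above groups all three spellings under "Saad".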

Use HashSet to Find Duplicates in a List in C#

Another effective method for identifying duplicate entries in a list involves leveraging the HashSet data structure. This approach is particularly beneficial when the goal is to prevent the collection from being populated with duplicate elements.

HashSet also performs well: its Add and Contains operations run in constant time on average, whereas List.Contains scans the list and takes time proportional to its length.

By definition, the HashSet is a collection type that only allows unique elements, making it an ideal choice for efficiently identifying and storing distinct values. In the context of finding duplicates, we can exploit the unique property of HashSet to isolate duplicate elements in a list.

Let’s delve into a practical example using C#:

using System;
using System.Collections.Generic;
using System.Linq;

public class FindDuplicatesExample {
  public static void Main() {
    List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

    HashSet<string> hashSet = new HashSet<string>();
    IEnumerable<string> duplicateElements = dataList.Where(e => !hashSet.Add(e));

    Console.WriteLine("The duplicate elements in the list are: " +
                      string.Join(", ", duplicateElements));
  }
}

The code begins by importing the necessary namespaces, similar to the previous example. The List<string> named dataList is initialized and populated with string elements, some of which are duplicates.

List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

The core of the code involves the creation of a HashSet<string> named hashSet to store unique elements and the LINQ query to identify duplicate elements.

The Where clause checks if an element can be added to the HashSet using the !hashSet.Add(e) condition. If an element cannot be added, it means it already exists in the HashSet, and thus, it is a duplicate.

HashSet<string> hashSet = new HashSet<string>();
IEnumerable<string> duplicateElements = dataList.Where(e => !hashSet.Add(e));

When executed with the provided list, the output will be:

The duplicate elements in the list are: Saad

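This output indicates that the string Saad is a duplicate entry in the given list. Two details are worth keeping in mind. First, an element that occurs three or more times is reported once for each repeat. Second, Where is lazily evaluated, so the query should be enumerated only once (or materialized with ToList); after the first pass, hashSet already contains every element, and a second enumeration would report all of them as duplicates.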
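
When each duplicated value should be reported exactly once, an explicit loop with a second set is one option. This is a sketch under assumptions, not the article's original code: the class name HashSetDuplicates, the method FindDuplicates, and the second set named reported are introduced here for illustration. It returns a materialized list, so the result can be enumerated repeatedly.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDuplicates {
  // Hypothetical helper: returns each duplicated value once,
  // in the order in which its second occurrence is encountered.
  public static List<string> FindDuplicates(IEnumerable<string> source) {
    var seen = new HashSet<string>();      // every value encountered so far
    var reported = new HashSet<string>();  // values already added to the result
    var duplicates = new List<string>();

    foreach (var item in source) {
      // Add returns false when the item is already in the set.
      if (!seen.Add(item) && reported.Add(item)) {
        duplicates.Add(item);
      }
    }

    return duplicates;
  }

  public static void Main() {
    var dataList = new List<string> { "Saad", "John", "Saad", "Miller", "Saad", "John" };

    List<string> duplicates = FindDuplicates(dataList);
    Console.WriteLine(string.Join(", ", duplicates));  // prints: Saad, John
  }
}
```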

Use Dictionary to Find Duplicates in a List in C#

Another approach to efficiently identify duplicate entries in a list involves the use of a Dictionary to track occurrences. This method allows us to maintain a count of how many times each element appears in the list, making it straightforward to pinpoint duplicates.

A Dictionary in C# is a collection type that stores key-value pairs. In the context of finding duplicates, we can utilize a Dictionary where the elements of the list act as keys, and the corresponding values represent the count of occurrences.

By iterating through the list and updating the dictionary accordingly, we can identify elements with counts greater than one, signifying duplicates.

Let’s delve into a practical example using C#:

using System;
using System.Collections.Generic;

public class FindDuplicatesExample {
  public static void Main() {
    List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

    Dictionary<string, int> occurrences = new Dictionary<string, int>();
    List<string> duplicates = new List<string>();

    foreach (var item in dataList) {
      if (occurrences.ContainsKey(item)) {
        occurrences[item]++;
        if (occurrences[item] == 2) {
          duplicates.Add(item);
        }
      } else {
        occurrences.Add(item, 1);
      }
    }

    Console.WriteLine("The duplicate elements in the list are: " + string.Join(", ", duplicates));
  }
}

In this example, the List<string> named dataList is initialized and populated with the same string elements.

List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

A Dictionary<string, int> named occurrences is then created to track the count of occurrences of each element. A List<string> named duplicates is also created to store the duplicate elements.

Dictionary<string, int> occurrences = new Dictionary<string, int>();
List<string> duplicates = new List<string>();

The code then iterates through each item in the dataList, updating the occurrences dictionary accordingly.

  • If an item is already present in the dictionary, its count is incremented.
  • If the count reaches 2, the item is added to the duplicates list.
  • If the item is not present in the dictionary, it is added with an initial count of 1.

foreach (var item in dataList) {
  if (occurrences.ContainsKey(item)) {
    occurrences[item]++;
    if (occurrences[item] == 2) {
      duplicates.Add(item);
    }
  } else {
    occurrences.Add(item, 1);
  }
}

Finally, the program prints the duplicate elements to the console using string.Join for a clean display.

Output:

The duplicate elements in the list are: Saad

This output signifies that the string Saad is a duplicate entry within the given list.
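
The counting step can also be written with Dictionary.TryGetValue, which avoids looking up the same key twice. The sketch below is an illustrative variant, and the names DictionaryDuplicates and CountOccurrences are assumptions rather than part of the original example. It keeps the full occurrence table and filters for counts greater than one when printing.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDuplicates {
  // Hypothetical helper: maps every value in the source to its number of occurrences.
  public static Dictionary<string, int> CountOccurrences(IEnumerable<string> source) {
    var occurrences = new Dictionary<string, int>();

    foreach (var item in source) {
      // TryGetValue sets count to 0 (the default for int) when the key is missing.
      occurrences.TryGetValue(item, out int count);
      occurrences[item] = count + 1;
    }

    return occurrences;
  }

  public static void Main() {
    var dataList = new List<string> { "Saad", "John", "Miller", "Saad", "Stacey" };

    foreach (var pair in CountOccurrences(dataList)) {
      if (pair.Value > 1) {
        Console.WriteLine(pair.Key + " appears " + pair.Value + " times");  // prints: Saad appears 2 times
      }
    }
  }
}
```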

Find Duplicates in a List Using the List.Contains Method in C#

Another approach to identifying and handling duplicates involves using the List.Contains method. This method provides a simple and straightforward way to check for the presence of an element within a list.

The Contains method is defined on the List class in the System.Collections.Generic namespace and determines whether a specific element is present in a list. It returns a boolean value (true if the element is found, false otherwise).

Here’s its basic syntax:

bool result = myList.Contains(element);

In the context of finding duplicates, we can iterate through the list and check whether each element occurs again among the elements that come after it. The example below performs that search with List.IndexOf, which accepts a starting index and therefore needs no separate sublist, and uses Contains to make sure each duplicate is recorded only once.

Let’s explore a practical example using C#:

using System;
using System.Collections.Generic;

class Program {
  static void Main() {
    List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

    List<string> duplicates = FindDuplicates(dataList);

    Console.WriteLine("Duplicates in the list: " + string.Join(", ", duplicates));
  }

  static List<string> FindDuplicates(List<string> list) {
    List<string> duplicates = new List<string>();

    for (int i = 0; i < list.Count; i++) {
      string item = list[i];
      if (list.IndexOf(item, i + 1) != -1 && !duplicates.Contains(item)) {
        duplicates.Add(item);
      }
    }

    return duplicates;
  }
}

In this example, the List<string> named dataList is initialized and populated with string elements, including duplicates that we aim to identify.

List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

The core of the code can be found in the FindDuplicates method, where we iterate through each element in the list (list) using a for loop. For each element, we use list.IndexOf(item, i + 1) to search for the same item in the sublist that starts from the next index (i + 1).

If the index is not -1 (indicating that the item was found in the sublist) and the item is not already in the duplicates list, we add it to the duplicates list.

Output:

Duplicates in the list: Saad

This output indicates that the string Saad is a duplicate entry within the given list. This approach ensures that each duplicate is only added once to the result list, preventing redundant entries.
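
For comparison, the search itself can also be done with Contains alone by keeping a list of values already seen. This is a sketch under assumptions: the class name ContainsDuplicates and the helper list named seen are illustrative and not part of the original example. Like the IndexOf version, it rescans a list for every element, so it suits small lists best.

```csharp
using System;
using System.Collections.Generic;

public static class ContainsDuplicates {
  // Hypothetical helper: returns each duplicated value once, using only List.Contains.
  public static List<string> FindDuplicates(List<string> list) {
    var seen = new List<string>();        // values encountered so far
    var duplicates = new List<string>();  // values encountered more than once

    foreach (string item in list) {
      if (!seen.Contains(item)) {
        seen.Add(item);
      } else if (!duplicates.Contains(item)) {
        duplicates.Add(item);
      }
    }

    return duplicates;
  }

  public static void Main() {
    var dataList = new List<string> { "Saad", "John", "Miller", "Saad", "Stacey" };

    List<string> duplicates = FindDuplicates(dataList);
    Console.WriteLine("Duplicates in the list: " + string.Join(", ", duplicates));  // prints: Duplicates in the list: Saad
  }
}
```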

Find Duplicates in a List Using the FindAll Method in C#

In C#, the FindAll method offers a concise way to identify duplicate entries in a list. This method belongs to the List class and returns all elements that match a specified condition.

In the context of finding duplicates, we can specify a condition that targets elements with counts greater than one, thereby isolating the duplicates.

Let’s explore a practical example using C#:

using System;
using System.Collections.Generic;

public class FindDuplicatesExample {
  public static void Main() {
    List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

    List<string> duplicates =
        dataList.FindAll(item => dataList.IndexOf(item) != dataList.LastIndexOf(item));

    Console.WriteLine("The duplicate elements in the list are: " + string.Join(", ", duplicates));
  }
}

In this example, we start by importing the necessary namespace for collections in C#.

using System.Collections.Generic;

The List<string> named dataList is initialized and populated with string elements, including duplicates that we aim to identify.

List<string> dataList = new List<string>() { "Saad", "John", "Miller", "Saad", "Stacey" };

Then, the FindAll method is used to retrieve elements that satisfy the condition specified within the provided lambda expression. In this case, the condition checks whether the index of an element is different from its last index in the list, effectively identifying elements that occur more than once.

List<string> duplicates =
    dataList.FindAll(item => dataList.IndexOf(item) != dataList.LastIndexOf(item));

Output:

The duplicate elements in the list are: Saad, Saad

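This output shows that the string Saad is a duplicate entry within the given list. It appears twice because FindAll returns every element that matches the condition, so each occurrence of a duplicated value is included. The approach is concise and readable, but IndexOf and LastIndexOf each scan the list for every element, so the running time grows quadratically with the length of the list.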
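
To list each duplicated value only once, the result of FindAll can be passed through the LINQ Distinct method. The following is a minimal sketch; the class name FindAllDistinctDuplicates and the method FindDuplicates are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FindAllDistinctDuplicates {
  // Hypothetical helper: FindAll keeps every occurrence of a repeated value,
  // and Distinct then collapses those occurrences to a single entry.
  public static List<string> FindDuplicates(List<string> list) {
    return list.FindAll(item => list.IndexOf(item) != list.LastIndexOf(item))
        .Distinct()
        .ToList();
  }

  public static void Main() {
    var dataList = new List<string> { "Saad", "John", "Miller", "Saad", "Stacey" };

    List<string> duplicates = FindDuplicates(dataList);
    Console.WriteLine("The duplicate elements in the list are: " + string.Join(", ", duplicates));  // prints: The duplicate elements in the list are: Saad
  }
}
```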

Conclusion

We’ve explored several ways to find duplicates in a list in C#. The choice of approach depends on factors such as performance requirements and coding preferences.

The HashSet and Dictionary approaches make a single pass over the list, and GroupBy likewise relies on hashing, so these scale well to large lists. The List.Contains and FindAll approaches are short to write but rescan the list for each element, which makes them better suited to small lists.
