How to Find Duplicate Records in a MySQL Table

Victor A. Oguntuase Feb 02, 2024
  1. Use the SELECT DISTINCT Statement to Check for Duplicates in a MySQL Table
  2. Use the SELECT COUNT Statement to Find the Number of Occurrence of Duplicates in a MySQL Table
  3. Use the INNER JOIN Statement to View Duplicate Records in a MySQL Table
How to Find Duplicate Records in a MySQL Table

This tutorial explores how to check for duplicates in a MySQL table and retrieve the number of duplicates with the SELECT DISTINCT and SELECT COUNT statements, respectively.

Use the SELECT DISTINCT Statement to Check for Duplicates in a MySQL Table

The impact of having duplicate records in a database table can vary from a minor inconvenience to disaster. Luckily, MySQL has a few nifty keywords that can combine to scan a table for duplicates.

Also, we can count the number of occurrences of duplicate records and delete them where necessary.

Let us create a test_company database with a sample table called employee_details.

CREATE DATABASE test_company;
USE test_company;

CREATE TABLE employees_details(
    id INT AUTO_INCREMENT,
    name VARCHAR (255) NOT NULL,
    title VARCHAR(255) NOT NULL,
    salary INT,

    PRIMARY KEY (id)
);

The table is populated with values, including duplicates.

INSERT INTO employees_details (name, title, salary) Values
('James Maddison','Computer Engineer',80000),
('Matthew Defoe','Software Architect',150000),
('Daniel Jameson','Software Engineer II', 95000),
('Jules Reddington','Senior Software Engineer',120000),
('Carlos Rodriguez','Data Engineer',100000),
('Matthew Defoe','Software Architect',150000),
('Daniel Jameson','Software Engineer II', 95000),
('Jules Reddington','Senior Software Engineer',120000);

SELECT * FROM employees_details;

Output:

+----+------------------+--------------------------+--------+
| id | name             | title                    | salary |
+----+------------------+--------------------------+--------+
|  1 | James Maddison   | Computer Engineer        |  80000 |
|  2 | Matthew Defoe    | Software Architect       | 150000 |
|  3 | Daniel Jameson   | Software Engineer II     |  95000 |
|  4 | Jules Reddington | Senior Software Engineer | 120000 |
|  5 | Carlos Rodriguez | Data Engineer            | 100000 |
|  6 | Matthew Defoe    | Software Architect       | 150000 |
|  7 | Daniel Jameson   | Software Engineer II     |  95000 |
|  8 | Jules Reddington | Senior Software Engineer | 120000 |
+----+------------------+--------------------------+--------+
8 rows in set (0.00 sec)

There are three duplicates in this table, and this is easy to spot because of the small size of the table. We use the SELECT DISTINCT statement for larger tables to retrieve unique records from the table.

According to the official documentation, the SELECT DISTINCT statement retrieves only one instance of an output record, even if it occurs multiple times.

Hence, we can be sure there are duplicates when the number of records returned by the SELECT DISTINCT statement is less than the total records in the table.

-- Retrieving only distinct records.
SELECT DISTINCT name,title,salary FROM employees_details;

Output:

+------------------+--------------------------+--------+
| name             | title                    | salary |
+------------------+--------------------------+--------+
| James Maddison   | Computer Engineer        |  80000 |
| Matthew Defoe    | Software Architect       | 150000 |
| Daniel Jameson   | Software Engineer II     |  95000 |
| Jules Reddington | Senior Software Engineer | 120000 |
| Carlos Rodriguez | Data Engineer            | 100000 |
+------------------+--------------------------+--------+
5 rows in set (0.00 sec)

Notice that the id column is not included in the query because MySQL identifies the id column as unique records.

Hence, including the id column in the query will return all records (including the duplicates) as unique.

From the result-set, we can deduce that there are three (3) duplicate records since the query returns five (5) distinct records and there are eight (8) records in the table.

Use the SELECT COUNT Statement to Find the Number of Occurrence of Duplicates in a MySQL Table

Now, after detecting the existence of duplicates, we can use the SELECT COUNT statement to find the number of occurrences of the duplicates.

SELECT name AS 'employee name', COUNT(*) AS Occurrence FROM employees_details
GROUP BY name
HAVING Occurrence > 1;

Output:

+------------------+------------+
| employee name    | Occurrence |
+------------------+------------+
| Daniel Jameson   |          2 |
| Jules Reddington |          2 |
| Matthew Defoe    |          2 |
+------------------+------------+
3 rows in set (0.001 sec)

This retrieves the duplicated records and the number of duplicates per record in the table. As expected, there are three (3) duplicated records.

The use of the SELECT COUNT statement is further discussed via this official reference.

Use the INNER JOIN Statement to View Duplicate Records in a MySQL Table

We can use an INNER JOIN query between the target table and the SELECT DISTINCT query to view the duplicate records alongside the main record.

SELECT I.id, O.name, O.title, O.salary
FROM employees_details AS I
INNER JOIN(SELECT DISTINCT name, title, salary FROM employees_details) AS O
ON I.name = O.name;

Output:

+----+------------------+--------------------------+--------+
| id | name             | title                    | salary |
+----+------------------+--------------------------+--------+
|  1 | James Maddison   | Computer Engineer        |  80000 |
|  2 | Matthew Defoe    | Software Architect       | 150000 |
|  6 | Matthew Defoe    | Software Architect       | 150000 |
|  3 | Daniel Jameson   | Software Engineer II     |  95000 |
|  7 | Daniel Jameson   | Software Engineer II     |  95000 |
|  4 | Jules Reddington | Senior Software Engineer | 120000 |
|  8 | Jules Reddington | Senior Software Engineer | 120000 |
|  5 | Carlos Rodriguez | Data Engineer            | 100000 |
+----+------------------+--------------------------+--------+
8 rows in set (0.001 sec)
Victor A. Oguntuase avatar Victor A. Oguntuase avatar

Victor is an experienced Python Developer, Machine Learning Engineer and Technical Writer with interests across various fields of science and engineering. He is passionate about learning new technologies and skill and working on challenging problems. He enjoys teaching, intellectual discourse, and gaming, among other things.

LinkedIn GitHub