7 Ways to Find Duplicate Rows while Ignoring the Primary Key in MySQL

Here are seven ways to return duplicate rows in MySQL when those rows have a primary key or other unique identifier column.

Sample Data

We’ll use the following data for our examples:

DROP TABLE IF EXISTS Dogs;
CREATE TABLE Dogs (
    DogId int PRIMARY KEY NOT NULL,
    FirstName varchar(50),
    LastName varchar(50)
    );

INSERT INTO Dogs VALUES
    (1, 'Bark', 'Smith'),
    (2, 'Bark', 'Smith'),
    (3, 'Woof', 'Jones'),
    (4, 'Ruff', 'Robinson'),
    (5, 'Wag', 'Johnson'),
    (6, 'Wag', 'Johnson'),
    (7, 'Wag', 'Johnson');
SELECT * FROM Dogs;

Result:

+-------+-----------+----------+
| DogId | FirstName | LastName |
+-------+-----------+----------+
|     1 | Bark      | Smith    |
|     2 | Bark      | Smith    |
|     3 | Woof      | Jones    |
|     4 | Ruff      | Robinson |
|     5 | Wag       | Johnson  |
|     6 | Wag       | Johnson  |
|     7 | Wag       | Johnson  |
+-------+-----------+----------+

The duplicate rows share exactly the same values across all columns except for their primary key/unique ID column.

The first two rows are duplicates (except for the DogId column, which is the table’s primary key, and contains a unique value across all rows). The last three rows are also duplicates (except for the DogId column).

The primary key column ensures that there are no duplicate rows, which is normally a good thing in RDBMSs. However, by definition this means that there are no duplicates. In our case, the primary key column is an incrementing number, and its value carries no meaning and is not significant. We therefore need to ignore that row if we want to find duplicates in the columns that are significant.

Option 1

Our first option is to use the GROUP BY clause to group the columns by their significant columns, then use the COUNT() function to return the number of identical rows:

SELECT 
    FirstName, 
    LastName, 
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName;

Result:

+-----------+----------+-------+
| FirstName | LastName | Count |
+-----------+----------+-------+
| Bark      | Smith    |     2 |
| Woof      | Jones    |     1 |
| Ruff      | Robinson |     1 |
| Wag       | Johnson  |     3 |
+-----------+----------+-------+

We were able to ignore the primary key column by omitting it from our query.

The result tells us that there are two rows containing Bark Smith and three rows containing Wag Johnson. These are duplicates (or triplicates in the case of Wag Johnson). The other two rows do not have any duplicates.

Option 2

We can exclude non-duplicates from the output with the HAVING clause:

SELECT 
    FirstName, 
    LastName, 
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1;

Result:

+-----------+----------+-------+
| FirstName | LastName | Count |
+-----------+----------+-------+
| Bark      | Smith    |     2 |
| Wag       | Johnson  |     3 |
+-----------+----------+-------+

Option 3

We can also check for duplicates on concatenated columns. For example, we can use the CONCAT() function to concatenate our two columns, use the DISTINCT keyword to get distinct values, then use the COUNT() function to return the count:

SELECT
    DISTINCT CONCAT(FirstName, ' ', LastName) AS DogName,
    COUNT(*) AS Count
FROM Dogs
GROUP BY CONCAT(FirstName, ' ', LastName);

Result:

+---------------+-------+
| DogName       | Count |
+---------------+-------+
| Bark Smith    |     2 |
| Woof Jones    |     1 |
| Ruff Robinson |     1 |
| Wag Johnson   |     3 |
+---------------+-------+

Option 4

We can alternatively use the ROW_NUMBER() function with the PARTITION BY clause:

SELECT 
    *,
    ROW_NUMBER() OVER ( 
        PARTITION BY FirstName, LastName 
        ORDER BY FirstName, LastName
        ) AS rn
FROM Dogs;

Result:

+-------+-----------+----------+----+
| DogId | FirstName | LastName | rn |
+-------+-----------+----------+----+
|     1 | Bark      | Smith    |  1 |
|     2 | Bark      | Smith    |  2 |
|     4 | Ruff      | Robinson |  1 |
|     5 | Wag       | Johnson  |  1 |
|     6 | Wag       | Johnson  |  2 |
|     7 | Wag       | Johnson  |  3 |
|     3 | Woof      | Jones    |  1 |
+-------+-----------+----------+----+

This creates a new column with a row number that increments each time there’s a duplicate, but resets again when there’s a unique row

This technique provides a possible benefit in that we don’t need to group the results. This means we can see each duplicate row, including its unique identifier column.

Option 5

We can use the previous example as a common table expression in a larger query:

WITH cte AS 
    (
        SELECT 
            *,
            ROW_NUMBER() OVER ( 
                PARTITION BY FirstName, LastName 
                ORDER BY FirstName, LastName
                ) AS rn
        FROM Dogs
    )
SELECT * FROM cte WHERE rn <> 1;

Result:

+-------+-----------+----------+----+
| DogId | FirstName | LastName | rn |
+-------+-----------+----------+----+
|     2 | Bark      | Smith    |  2 |
|     6 | Wag       | Johnson  |  2 |
|     7 | Wag       | Johnson  |  3 |
+-------+-----------+----------+----+

This technique excludes non-duplicates from the output, and it excludes one row of each duplicate from the output.

This query can be used as a precursor to a de-duping operation. It can show us what’s going to be deleted if we decide to delete duplicates. To de-dupe the table, all we need to do is replace the last SELECT * with DELETE.

Option 6

Here’s a more concise way to get the same output as the previous example:

SELECT * FROM Dogs 
WHERE DogId IN (
    SELECT DogId FROM Dogs 
    WHERE DogId NOT IN (SELECT MIN(DogId) FROM Dogs
    GROUP BY FirstName, LastName)
    );

Result:

+-------+-----------+----------+
| DogId | FirstName | LastName |
+-------+-----------+----------+
|     2 | Bark      | Smith    |
|     6 | Wag       | Johnson  |
|     7 | Wag       | Johnson  |
+-------+-----------+----------+

This technique doesn’t require us to generate a separate row number with ROW_NUMBER() like in the previous example.

We can also replace SELECT * with DELETE to delete the duplicates.

Option 7

And finally, here’s one more option for returning duplicates:

SELECT * 
FROM Dogs d1, Dogs d2 
WHERE d1.FirstName = d2.FirstName 
AND d1.LastName = d2.LastName
AND d1.DogId <> d2.DogId 
AND d1.DogId = (
    SELECT MAX(DogId) 
    FROM Dogs d3 
    WHERE d3.FirstName = d1.FirstName 
    AND d3.LastName = d1.LastName
);

Result:

+-------+-----------+----------+-------+-----------+----------+
| DogId | FirstName | LastName | DogId | FirstName | LastName |
+-------+-----------+----------+-------+-----------+----------+
|     2 | Bark      | Smith    |     1 | Bark      | Smith    |
|     7 | Wag       | Johnson  |     5 | Wag       | Johnson  |
|     7 | Wag       | Johnson  |     6 | Wag       | Johnson  |
+-------+-----------+----------+-------+-----------+----------+