Find Duplicate Rows that have a Primary Key in SQL

Here are examples of using SQL to return duplicate rows when those rows have a primary key or other unique identifier column.

These queries work in most of the major RDBMSs, including SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, and SQLite.

Sample Data

Suppose we have a table with the following data:

SELECT * FROM Dogs;

Result:

+-------+-----------+----------+
| DogId | FirstName | LastName |
+-------+-----------+----------+
|     1 | Bark      | Smith    |
|     2 | Bark      | Smith    |
|     3 | Woof      | Jones    |
|     4 | Ruff      | Robinson |
|     5 | Wag       | Johnson  |
|     6 | Wag       | Johnson  |
|     7 | Wag       | Johnson  |
+-------+-----------+----------+

The duplicate rows share exactly the same values across all columns except for their primary key/unique ID column.

The primary key ensures that there are no duplicate rows, which is normally a good thing in RDBMSs. However, by definition this means that there are no duplicate rows. In our case, the primary key column is an incrementing number, and its value carries no meaning and is not significant. We therefore need to ignore that row if we want to find duplicates in the columns that are significant.

Count the Duplicates

Here’s a query that counts the duplicate rows. More specifically, it groups the rows by their relevant columns, then counts how many are in each group. Groups containing more than one are duplicates.

SELECT 
    FirstName, 
    LastName, 
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName;

Result:

+-------------+------------+---------+
| FirstName   | LastName   | Count   |
|-------------+------------+---------|
| Wag         | Johnson    | 3       |
| Woof        | Jones      | 1       |
| Ruff        | Robinson   | 1       |
| Bark        | Smith      | 2       |
+-------------+------------+---------+

We used SQL’s GROUP BY clause in our SELECT statement, and then the COUNT() function to count how many rows are in each group.

We can see that Bark Smith occurs twice, and Wag Johnson three times.

In this case, we were able to ignore the primary key column by omitting it from our query.

Move Duplicates to the Top

We can use the ORDER BY clause to order the results by the count in descending order. This makes the duplicate rows appear first.

SELECT 
    FirstName, 
    LastName, 
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName
ORDER BY COUNT(*) DESC;

Result:

+-------------+------------+---------+
| FirstName   | LastName   | Count   |
|-------------+------------+---------|
| Wag         | Johnson    | 3       |
| Bark        | Smith      | 2       |
| Woof        | Jones      | 1       |
| Ruff        | Robinson   | 1       |
+-------------+------------+---------+

List Only the Duplicates

We can use SQL’s HAVING clause to exclude non-duplicates.

SELECT 
    FirstName, 
    LastName, 
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;

Result:

+-------------+------------+---------+
| FirstName   | LastName   | Count   |
|-------------+------------+---------|
| Wag         | Johnson    | 3       |
| Bark        | Smith      | 2       |
+-------------+------------+---------+

Provide a Running Count of Duplicate Rows

If you need to list out all duplicate rows, the above examples won’t cut it.

But not a problem. We can use the ROW_NUMBER() function instead to do just that:

SELECT 
    DogId,
    FirstName,
    LastName,
    ROW_NUMBER() OVER ( 
        PARTITION BY FirstName, LastName 
        ORDER BY FirstName, LastName
        ) AS rn
FROM Dogs;

Result:

+---------+-------------+------------+------+
| DogId   | FirstName   | LastName   | rn   |
|---------+-------------+------------+------|
| 1       | Bark        | Smith      | 1    |
| 2       | Bark        | Smith      | 2    |
| 4       | Ruff        | Robinson   | 1    |
| 5       | Wag         | Johnson    | 1    |
| 6       | Wag         | Johnson    | 2    |
| 7       | Wag         | Johnson    | 3    |
| 3       | Woof        | Jones      | 1    |
+---------+-------------+------------+------+

Using the PARTITION clause allows us to partition the results based on the columns we’re trying to find duplicates in. Whenever duplicates are found, they are numbered as such. For example, if two rows are duplicates, they’re numbered 1 and 2. If three rows are duplicates, they’re numbered 1, 2, and 3 (as we can see with the dog called Bark).

In the above example, we can see the primary key column on the left, and the running count of the duplicates on the right.

And to list out just the excess duplicates, we can use the above query as a common table expression:

WITH cte AS 
    (
        SELECT 
            DogId,
            FirstName,
            LastName,
            ROW_NUMBER() OVER ( 
                PARTITION BY FirstName, LastName 
                ORDER BY FirstName, LastName
                ) AS rn
        FROM Dogs
    )
SELECT * FROM cte WHERE rn <> 1;

Result:

+---------+-------------+------------+------+
| DogId   | FirstName   | LastName   | rn   |
|---------+-------------+------------+------|
| 2       | Bark        | Smith      | 2    |
| 6       | Wag         | Johnson    | 2    |
| 7       | Wag         | Johnson    | 3    |
+---------+-------------+------------+------+

This query can be used to return a list of rows that would be removed if we ran a de-duping operation against the table.

In some DBMSs (such as SQL Server), we can replace SELECT * on the last row with DELETE in order to delete these rows.

3 Queries that Include the Primary Key

The previous examples could work just as easily on tables that don’t have a primary key, as they specifically select the columns that we want to compare for duplicates.

Below are three queries that reference the primary key column (while excluding it from their duplicate evaluation).

Query 1:

SELECT * FROM Dogs
WHERE EXISTS (
  SELECT 1 FROM Dogs d2 
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
  AND Dogs.DogId > d2.DogId
);

Result:

+---------+-------------+------------+
| DogId   | FirstName   | LastName   |
|---------+-------------+------------|
| 2       | Bark        | Smith      |
| 6       | Wag         | Johnson    |
| 7       | Wag         | Johnson    |
+---------+-------------+------------+

Query 2:

SELECT * FROM Dogs
WHERE DogId > (
  SELECT MIN(DogId) FROM Dogs d2  
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
);

Result:

+---------+-------------+------------+
| DogId   | FirstName   | LastName   |
|---------+-------------+------------|
| 2       | Bark        | Smith      |
| 6       | Wag         | Johnson    |
| 7       | Wag         | Johnson    |
+---------+-------------+------------+

Query 3:

SELECT * FROM Dogs 
WHERE DogId IN (
    SELECT DogId FROM Dogs 
    WHERE DogId NOT IN (SELECT MIN(DogId) FROM Dogs
    GROUP BY FirstName, LastName)
    );

Result:

+---------+-------------+------------+
| DogId   | FirstName   | LastName   |
|---------+-------------+------------|
| 2       | Bark        | Smith      |
| 6       | Wag         | Johnson    |
| 7       | Wag         | Johnson    |
+---------+-------------+------------+

Similar to the previous example, these could be converted to a de-duping query by changing the SELECT * to DELETE.