Select Duplicate Rows in SQL

Here’s an example of using SQL to find duplicate rows in a database table. This technique can be used in most of the major RDBMSs, including SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, and SQLite.

Sample Data

Suppose we have a table with the following data:

SELECT * FROM Pets;

Result:

PetId  PetName  PetType
-----  -------  -------
1      Wag      Dog    
1      Wag      Dog    
2      Scratch  Cat    
3      Tweet    Bird   
4      Bark     Dog    
4      Bark     Dog    
4      Bark     Dog

The first two rows are duplicates, as are the last three rows. That’s because all three columns contain the same values in each duplicate row.

In this case, even the PetId column contains duplicates, so the table obviously doesn’t contain a primary key.

Select Duplicates

We can use the following SQL example to select all duplicate rows from this table.

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
ORDER BY PetId;

Result:

+---------+-----------+-----------+---------+
| PetId   | PetName   | PetType   | Count   |
|---------+-----------+-----------+---------|
| 1       | Wag       | Dog       | 2       |
| 2       | Scratch   | Cat       | 1       |
| 3       | Tweet     | Bird      | 1       |
| 4       | Bark      | Dog       | 3       |
+---------+-----------+-----------+---------+

We can have the duplicates appear first by ordering it by count in descending order:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
ORDER BY COUNT(*) DESC;

Result:

+---------+-----------+-----------+---------+
| PetId   | PetName   | PetType   | Count   |
|---------+-----------+-----------+---------|
| 4       | Bark      | Dog       | 3       |
| 1       | Wag       | Dog       | 2       |
| 2       | Scratch   | Cat       | 1       |
| 3       | Tweet     | Bird      | 1       |
+---------+-----------+-----------+---------+

List the Duplicates Only

If we only want to list the duplicate rows, we can use the SQL’s HAVING clause to exclude non-duplicates from the output:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
HAVING COUNT(*) > 1
ORDER BY PetId;

Result:

+-------+---------+---------+-------+
| PetId | PetName | PetType | Count |
+-------+---------+---------+-------+
|     1 | Wag     | Dog     |     2 |
|     4 | Bark    | Dog     |     3 |
+-------+---------+---------+-------+

This excludes non-duplicates from the result. So if you have no duplicates, then your result should be empty.

List All Duplicate Rows without Grouping Them

If you want all duplicate rows to be listed out separately (without being grouped), the ROW_NUMBER() window function should be able to help:

SELECT 
    PetId,
    PetName,
    PetType,
    ROW_NUMBER() OVER ( 
        PARTITION BY PetId, PetName, PetType 
        ORDER BY PetId, PetName, PetType
        ) AS rn
FROM Pets;

Result:

+---------+-----------+-----------+------+
| PetId   | PetName   | PetType   | rn   |
|---------+-----------+-----------+------|
| 1       | Wag       | Dog       | 1    |
| 1       | Wag       | Dog       | 2    |
| 2       | Scratch   | Cat       | 1    |
| 3       | Tweet     | Bird      | 1    |
| 4       | Bark      | Dog       | 1    |
| 4       | Bark      | Dog       | 2    |
| 4       | Bark      | Dog       | 3    |
+---------+-----------+-----------+------+

Using the PARTITION BY clause allows us to partition the results based on duplicates. Whenever duplicates are found, they are numbered as such. For example, if three rows contain duplicate values, then they are numbered 1, 2, and 3 (as we can see with the dog called Bark).

And to list out just the excess duplicates, we can use the above query as a common table expression:

WITH cte AS 
    (
        SELECT 
            PetId,
            PetName,
            PetType,
            ROW_NUMBER() OVER ( 
                PARTITION BY PetId, PetName, PetType 
                ORDER BY PetId, PetName, PetType
                ) AS rn
        FROM Pets
    )
SELECT * FROM cte WHERE rn <> 1;

Result:

+---------+-----------+-----------+------+
| PetId   | PetName   | PetType   | rn   |
|---------+-----------+-----------+------|
| 1       | Wag       | Dog       | 2    |
| 4       | Bark      | Dog       | 2    |
| 4       | Bark      | Dog       | 3    |
+---------+-----------+-----------+------+

This query can be used to return a list of rows that would be removed if we ran a de-duping operation against the table.

In some DBMSs (such as SQL Server), we can replace SELECT * on the last row with DELETE in order to delete these rows.