Here’s an example of using SQL to find duplicate rows in a database table. This technique can be used in most of the major RDBMSs, including SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, and SQLite.
Sample Data
Suppose we have a table with the following data:
SELECT * FROM Pets;
Result:
PetId PetName PetType ----- ------- ------- 1 Wag Dog 1 Wag Dog 2 Scratch Cat 3 Tweet Bird 4 Bark Dog 4 Bark Dog 4 Bark Dog
The first two rows are duplicates, as are the last three rows. That’s because all three columns contain the same values in each duplicate row.
In this case, even the PetId
column contains duplicates, so the table obviously doesn’t contain a primary key.
Select Duplicates
We can use the following SQL example to select all duplicate rows from this table.
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
ORDER BY PetId;
Result:
+---------+-----------+-----------+---------+ | PetId | PetName | PetType | Count | |---------+-----------+-----------+---------| | 1 | Wag | Dog | 2 | | 2 | Scratch | Cat | 1 | | 3 | Tweet | Bird | 1 | | 4 | Bark | Dog | 3 | +---------+-----------+-----------+---------+
We can have the duplicates appear first by ordering it by count in descending order:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
ORDER BY COUNT(*) DESC;
Result:
+---------+-----------+-----------+---------+ | PetId | PetName | PetType | Count | |---------+-----------+-----------+---------| | 4 | Bark | Dog | 3 | | 1 | Wag | Dog | 2 | | 2 | Scratch | Cat | 1 | | 3 | Tweet | Bird | 1 | +---------+-----------+-----------+---------+
List the Duplicates Only
If we only want to list the duplicate rows, we can use the SQL’s HAVING
clause to exclude non-duplicates from the output:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
HAVING COUNT(*) > 1
ORDER BY PetId;
Result:
+-------+---------+---------+-------+ | PetId | PetName | PetType | Count | +-------+---------+---------+-------+ | 1 | Wag | Dog | 2 | | 4 | Bark | Dog | 3 | +-------+---------+---------+-------+
This excludes non-duplicates from the result. So if you have no duplicates, then your result should be empty.
List All Duplicate Rows without Grouping Them
If you want all duplicate rows to be listed out separately (without being grouped), the ROW_NUMBER()
window function should be able to help:
SELECT
PetId,
PetName,
PetType,
ROW_NUMBER() OVER (
PARTITION BY PetId, PetName, PetType
ORDER BY PetId, PetName, PetType
) AS rn
FROM Pets;
Result:
+---------+-----------+-----------+------+ | PetId | PetName | PetType | rn | |---------+-----------+-----------+------| | 1 | Wag | Dog | 1 | | 1 | Wag | Dog | 2 | | 2 | Scratch | Cat | 1 | | 3 | Tweet | Bird | 1 | | 4 | Bark | Dog | 1 | | 4 | Bark | Dog | 2 | | 4 | Bark | Dog | 3 | +---------+-----------+-----------+------+
Using the PARTITION BY
clause allows us to partition the results based on duplicates. Whenever duplicates are found, they are numbered as such. For example, if three rows contain duplicate values, then they are numbered 1, 2, and 3 (as we can see with the dog called Bark).
And to list out just the excess duplicates, we can use the above query as a common table expression:
WITH cte AS
(
SELECT
PetId,
PetName,
PetType,
ROW_NUMBER() OVER (
PARTITION BY PetId, PetName, PetType
ORDER BY PetId, PetName, PetType
) AS rn
FROM Pets
)
SELECT * FROM cte WHERE rn <> 1;
Result:
+---------+-----------+-----------+------+ | PetId | PetName | PetType | rn | |---------+-----------+-----------+------| | 1 | Wag | Dog | 2 | | 4 | Bark | Dog | 2 | | 4 | Bark | Dog | 3 | +---------+-----------+-----------+------+
This query can be used to return a list of rows that would be removed if we ran a de-duping operation against the table.
In some DBMSs (such as SQL Server), we can replace SELECT *
on the last row with DELETE
in order to delete these rows.