The following examples use SQL to delete duplicate rows that have a primary key or unique identifier column.
Specifically, the examples delete all duplicates except one. So for example, in the case of three identical rows, it deletes two of them and keeps one.
Sample Data
Suppose we have a table with the following data:
SELECT * FROM Dogs;
Result:
+---------+-------------+------------+ | DogId | FirstName | LastName | |---------+-------------+------------| | 1 | Bark | Smith | | 2 | Bark | Smith | | 3 | Woof | Jones | | 4 | Ruff | Robinson | | 5 | Wag | Johnson | | 6 | Wag | Johnson | | 7 | Wag | Johnson | +---------+-------------+------------+
The first two rows are duplicates and the last three rows are duplicates (not counting the first column – the primary key).
MariaDB, PostgreSQL, SQLite, & SQL Server
Here’s an example of deleting duplicates from the above table.
This example works in SQL Server, MariaDB, PostgreSQL, and SQLite.
DELETE FROM Dogs
WHERE DogId IN (
SELECT DogId FROM Dogs
EXCEPT SELECT MIN(DogId) FROM Dogs
GROUP BY FirstName, LastName
);
SELECT * FROM Dogs;
Result:
+---------+-------------+------------+ | DogId | FirstName | LastName | |---------+-------------+------------| | 1 | Bark | Smith | | 3 | Woof | Jones | | 4 | Ruff | Robinson | | 5 | Wag | Johnson | +---------+-------------+------------+
Oracle
The previous example almost works in Oracle. But in Oracle, we need to replace EXCEPT
with MINUS
:
DELETE FROM Dogs
WHERE DogId IN (
SELECT DogId FROM Dogs
MINUS SELECT MIN(DogId) FROM Dogs
GROUP BY FirstName, LastName
);
SELECT * FROM Dogs;
Result:
+---------+-------------+------------+ | DogId | FirstName | LastName | |---------+-------------+------------| | 1 | Bark | Smith | | 3 | Woof | Jones | | 4 | Ruff | Robinson | | 5 | Wag | Johnson | +---------+-------------+------------+
MySQL
MySQL doesn’t support the EXCEPT
operator or the MINUS
operator. Therefore we need to construct a different query altogether.
The following query gets the job done:
DELETE d1 FROM Dogs d1
INNER JOIN Dogs d2
WHERE
d1.DogId > d2.DogId AND
d1.FirstName = d2.FirstName AND
d1.LastName = d2.LastName;
SELECT * FROM Dogs;
Result:
+---------+-------------+------------+ | DogId | FirstName | LastName | |---------+-------------+------------| | 1 | Bark | Smith | | 3 | Woof | Jones | | 4 | Ruff | Robinson | | 5 | Wag | Johnson | +---------+-------------+------------+
Using a CTE to Delete Duplicates
See Using a CTE with a DELETE Statement in SQL Server to Remove Duplicates for a step-by-step guide to deleting duplicate rows using a CTE. This includes viewing the data to be deleted before deleting it.
Logging Deleted Rows
Some RDBMSs provide us with a handy clause that allows us to see or log all deleted rows. For example:
- PostgreSQL has the
RETURNING
clause that returns a list of the rows that were deleted. - SQL Server has the
OUTPUT
clause that does the same thing, but also allows us to insert the deleted rows into another table or table variable.