Below are six examples that delete duplicate rows from a table in SQLite when those rows have a primary key or unique identifier column.
In these instances, the primary key must be ignored when comparing duplicates (due to the fact that primary keys prevent duplicate rows by definition).
Sample Data
Our examples use the following data:
SELECT * FROM Dogs;
Result:
DogId FirstName LastName ----- --------- -------- 1 Bark Smith 2 Bark Smith 3 Woof Jones 4 Ruff Robinson 5 Wag Johnson 6 Wag Johnson 7 Wag Johnson
We can see that the first two rows contain duplicates, as do the last three rows.
The DogId
column holds unique values (because it’s the table’s primary key), and therefore, strictly speaking, there are no duplicates. But in real life situations, you will often want to de-dupe tables that contain primary keys. Therefore, in the following examples we ignore the primary key and delete rows that contain duplicate values across the remaining columns.
Option 1
Here’s our first option for de-duping the above table:
DELETE FROM Dogs
WHERE DogId IN (
SELECT DogId FROM Dogs
EXCEPT SELECT MIN(DogId) FROM Dogs
GROUP BY FirstName, LastName
);
SELECT * FROM Dogs;
Result:
DogId FirstName LastName ----- --------- -------- 1 Bark Smith 3 Woof Jones 4 Ruff Robinson 5 Wag Johnson
The table has been de-duped as expected.
We can alternatively use the MAX()
function instead of the MIN()
function to change which rows are deleted. I’ll do this in the next example.
Option 2
In this example (and the following examples) we’ll assume that the table has been restored to its original state (with the duplicates).
Here’s another query that deletes duplicate rows and selects the remaining rows:
DELETE FROM Dogs WHERE DogId IN (
SELECT d2.DogId
FROM Dogs d1, Dogs d2
WHERE d1.FirstName = d2.FirstName
AND d1.LastName = d2.LastName
AND d1.DogId <> d2.DogId
AND d1.DogId=(
SELECT MAX(DogId)
FROM Dogs d3
WHERE d3.FirstName = d1.FirstName
AND d3.LastName = d1.LastName
)
);
SELECT * FROM Dogs;
Result:
DogId FirstName LastName ----- --------- -------- 2 Bark Smith 3 Woof Jones 4 Ruff Robinson 7 Wag Johnson
The table has now been de-duped.
Notice that I used the MAX()
function instead of MIN()
which I used in the previous example. We can see the affect this has on the de-duping operation. It deleted different rows from the table.
Option 3
Here’s an option that doesn’t require the use of MIN()
or MAX()
:
DELETE FROM Dogs
WHERE EXISTS (
SELECT 1 FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
AND Dogs.DogId > d2.DogId
);
SELECT * FROM Dogs;
Result:
DogId FirstName LastName ----- --------- -------- 1 Bark Smith 3 Woof Jones 4 Ruff Robinson 5 Wag Johnson
Option 4
Here’s another option:
DELETE FROM Dogs
WHERE DogId > (
SELECT MIN(DogId) FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
);
SELECT * FROM Dogs;
Result:
DogId FirstName LastName ----- --------- -------- 1 Bark Smith 3 Woof Jones 4 Ruff Robinson 5 Wag Johnson
Option 5
By default, every row in SQLite has a special column, usually called the rowid
, that uniquely identifies that row within the table. Unless it has been explicitly removed from the table, you can use this as a unique identifier for each row. This method could be useful if you’re unable to reference the primary key for some reason (or if the table doesn’t have a primary key).
We can therefore use the rowid
in our query instead of the DogId
column:
DELETE FROM Dogs
WHERE EXISTS (
SELECT 1 FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
AND Dogs.rowid > d2.rowid
);
SELECT * FROM Dogs;
Result:
DogId FirstName LastName ----- --------- -------- 1 Bark Smith 3 Woof Jones 4 Ruff Robinson 5 Wag Johnson
Option 6
And here’s the other example, but with rowid
instead of the primary key:
DELETE FROM Dogs
WHERE rowid > (
SELECT MIN(rowid) FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
);
SELECT * FROM Dogs;
Result:
DogId FirstName LastName ----- --------- -------- 1 Bark Smith 3 Woof Jones 4 Ruff Robinson 5 Wag Johnson