The following options can be used to delete duplicate rows in SQLite.
These examples delete duplicate rows but keep one. So if there are three identical rows for example, it deletes two of them and keeps one. This is often referred to as de-duping the table.
Sample Data
Suppose we have a table with the following data:
SELECT * FROM Pets;
Result:
PetId PetName PetType ----- ------- ------- 1 Wag Dog 1 Wag Dog 2 Scratch Cat 3 Tweet Bird 4 Bark Dog 4 Bark Dog 4 Bark Dog
We can see that the first two rows are duplicates, as are the last three rows.
In this case, all columns are duplicates. There is no primary key. The PetId
column might look like it could be a primary key, but it actually contains duplicate values. Therefore, it’s not a unique identifier for each row, and it cannot be used as a primary key.
If there was a primary key, it would contain unique values across all rows, and there would be no duplicates.
In any case, below are two options for finding and deleting duplicate rows.
Option 1
Before we de-dupe the table, we can use the following query to see which row/s will be deleted:
SELECT * FROM Pets
WHERE EXISTS (
SELECT 1 FROM Pets p2
WHERE Pets.PetName = p2.PetName
AND Pets.PetType = p2.PetType
AND Pets.rowid > p2.rowid
);
Result:
PetId PetName PetType ----- ------- ------- 1 Wag Dog 4 Bark Dog 4 Bark Dog
This shows us that three rows are going to be deleted when we de-dupe the table in the next step.
To delete the duplicate values, we can modify the above query by replacing SELECT *
with DELETE
:
DELETE FROM Pets
WHERE EXISTS (
SELECT 1 FROM Pets p2
WHERE Pets.PetName = p2.PetName
AND Pets.PetType = p2.PetType
AND Pets.rowid > p2.rowid
);
SELECT * FROM Pets;
Result:
PetId PetName PetType ----- ------- ------- 1 Wag Dog 2 Scratch Cat 3 Tweet Bird 4 Bark Dog
The table has now been de-duped.
As expected, one of the duplicate rows for our dog “Wag” has been deleted and the other one remains. Two of the duplicate rows for “Bark” have also been deleted.
The reason we were able to do this is because of SQLite’s rowid
. By default, every row in SQLite has a special column, usually called the rowid
, that uniquely identifies that row within the table. Unless it has been explicitly removed from the table, you can use this as a unique identifier for each row, which enables us to construct the above queries. The same applies to the next example.
Option 2
Assuming the table has been restored with its original data (including duplicate rows), here’s another option for de-duping it.
Check which rows will be deleted:
SELECT * FROM Pets
WHERE rowid > (
SELECT MIN(rowid) FROM Pets p2
WHERE Pets.PetName = p2.PetName
AND Pets.PetType = p2.PetType
);
Result:
PetId PetName PetType ----- ------- ------- 1 Wag Dog 4 Bark Dog 4 Bark Dog
Now delete those rows:
DELETE FROM Pets
WHERE rowid > (
SELECT MIN(rowid) FROM Pets p2
WHERE Pets.PetName = p2.PetName
AND Pets.PetType = p2.PetType
);
SELECT * FROM Pets;
Result:
PetId PetName PetType ----- ------- ------- 1 Wag Dog 2 Scratch Cat 3 Tweet Bird 4 Bark Dog
The table has been de-duped.