6 Ways to Delete Duplicate Rows that have a Primary Key in SQLite

Below are six examples that delete duplicate rows from a table in SQLite when those rows have a primary key or unique identifier column.

In these instances, the primary key must be ignored when comparing duplicates (due to the fact that primary keys prevent duplicate rows by definition).

Sample Data

Our examples use the following data:

SELECT * FROM Dogs;

Result:

DogId  FirstName  LastName
-----  ---------  --------
1      Bark       Smith   
2      Bark       Smith   
3      Woof       Jones   
4      Ruff       Robinson
5      Wag        Johnson 
6      Wag        Johnson 
7      Wag        Johnson 

We can see that the first two rows contain duplicates, as do the last three rows.

The DogId column holds unique values (because it’s the table’s primary key), and therefore, strictly speaking, there are no duplicates. But in real life situations, you will often want to de-dupe tables that contain primary keys. Therefore, in the following examples we ignore the primary key and delete rows that contain duplicate values across the remaining columns.

Option 1

Here’s our first option for de-duping the above table:

DELETE FROM Dogs 
WHERE DogId IN (
    SELECT DogId FROM Dogs 
    EXCEPT SELECT MIN(DogId) FROM Dogs 
    GROUP BY FirstName, LastName
    );

SELECT * FROM Dogs;

Result:

DogId  FirstName  LastName
-----  ---------  --------
1      Bark       Smith   
3      Woof       Jones   
4      Ruff       Robinson
5      Wag        Johnson 

The table has been de-duped as expected.

We can alternatively use the MAX() function instead of the MIN() function to change which rows are deleted. I’ll do this in the next example.

Option 2

In this example (and the following examples) we’ll assume that the table has been restored to its original state (with the duplicates).

Here’s another query that deletes duplicate rows and selects the remaining rows:

DELETE FROM Dogs WHERE DogId IN (
    SELECT d2.DogId 
    FROM Dogs d1, Dogs d2 
    WHERE d1.FirstName = d2.FirstName 
    AND d1.LastName = d2.LastName 
    AND d1.DogId <> d2.DogId 
    AND d1.DogId=( 
        SELECT MAX(DogId) 
        FROM Dogs d3 
        WHERE d3.FirstName = d1.FirstName 
        AND d3.LastName = d1.LastName
    )
);

SELECT * FROM Dogs;

Result:

DogId  FirstName  LastName
-----  ---------  --------
2      Bark       Smith   
3      Woof       Jones   
4      Ruff       Robinson
7      Wag        Johnson 

The table has now been de-duped.

Notice that I used the MAX() function instead of MIN() which I used in the previous example. We can see the affect this has on the de-duping operation. It deleted different rows from the table.

Option 3

Here’s an option that doesn’t require the use of MIN() or MAX():

DELETE FROM Dogs
WHERE EXISTS (
  SELECT 1 FROM Dogs d2 
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
  AND Dogs.DogId > d2.DogId
);

SELECT * FROM Dogs;

Result:

DogId  FirstName  LastName
-----  ---------  --------
1      Bark       Smith   
3      Woof       Jones   
4      Ruff       Robinson
5      Wag        Johnson 

Option 4

Here’s another option:

DELETE FROM Dogs
WHERE DogId > (
  SELECT MIN(DogId) FROM Dogs d2  
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
);

SELECT * FROM Dogs;

Result:

DogId  FirstName  LastName
-----  ---------  --------
1      Bark       Smith   
3      Woof       Jones   
4      Ruff       Robinson
5      Wag        Johnson 

Option 5

By default, every row in SQLite has a special column, usually called the rowid, that uniquely identifies that row within the table. Unless it has been explicitly removed from the table, you can use this as a unique identifier for each row. This method could be useful if you’re unable to reference the primary key for some reason (or if the table doesn’t have a primary key).

We can therefore use the rowid in our query instead of the DogId column:

DELETE FROM Dogs
WHERE EXISTS (
  SELECT 1 FROM Dogs d2 
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
  AND Dogs.rowid > d2.rowid
);

SELECT * FROM Dogs;

Result:

DogId  FirstName  LastName
-----  ---------  --------
1      Bark       Smith   
3      Woof       Jones   
4      Ruff       Robinson
5      Wag        Johnson 

Option 6

And here’s the other example, but with rowid instead of the primary key:

DELETE FROM Dogs
WHERE rowid > (
  SELECT MIN(rowid) FROM Dogs d2  
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
);

SELECT * FROM Dogs;

Result:

DogId  FirstName  LastName
-----  ---------  --------
1      Bark       Smith   
3      Woof       Jones   
4      Ruff       Robinson
5      Wag        Johnson