2 Ways to Delete Duplicate Rows in PostgreSQL (Ignoring the Primary Key)

Below are two options for removing duplicate rows from a table in PostgreSQL when those rows have a primary key or unique identifier column. The primary key is used in the query, but it’s ignored when comparing duplicates (due to the fact that primary keys prevent duplicate rows by definition).

The following examples delete duplicate rows but keep one. So in the case of say, three identical rows, it deletes two and keeps one.

Sample Data

Our examples use the following data:

SELECT * FROM Dogs;

Result:

 dogid | firstname | lastname 
-------+-----------+----------
     1 | Bark      | Smith
     2 | Bark      | Smith
     3 | Woof      | Jones
     4 | Ruff      | Robinson
     5 | Wag       | Johnson
     6 | Wag       | Johnson
     7 | Wag       | Johnson

We can see that the first two rows are duplicates, as well as the last three rows.

The DogId column holds unique values (because it’s the table’s primary key), and therefore, strictly speaking, there are no duplicates. But in real life situations, you will often want to de-dupe tables that contain primary keys. Therefore, in this article we ignore the primary key and delete rows that contain duplicate values across the remaining columns.

Option 1

We can start by finding all rows that should be deleted:

SELECT * FROM Dogs 
WHERE DogId IN (
    SELECT DogId FROM Dogs 
    EXCEPT SELECT MIN(DogId) FROM Dogs 
    GROUP BY FirstName, LastName
    )
ORDER BY FirstName, LastName;

Result:

 dogid | firstname | lastname 
-------+-----------+----------
     2 | Bark      | Smith
     6 | Wag       | Johnson
     7 | Wag       | Johnson

These are the excess rows from the list of duplicates. What I mean is that, these are the actual rows that will be deleted. For each set of duplicates listed here, there’s one that’s not listed (and will not be deleted). For example, the table contains three dogs called Wag Johnson, but there are only two Wag Johnson’s listed here – the two that will be deleted. The other Wag Johnson will remain in the database.

To remove those duplicate rows, we can switch the SELECT * to DELETE and remove the ORDER BY clause:

DELETE FROM Dogs 
WHERE DogId IN (
    SELECT DogId FROM Dogs 
    EXCEPT SELECT MIN(DogId) FROM Dogs 
    GROUP BY FirstName, LastName
    );

Result:

DELETE 3

To verify the result, we can select all remaining rows in the table:

SELECT * FROM Dogs;

Result:

 dogid | firstname | lastname 
-------+-----------+----------
     1 | Bark      | Smith
     3 | Woof      | Jones
     4 | Ruff      | Robinson
     5 | Wag       | Johnson

We can alternatively use the MAX() function instead of the MIN() function to change which rows are deleted.

Option 2

In this example we’ll assume that the table has been restored to its original state (with the duplicates).

Here’s another query that returns duplicate rows:

SELECT * 
FROM Dogs d1, Dogs d2 
WHERE d1.FirstName = d2.FirstName 
AND d1.LastName = d2.LastName
AND d1.DogId <> d2.DogId 
AND d1.DogId = (
    SELECT MAX(DogId) 
    FROM Dogs d3 
    WHERE d3.FirstName = d1.FirstName 
    AND d3.LastName = d1.LastName
);

Result:

 dogid | firstname | lastname | dogid | firstname | lastname 
-------+-----------+----------+-------+-----------+----------
     2 | Bark      | Smith    |     1 | Bark      | Smith
     7 | Wag       | Johnson  |     5 | Wag       | Johnson
     7 | Wag       | Johnson  |     6 | Wag       | Johnson

We can modify that query to delete the duplicates:

DELETE FROM Dogs WHERE DogId IN (
    SELECT d2.DogId 
    FROM Dogs d1, Dogs d2 
    WHERE d1.FirstName = d2.FirstName 
    AND d1.LastName = d2.LastName 
    AND d1.DogId <> d2.DogId 
    AND d1.DogId=( 
        SELECT MIN(DogId) 
        FROM Dogs d3 
        WHERE d3.FirstName = d1.FirstName 
        AND d3.LastName = d1.LastName
    )
);

Result:

DELETE 3

The table has now been de-duped.

We can verify this by selecting all rows again:

SELECT * FROM Dogs;

Result:

 dogid | firstname | lastname 
-------+-----------+----------
     1 | Bark      | Smith
     3 | Woof      | Jones
     4 | Ruff      | Robinson
     5 | Wag       | Johnson

We can use MAX() instead of MIN() to delete the other rows from the duplicates if preferred.