3 Ways to Delete Duplicate Rows in SQL Server while Ignoring the Primary Key

The following examples use T-SQL to delete duplicate rows in SQL Server while ignoring the primary key or unique identifier column.

More specifically, the examples delete duplicate rows but keep one. So, given two identical rows, one is deleted and the other remains. This is often referred to as “de-duping” the table, “deduplication” of the table, etc.

Sample Data

Suppose we have a table with the following data:

SELECT * FROM Dogs;

Result:

+---------+-------------+------------+
| DogId   | FirstName   | LastName   |
|---------+-------------+------------|
| 1       | Bark        | Smith      |
| 2       | Bark        | Smith      |
| 3       | Woof        | Jones      |
| 4       | Ruff        | Robinson   |
| 5       | Wag         | Johnson    |
| 6       | Wag         | Johnson    |
| 7       | Wag         | Johnson    |
+---------+-------------+------------+

We can see that the first two rows are duplicates, and so are the last three rows.

Option 1

First, let’s run the following code to check which rows are going to be de-duped:

WITH cte AS 
    (
        SELECT 
            *,
            ROW_NUMBER() OVER ( 
                PARTITION BY FirstName, LastName 
                ORDER BY FirstName, LastName
                ) AS Row_Number
        FROM Dogs
    )
SELECT * FROM cte WHERE Row_Number <> 1;

Result:

+---------+-------------+------------+--------------+
| DogId   | FirstName   | LastName   | Row_Number   |
|---------+-------------+------------+--------------|
| 2       | Bark        | Smith      | 2            |
| 6       | Wag         | Johnson    | 2            |
| 7       | Wag         | Johnson    | 3            |
+---------+-------------+------------+--------------+

We used the ROW_NUMBER() function with the PARTITION BY clause to create our own row number that increments when any duplicates are found, and resets when a non-duplicate is found. A number greater than 1 indicates that it’s a duplicate, and so we only return rows that have a number greater than 1.

We can see that three rows will be deleted when we de-dupe this table.

Now let’s de-dupe the table:

WITH cte AS 
    (
        SELECT 
            *,
            ROW_NUMBER() OVER ( 
                PARTITION BY FirstName, LastName 
                ORDER BY FirstName, LastName
                ) AS Row_Number
        FROM Dogs
    )
DELETE FROM cte WHERE Row_Number <> 1;

Result:

(3 rows affected)

As expected, three rows were deleted.

This query is almost identical to the previous one. All we did was change SELECT * on the last line to DELETE.

Now let’s select all rows from the table in order to verify that the correct rows were deleted:

SELECT * FROM Dogs;

Result:

+---------+-------------+------------+
| DogId   | FirstName   | LastName   |
|---------+-------------+------------|
| 1       | Bark        | Smith      |
| 3       | Woof        | Jones      |
| 4       | Ruff        | Robinson   |
| 5       | Wag         | Johnson    |
+---------+-------------+------------+

We can see that each dog now appears just once in the table.

Option 2

Assuming that the table has been restored after the previous example, here’s another way of checking for duplicates:

SELECT * FROM Dogs 
WHERE DogId IN (
    SELECT DogId FROM Dogs 
    EXCEPT SELECT MIN(DogId) FROM Dogs 
    GROUP BY FirstName, LastName
    );

Result:

+---------+-------------+------------+
| DogId   | FirstName   | LastName   |
|---------+-------------+------------|
| 2       | Bark        | Smith      |
| 6       | Wag         | Johnson    |
| 7       | Wag         | Johnson    |
+---------+-------------+------------+

In this case, we made use of the EXCEPT operator along with the MIN() function. We could replace MIN() with MAX() depending on which rows we want to be deleted.

To delete the rows, we can simply replace SELECT * with DELETE:

DELETE FROM Dogs 
WHERE DogId IN (
    SELECT DogId FROM Dogs 
    EXCEPT SELECT MIN(DogId) FROM Dogs 
    GROUP BY FirstName, LastName
    );

Result:

(3 rows affected)

And check to see what’s left:

SELECT * FROM Dogs;

Result:

+---------+-------------+------------+
| DogId   | FirstName   | LastName   |
|---------+-------------+------------|
| 1       | Bark        | Smith      |
| 3       | Woof        | Jones      |
| 4       | Ruff        | Robinson   |
| 5       | Wag         | Johnson    |
+---------+-------------+------------+

Option 3

Another way to do it is to join the table on itself and check for duplicates that way.

Assuming that the table has been restored after the previous example, here’s our third option for selecting duplicates:

SELECT * 
FROM Dogs d1, Dogs d2 
WHERE d1.FirstName = d2.FirstName 
AND d1.LastName = d2.LastName
AND d1.DogId <> d2.DogId 
AND d1.DogId = (
    SELECT MAX(DogId) 
    FROM Dogs d3 
    WHERE d3.FirstName = d1.FirstName 
    AND d3.LastName = d1.LastName
);

Result:

+---------+-------------+------------+---------+-------------+------------+
| DogId   | FirstName   | LastName   | DogId   | FirstName   | LastName   |
|---------+-------------+------------+---------+-------------+------------|
| 2       | Bark        | Smith      | 1       | Bark        | Smith      |
| 7       | Wag         | Johnson    | 5       | Wag         | Johnson    |
| 7       | Wag         | Johnson    | 6       | Wag         | Johnson    |
+---------+-------------+------------+---------+-------------+------------+

This result is not quite as clear as the one in the previous example, but we can still see which rows are duplicates.

Now we can modify that query so that we delete duplicate rows:

DELETE FROM Dogs WHERE DogId IN (
    SELECT d2.DogId 
    FROM Dogs d1, Dogs d2 
    WHERE d1.FirstName = d2.FirstName 
    AND d1.LastName = d2.LastName 
    AND d1.DogId <> d2.DogId 
    AND d1.DogId=( 
        SELECT MAX(DogId) 
        FROM Dogs d3 
        WHERE d3.FirstName = d1.FirstName 
        AND d3.LastName = d1.LastName
    )
);

Result:

(3 rows affected)

Once again, three rows were deleted.

Let’s check the table again:

SELECT * FROM Dogs;

Result:

+---------+-------------+------------+
| DogId   | FirstName   | LastName   |
|---------+-------------+------------|
| 2       | Bark        | Smith      |
| 3       | Woof        | Jones      |
| 4       | Ruff        | Robinson   |
| 7       | Wag         | Johnson    |
+---------+-------------+------------+

You might notice that this time the other rows were deleted. In other words, we now have DogIds 2, 3, 4, and 7 whereas in the previous examples we were left with 1, 3, 4, and 5.

We can easily change this example to delete the same rows as the previous examples. To do this, we can use the MIN() function instead of the MAX() function:

DELETE FROM Dogs WHERE DogId IN (
    SELECT d2.DogId 
    FROM Dogs d1, Dogs d2 
    WHERE d1.FirstName = d2.FirstName 
    AND d1.LastName = d2.LastName 
    AND d1.DogId <> d2.DogId 
    AND d1.DogId=( 
        SELECT MIN(DogId) 
        FROM Dogs d3 
        WHERE d3.FirstName = d1.FirstName 
        AND d3.LastName = d1.LastName
    )
);