7 Ways to Find Duplicate Rows in SQL Server while Ignoring any Primary Key

Here are seven options for finding duplicate rows in SQL Server, when those rows have a primary key or other unique identifier column.

In other words, the table contains two or more rows that share exactly the same values across all columns except for its unique identifier column.

Sample Data

Suppose we have a table with the following data:

SELECT * FROM Dogs;

Result:

+---------+-------------+------------+
| DogId   | FirstName   | LastName   |
|---------+-------------+------------|
| 1       | Bark        | Smith      |
| 2       | Bark        | Smith      |
| 3       | Woof        | Jones      |
| 4       | Ruff        | Robinson   |
| 5       | Wag         | Johnson    |
| 6       | Wag         | Johnson    |
| 7       | Wag         | Johnson    |
+---------+-------------+------------+

We can see that the first two rows are duplicates (except for the DogId column, which contains a unique value across all rows, and could be used as the table’s primary key column). We can also see that the last three rows are duplicates (except for the DogId column).

The unique ID column ensures that there are no duplicate rows, which is normally a highly desirable trait in RDBMSs. However, in this case it has the potential to interfere with our ability to find duplicates. By definition, the unique ID column ensures that there are no duplicates. Fortunately, we can overcome this issue quite easily, as the following examples show.

Option 1

Probably the easiest/simplest way to do it is with a simple query that uses the GROUP BY clause:

SELECT 
    FirstName, 
    LastName, 
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName;

Result:

+-------------+------------+---------+
| FirstName   | LastName   | Count   |
|-------------+------------+---------|
| Wag         | Johnson    | 3       |
| Woof        | Jones      | 1       |
| Ruff        | Robinson   | 1       |
| Bark        | Smith      | 2       |
+-------------+------------+---------+

We were able to exclude the primary key/unique ID column by omitting it from our query.

The result tells us that there are three rows containing Wag Johnson and two rows containing Bark Smith. These are duplicates (or triplicates in the case of Wag Johnson).

Option 2

We can exclude non-duplicates from the result by including the HAVING clause in our query:

SELECT 
    FirstName, 
    LastName, 
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1;

Result:

+-------------+------------+---------+
| FirstName   | LastName   | Count   |
|-------------+------------+---------|
| Wag         | Johnson    | 3       |
| Bark        | Smith      | 2       |
+-------------+------------+---------+

Option 3

We can also check for duplicates on concatenated columns. For example, we can use the CONCAT() function to concatenate our two columns:

SELECT
    DISTINCT CONCAT(FirstName, ' ', LastName) AS DogName,
    COUNT(*) AS Count
FROM Dogs
GROUP BY CONCAT(FirstName, ' ', LastName);

Result:

+---------------+---------+
| DogName       | Count   |
|---------------+---------|
| Bark Smith    | 2       |
| Ruff Robinson | 1       |
| Wag Johnson   | 3       |
| Woof Jones    | 1       |
+---------------+---------+

Option 4

We can use the ROW_NUMBER() function with the PARTITION BY clause to create a new column with a row number that increments each time there’s a duplicate, but resets again when there’s a unique row:

SELECT 
    *,
    ROW_NUMBER() OVER ( 
        PARTITION BY FirstName, LastName 
        ORDER BY FirstName, LastName
        ) AS Row_Number
FROM Dogs;

Result:

+---------+-------------+------------+--------------+
| DogId   | FirstName   | LastName   | Row_Number   |
|---------+-------------+------------+--------------|
| 1       | Bark        | Smith      | 1            |
| 2       | Bark        | Smith      | 2            |
| 4       | Ruff        | Robinson   | 1            |
| 5       | Wag         | Johnson    | 1            |
| 6       | Wag         | Johnson    | 2            |
| 7       | Wag         | Johnson    | 3            |
| 3       | Woof        | Jones      | 1            |
+---------+-------------+------------+--------------+

One benefit of this method is that we can see each and every duplicate row, along with its unique identifier column, due to the fact that we’re not grouping the results.

Option 5

We can also use the previous example as a common table expression in a larger query:

WITH cte AS 
    (
        SELECT 
            *,
            ROW_NUMBER() OVER ( 
                PARTITION BY FirstName, LastName 
                ORDER BY FirstName, LastName
                ) AS Row_Number
        FROM Dogs
    )
SELECT * FROM cte WHERE Row_Number <> 1;

Result:

+---------+-------------+------------+--------------+
| DogId   | FirstName   | LastName   | Row_Number   |
|---------+-------------+------------+--------------|
| 2       | Bark        | Smith      | 2            |
| 6       | Wag         | Johnson    | 2            |
| 7       | Wag         | Johnson    | 3            |
+---------+-------------+------------+--------------+

This option excludes non-duplicates from the output.

It also excludes exactly one row of each duplicate from the output. This opens the door for us to turn the last SELECT * into a DELETE to de-dupe the table while keeping one of each duplicate.

Option 6

Here’s a more succinct way to get the same output as the previous example:

SELECT * FROM Dogs 
WHERE DogId IN (
    SELECT DogId FROM Dogs 
    EXCEPT SELECT MIN(DogId) FROM Dogs 
    GROUP BY FirstName, LastName
    );

Result:

+-------+-----------+----------+
| DogId | FirstName | LastName |
+-------+-----------+----------+
|     2 | Bark      | Smith    |
|     6 | Wag       | Johnson  |
|     7 | Wag       | Johnson  |
+-------+-----------+----------+

This example doesn’t require generating our own separate row number.

Option 7

And finally, here’s a slightly more convoluted technique to return duplicate rows:

SELECT * 
FROM Dogs d1, Dogs d2 
WHERE d1.FirstName = d2.FirstName 
AND d1.LastName = d2.LastName
AND d1.DogId <> d2.DogId 
AND d1.DogId = (
    SELECT MAX(DogId) 
    FROM Dogs d3 
    WHERE d3.FirstName = d1.FirstName 
    AND d3.LastName = d1.LastName
);

Result:

+---------+-------------+------------+---------+-------------+------------+
| DogId   | FirstName   | LastName   | DogId   | FirstName   | LastName   |
|---------+-------------+------------+---------+-------------+------------|
| 2       | Bark        | Smith      | 1       | Bark        | Smith      |
| 7       | Wag         | Johnson    | 5       | Wag         | Johnson    |
| 7       | Wag         | Johnson    | 6       | Wag         | Johnson    |
+---------+-------------+------------+---------+-------------+------------+

Even the result looks more convoluted, but hey, it still shows us the duplicates!