4 Ways to Check for Duplicate Rows in SQL Server

Here are four methods you can use to find duplicate rows in SQL Server.

By “duplicate rows” I mean two or more rows that share exactly the same values across all columns.

Sample Data

Suppose we have a table with the following data:

SELECT * FROM Pets;

Result:

+---------+-----------+-----------+
| PetId   | PetName   | PetType   |
|---------+-----------+-----------|
| 1       | Wag       | Dog       |
| 1       | Wag       | Dog       |
| 2       | Scratch   | Cat       |
| 3       | Tweet     | Bird      |
| 4       | Bark      | Dog       |
| 4       | Bark      | Dog       |
| 4       | Bark      | Dog       |
+---------+-----------+-----------+

We can see that the first two rows are duplicates, as are the last three rows.

Option 1

We can use the following query to return information about duplicate rows:

SELECT 
    DISTINCT PetId, 
    COUNT(*) AS "Count"
FROM Pets
GROUP BY PetId
ORDER BY PetId;

Result:

+---------+---------+
| PetId   | Count   |
|---------+---------|
| 1       | 2       |
| 2       | 1       |
| 3       | 1       |
| 4       | 3       |
+---------+---------+

We can expand the SELECT list to include more columns if required:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
ORDER BY PetId;

Result:

+---------+-----------+-----------+---------+
| PetId   | PetName   | PetType   | Count   |
|---------+-----------+-----------+---------|
| 1       | Wag       | Dog       | 2       |
| 2       | Scratch   | Cat       | 1       |
| 3       | Tweet     | Bird      | 1       |
| 4       | Bark      | Dog       | 3       |
+---------+-----------+-----------+---------+

If the table has a unique identifier, we can simply remove that column from the query. For example, if we assume that the PetId column is actually a primary key column that contains a unique ID, we could run the following query to return all rows that are duplicates, not counting the primary key column:

SELECT 
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetName,
    PetType
ORDER BY PetName;

Result:

+-----------+-----------+---------+
| PetName   | PetType   | Count   |
|-----------+-----------+---------|
| Bark      | Dog       | 3       |
| Scratch   | Cat       | 1       |
| Tweet     | Bird      | 1       |
| Wag       | Dog       | 2       |
+-----------+-----------+---------+

Option 2

If we only want the actual duplicate rows returned, we can add the HAVING clause:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
HAVING COUNT(*) > 1
ORDER BY PetId;

Result:

+---------+-----------+-----------+---------+
| PetId   | PetName   | PetType   | Count   |
|---------+-----------+-----------+---------|
| 1       | Wag       | Dog       | 2       |
| 4       | Bark      | Dog       | 3       |
+---------+-----------+-----------+---------+

Option 3

Another way to do it is to use the ROW_NUMBER() function with the PARTITION BY clause to number the output of the result set.

SELECT 
    *, 
    ROW_NUMBER() OVER ( 
        PARTITION BY PetId, PetName, PetType 
        ORDER BY PetId, PetName, PetType
        ) AS Row_Number
FROM Pets;

Result:

+---------+-----------+-----------+--------------+
| PetId   | PetName   | PetType   | Row_Number   |
|---------+-----------+-----------+--------------|
| 1       | Wag       | Dog       | 1            |
| 1       | Wag       | Dog       | 2            |
| 2       | Scratch   | Cat       | 1            |
| 3       | Tweet     | Bird      | 1            |
| 4       | Bark      | Dog       | 1            |
| 4       | Bark      | Dog       | 2            |
| 4       | Bark      | Dog       | 3            |
+---------+-----------+-----------+--------------+

The PARTITION BY clause divides the result set produced by the FROM clause into partitions to which the function is applied. When we specify partitions for the result set, each partition causes the numbering to start over again (i.e. the numbering will start at 1 for the first row in each partition).

Option 4

If we only want surplus rows from the matching duplicates to be returned, we can use the above query as a common table expression, like this:

WITH CTE AS 
    (
        SELECT 
            *, 
            ROW_NUMBER() OVER ( 
                PARTITION BY PetId, PetName, PetType 
                ORDER BY PetId, PetName, PetType
                ) AS Row_Number
        FROM Pets
    )
SELECT * FROM CTE WHERE Row_Number <> 1;

Result:

+---------+-----------+-----------+--------------+
| PetId   | PetName   | PetType   | Row_Number   |
|---------+-----------+-----------+--------------|
| 1       | Wag       | Dog       | 2            |
| 4       | Bark      | Dog       | 2            |
| 4       | Bark      | Dog       | 3            |
+---------+-----------+-----------+--------------+

One of the benefits of doing this is that we can delete duplicate rows simply by switching SELECT * to DELETE (on the last line).

Therefore we can use the above code to see which rows will be deleted, and then when we’re satisfied that we’re going to delete the correct rows, we can switch it to a DELETE statement to actually delete them.

Like this:

WITH CTE AS 
    (
        SELECT 
            *, 
            ROW_NUMBER() OVER ( 
                PARTITION BY PetId, PetName, PetType 
                ORDER BY PetId, PetName, PetType
                ) AS Row_Number
        FROM Pets
    )
DELETE FROM CTE WHERE Row_Number <> 1;