Here are four methods you can use to find duplicate rows in SQL Server.
By “duplicate rows” I mean two or more rows that share exactly the same values across all columns.
Sample Data
Suppose we have a table with the following data:
SELECT * FROM Pets;
Result:
+---------+-----------+-----------+ | PetId | PetName | PetType | |---------+-----------+-----------| | 1 | Wag | Dog | | 1 | Wag | Dog | | 2 | Scratch | Cat | | 3 | Tweet | Bird | | 4 | Bark | Dog | | 4 | Bark | Dog | | 4 | Bark | Dog | +---------+-----------+-----------+
We can see that the first two rows are duplicates, as are the last three rows.
Option 1
We can use the following query to return information about duplicate rows:
SELECT
DISTINCT PetId,
COUNT(*) AS "Count"
FROM Pets
GROUP BY PetId
ORDER BY PetId;
Result:
+---------+---------+ | PetId | Count | |---------+---------| | 1 | 2 | | 2 | 1 | | 3 | 1 | | 4 | 3 | +---------+---------+
We can expand the SELECT
list to include more columns if required:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
ORDER BY PetId;
Result:
+---------+-----------+-----------+---------+ | PetId | PetName | PetType | Count | |---------+-----------+-----------+---------| | 1 | Wag | Dog | 2 | | 2 | Scratch | Cat | 1 | | 3 | Tweet | Bird | 1 | | 4 | Bark | Dog | 3 | +---------+-----------+-----------+---------+
If the table has a unique identifier, we can simply remove that column from the query. For example, if we assume that the PetId
column is actually a primary key column that contains a unique ID, we could run the following query to return all rows that are duplicates, not counting the primary key column:
SELECT
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetName,
PetType
ORDER BY PetName;
Result:
+-----------+-----------+---------+ | PetName | PetType | Count | |-----------+-----------+---------| | Bark | Dog | 3 | | Scratch | Cat | 1 | | Tweet | Bird | 1 | | Wag | Dog | 2 | +-----------+-----------+---------+
Option 2
If we only want the actual duplicate rows returned, we can add the HAVING
clause:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
HAVING COUNT(*) > 1
ORDER BY PetId;
Result:
+---------+-----------+-----------+---------+ | PetId | PetName | PetType | Count | |---------+-----------+-----------+---------| | 1 | Wag | Dog | 2 | | 4 | Bark | Dog | 3 | +---------+-----------+-----------+---------+
Option 3
Another way to do it is to use the ROW_NUMBER()
function with the PARTITION BY
clause to number the output of the result set.
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY PetId, PetName, PetType
ORDER BY PetId, PetName, PetType
) AS Row_Number
FROM Pets;
Result:
+---------+-----------+-----------+--------------+ | PetId | PetName | PetType | Row_Number | |---------+-----------+-----------+--------------| | 1 | Wag | Dog | 1 | | 1 | Wag | Dog | 2 | | 2 | Scratch | Cat | 1 | | 3 | Tweet | Bird | 1 | | 4 | Bark | Dog | 1 | | 4 | Bark | Dog | 2 | | 4 | Bark | Dog | 3 | +---------+-----------+-----------+--------------+
The PARTITION BY
clause divides the result set produced by the FROM
clause into partitions to which the function is applied. When we specify partitions for the result set, each partition causes the numbering to start over again (i.e. the numbering will start at 1 for the first row in each partition).
Option 4
If we only want surplus rows from the matching duplicates to be returned, we can use the above query as a common table expression, like this:
WITH CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY PetId, PetName, PetType
ORDER BY PetId, PetName, PetType
) AS Row_Number
FROM Pets
)
SELECT * FROM CTE WHERE Row_Number <> 1;
Result:
+---------+-----------+-----------+--------------+ | PetId | PetName | PetType | Row_Number | |---------+-----------+-----------+--------------| | 1 | Wag | Dog | 2 | | 4 | Bark | Dog | 2 | | 4 | Bark | Dog | 3 | +---------+-----------+-----------+--------------+
One of the benefits of doing this is that we can delete duplicate rows simply by switching SELECT *
to DELETE
(on the last line).
Therefore we can use the above code to see which rows will be deleted, and then when we’re satisfied that we’re going to delete the correct rows, we can switch it to a DELETE
statement to actually delete them.
Like this:
WITH CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY PetId, PetName, PetType
ORDER BY PetId, PetName, PetType
) AS Row_Number
FROM Pets
)
DELETE FROM CTE WHERE Row_Number <> 1;