If you think a MySQL table might have duplicate rows, you can use the following options to return all duplicates.
Sample Data
Suppose we have a table with the following data:
SELECT * FROM Pets;
Result:
+-------+---------+---------+ | PetId | PetName | PetType | +-------+---------+---------+ | 1 | Wag | Dog | | 1 | Wag | Dog | | 2 | Scratch | Cat | | 3 | Tweet | Bird | | 4 | Bark | Dog | | 4 | Bark | Dog | | 4 | Bark | Dog | +-------+---------+---------+
The first two rows are duplicates, as are the last three rows. The duplicate rows share the same values across all columns.
Option 1
One option is to use the following query to return duplicate rows:
SELECT
DISTINCT PetId,
COUNT(*) AS "Count"
FROM Pets
GROUP BY PetId
ORDER BY PetId;
Result:
+-------+-------+ | PetId | Count | +-------+-------+ | 1 | 2 | | 2 | 1 | | 3 | 1 | | 4 | 3 | +-------+-------+
We can expand the SELECT
list to include more columns if required:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
ORDER BY PetId;
Result:
+-------+---------+---------+-------+ | PetId | PetName | PetType | Count | +-------+---------+---------+-------+ | 1 | Wag | Dog | 2 | | 2 | Scratch | Cat | 1 | | 3 | Tweet | Bird | 1 | | 4 | Bark | Dog | 3 | +-------+---------+---------+-------+
We can have the duplicates appear first by ordering it by count in descending order:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
ORDER BY Count DESC;
Result:
+-------+---------+---------+-------+ | PetId | PetName | PetType | Count | +-------+---------+---------+-------+ | 4 | Bark | Dog | 3 | | 1 | Wag | Dog | 2 | | 2 | Scratch | Cat | 1 | | 3 | Tweet | Bird | 1 | +-------+---------+---------+-------+
Option 2
If we only want to list the duplicate rows, we can use the the HAVING
clause to exclude non-duplicates from the output:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
HAVING COUNT(*) > 1
ORDER BY PetId;
Result:
+-------+---------+---------+-------+ | PetId | PetName | PetType | Count | +-------+---------+---------+-------+ | 1 | Wag | Dog | 2 | | 4 | Bark | Dog | 3 | +-------+---------+---------+-------+
Option 3
Another way to do it is to use the ROW_NUMBER()
function with the PARTITION BY
clause to number the output of the result set.
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY PetId, PetName, PetType
ORDER BY PetId, PetName, PetType
) AS rn
FROM Pets;
Result:
+-------+---------+---------+----+ | PetId | PetName | PetType | rn | +-------+---------+---------+----+ | 1 | Wag | Dog | 1 | | 1 | Wag | Dog | 2 | | 2 | Scratch | Cat | 1 | | 3 | Tweet | Bird | 1 | | 4 | Bark | Dog | 1 | | 4 | Bark | Dog | 2 | | 4 | Bark | Dog | 3 | +-------+---------+---------+----+
The PARTITION BY
clause divides the result set produced by the FROM
clause into partitions to which the function is applied. When we specify partitions for the result set, each partition causes the numbering to start over again (i.e. the numbering will start at 1 for the first row in each partition).
Option 4
To return just the surplus rows from the matching duplicates, we can use the above query as a common table expression, like this:
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY PetId, PetName, PetType
ORDER BY PetId, PetName, PetType
) AS rn
FROM Pets
)
SELECT * FROM cte WHERE rn <> 1;
Result:
+-------+---------+---------+----+ | PetId | PetName | PetType | rn | +-------+---------+---------+----+ | 1 | Wag | Dog | 2 | | 4 | Bark | Dog | 2 | | 4 | Bark | Dog | 3 | +-------+---------+---------+----+