4 Ways to Select Duplicate Rows in MariaDB

If you have a table that you suspect has duplicate rows in your MariaDB database, you can use any of the following queries to get an idea of how many rows are duplicates.

Sample Data

Suppose we have a table with the following data:

SELECT * FROM Pets;

Result:

+---------+-----------+-----------+
| PetId   | PetName   | PetType   |
|---------+-----------+-----------|
| 1       | Wag       | Dog       |
| 1       | Wag       | Dog       |
| 2       | Scratch   | Cat       |
| 3       | Tweet     | Bird      |
| 4       | Bark      | Dog       |
| 4       | Bark      | Dog       |
| 4       | Bark      | Dog       |
+---------+-----------+-----------+

The first two rows are duplicates, and the last three rows are duplicates.

Option 1

We can use the following query to see how many rows are duplicates:

SELECT 
    DISTINCT PetId, 
    COUNT(*) AS "Count"
FROM Pets
GROUP BY PetId
ORDER BY PetId;

Result:

+---------+---------+
| PetId   | Count   |
|---------+---------|
| 1       | 2       |
| 2       | 1       |
| 3       | 1       |
| 4       | 3       |
+---------+---------+

We can expand the SELECT list to include more columns if required:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
ORDER BY PetId;

Result:

+---------+-----------+-----------+---------+
| PetId   | PetName   | PetType   | Count   |
|---------+-----------+-----------+---------|
| 1       | Wag       | Dog       | 2       |
| 2       | Scratch   | Cat       | 1       |
| 3       | Tweet     | Bird      | 1       |
| 4       | Bark      | Dog       | 3       |
+---------+-----------+-----------+---------+

We can alternatively order it by count in descending order, so that the rows with the most duplicates appear first:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
ORDER BY Count DESC;

Result:

+-------+---------+---------+-------+
| PetId | PetName | PetType | Count |
+-------+---------+---------+-------+
|     4 | Bark    | Dog     |     3 |
|     1 | Wag     | Dog     |     2 |
|     3 | Tweet   | Bird    |     1 |
|     2 | Scratch | Cat     |     1 |
+-------+---------+---------+-------+

Option 2

If we only want the duplicate rows to be listed, we can use the the HAVING clause to exclude non-duplicate rows:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
HAVING COUNT(*) > 1
ORDER BY PetId;

Result:

+-------+---------+---------+-------+
| PetId | PetName | PetType | Count |
+-------+---------+---------+-------+
|     1 | Wag     | Dog     |     2 |
|     4 | Bark    | Dog     |     3 |
+-------+---------+---------+-------+

Option 3

Another way to do it is to use the ROW_NUMBER() function with the PARTITION BY clause to number the output of the result set.

SELECT 
    *, 
    ROW_NUMBER() OVER ( 
        PARTITION BY PetId, PetName, PetType 
        ORDER BY PetId, PetName, PetType
        ) AS Row_Number
FROM Pets;

Result:

+-------+---------+---------+------------+
| PetId | PetName | PetType | Row_Number |
+-------+---------+---------+------------+
|     1 | Wag     | Dog     |          1 |
|     1 | Wag     | Dog     |          2 |
|     2 | Scratch | Cat     |          1 |
|     3 | Tweet   | Bird    |          1 |
|     4 | Bark    | Dog     |          1 |
|     4 | Bark    | Dog     |          2 |
|     4 | Bark    | Dog     |          3 |
+-------+---------+---------+------------+

The PARTITION BY clause divides the result set produced by the FROM clause into partitions to which the function is applied. When we specify partitions for the result set, each partition causes the numbering to start over again (i.e. the numbering will start at 1 for the first row in each partition).

Option 4

To return just the surplus rows from the matching duplicates, we can use the above query as a common table expression, like this:

WITH cte AS 
    (
        SELECT 
            *, 
            ROW_NUMBER() OVER ( 
                PARTITION BY PetId, PetName, PetType 
                ORDER BY PetId, PetName, PetType
                ) AS Row_Number
        FROM Pets
    )
SELECT * FROM cte WHERE Row_Number <> 1;

Result:

+-------+---------+---------+------------+
| PetId | PetName | PetType | Row_Number |
+-------+---------+---------+------------+
|     1 | Wag     | Dog     |          2 |
|     4 | Bark    | Dog     |          2 |
|     4 | Bark    | Dog     |          3 |
+-------+---------+---------+------------+