6 Ways to Select Duplicate Rows in SQLite

The following queries can be used to return duplicate rows in SQLite.

Here, the duplicate rows contain duplicate values across all columns, including the ID column.

Sample Data

Suppose we have a table with the following data:

SELECT * FROM Pets;

Result:

PetId  PetName  PetType
-----  -------  -------
1      Wag      Dog    
1      Wag      Dog    
2      Scratch  Cat    
3      Tweet    Bird   
4      Bark     Dog    
4      Bark     Dog    
4      Bark     Dog

The first two rows are duplicates, as are the last three rows. That’s because all three columns contain the same values in each duplicate row.

Option 1

We can use the following query to see how many rows are duplicates:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
ORDER BY PetId;

Result:

PetId  PetName  PetType  Count
-----  -------  -------  -----
1      Wag      Dog      2    
2      Scratch  Cat      1    
3      Tweet    Bird     1    
4      Bark     Dog      3

Here, we grouped the rows by all columns, and returned the row count of each group. This tells us whether a row is unique (with a count of 1) or a duplicate (with a count greater than 1).

We can order it by count in descending order, so that the rows with the most duplicates appear first:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
ORDER BY Count(*) DESC;

Result:

PetId  PetName  PetType  Count
-----  -------  -------  -----
4      Bark     Dog      3    
1      Wag      Dog      2    
2      Scratch  Cat      1    
3      Tweet    Bird     1

Option 2

If we only want the duplicate rows listed, we can use the the HAVING clause to return only rows with a count of greater than 1:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
HAVING COUNT(*) > 1
ORDER BY PetId;

Result:

PetId  PetName  PetType  Count
-----  -------  -------  -----
1      Wag      Dog      2    
4      Bark     Dog      3

Option 3

Another option is to use the ROW_NUMBER() window function:

SELECT 
    *, 
    ROW_NUMBER() OVER ( 
        PARTITION BY PetId, PetName, PetType 
        ORDER BY PetId, PetName, PetType
        ) AS Row_Number
FROM Pets;

Result:

PetId  PetName  PetType  Row_Number
-----  -------  -------  ----------
1      Wag      Dog      1         
1      Wag      Dog      2         
2      Scratch  Cat      1         
3      Tweet    Bird     1         
4      Bark     Dog      1         
4      Bark     Dog      2         
4      Bark     Dog      3

The PARTITION BY clause divides the result set produced by the FROM clause into partitions to which the function is applied. When we specify partitions for the result set, each partition causes the numbering to start over again (i.e. the numbering will start at 1 for the first row in each partition).

Option 4

We can use the above query as a common table expression:

WITH cte AS 
    (
        SELECT 
            *, 
            ROW_NUMBER() OVER ( 
                PARTITION BY PetId, PetName, PetType 
                ORDER BY PetId, PetName, PetType
                ) AS Row_Number
        FROM Pets
    )
SELECT * FROM cte WHERE Row_Number <> 1;

Result:

PetId  PetName  PetType  Row_Number
-----  -------  -------  ----------
1      Wag      Dog      2         
4      Bark     Dog      2         
4      Bark     Dog      3

This returns just the excess rows from the matching duplicates. So if there are two identical rows, it returns one of them. If there are three identical rows, it returns two, and so on.

This query can be useful for showing how many rows will be removed from the table in a de-duping operation. In some other DBMSs (in SQL Server at least), we can replace the last SELECT * with DELETE to delete the duplicate rows from the table. But SQLite won’t let us update the CTE like that.

Fortunately, the next two options can be modified to perform a delete.

Option 5

We can take advantage of SQLite’s rowid:

SELECT * FROM Pets
WHERE EXISTS (
  SELECT 1 FROM Pets p2 
  WHERE Pets.PetName = p2.PetName
  AND Pets.PetType = p2.PetType
  AND Pets.rowid > p2.rowid
);

Result:

PetId  PetName  PetType
-----  -------  -------
1      Wag      Dog    
4      Bark     Dog    
4      Bark     Dog

How does this work? By default, every row in SQLite has a special column, usually called the rowid, that uniquely identifies that row within the table. This can be removed if required, but unless it has explicitly been removed, you will be able to leverage it within your queries.

Option 6

And finally, here’a another option that uses SQLite’s rowid:

SELECT * FROM Pets
WHERE rowid > (
  SELECT MIN(rowid) FROM Pets p2  
  WHERE Pets.PetName = p2.PetName
  AND Pets.PetType = p2.PetType
);

Result:

PetId  PetName  PetType
-----  -------  -------
1      Wag      Dog    
4      Bark     Dog    
4      Bark     Dog