Here are eleven options for returning duplicate rows in SQLite when those rows have a primary key or some other unique identifier column (but you want to ignore the primary key).
This means that the duplicate rows share exactly the same values across all columns with the exception of their primary key/unique ID column.
Sample Data
We’ll use the following data for our examples:
SELECT * FROM Dogs;
Result:
DogId FirstName LastName ----- --------- -------- 1 Bark Smith 2 Bark Smith 3 Woof Jones 4 Ruff Robinson 5 Wag Johnson 6 Wag Johnson 7 Wag Johnson
The first two rows are duplicates (except for the DogId
column, which is the table’s primary key, and contains a unique value across all rows). The last three rows are also duplicates (except for the DogId
column).
The primary key column ensures that there are no duplicate rows, which is good practice in RDBMSs, because primary keys help to enforce data integrity. But because primary keys prevent duplicate rows, they have the potential to interfere with our ability to find duplicates.
In our table above, the primary key column is an incrementing number, and its value carries no meaning and is not significant. We therefore need to ignore that row if we want to find duplicates in the other columns.
Option 1
We can run a query with the GROUP BY
clause in order to group the columns by their significant columns, then use the COUNT()
function to return the number of identical rows:
SELECT
FirstName,
LastName,
COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName
ORDER BY Count DESC;
Result:
FirstName LastName Count --------- -------- ----- Wag Johnson 3 Bark Smith 2 Ruff Robinson 1 Woof Jones 1
Here we excluded the primary key column by omitting it from our query. We also ordered it by count in descending order, so that the duplicates appear first.
The result tells us that there are three rows containing Wag Johnson and two rows containing Bark Smith. These are duplicates (or triplicates in the case of Wag Johnson). The other two rows do not have any duplicates.
Option 2
We can use the HAVING
clause to exclude non-duplicates from the output:
SELECT
FirstName,
LastName,
COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1
ORDER BY Count DESC;
Result:
FirstName LastName Count --------- -------- ----- Wag Johnson 3 Bark Smith 2
Option 3
Here’s an example of checking for duplicates on concatenated columns. In this case we use the DISTINCT
keyword to get distinct values, then use the COUNT()
function to return the count:
SELECT
DISTINCT FirstName || ' ' || LastName AS DogName,
COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName || ' ' || LastName
ORDER BY Count DESC;
Result:
DogName Count ------------- ----- Wag Johnson 3 Bark Smith 2 Woof Jones 1 Ruff Robinson 1
Option 4
By default, every row in SQLite has a special column, usually called the rowid
, that uniquely identifies that row within the table. Unless it has been explicitly removed from the table, you can use this as a unique identifier for each row.
We can therefore use the rowid
in our query:
SELECT * FROM Dogs
WHERE EXISTS (
SELECT 1 FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
AND Dogs.rowid > d2.rowid
);
Result:
DogId FirstName LastName ----- --------- -------- 2 Bark Smith 6 Wag Johnson 7 Wag Johnson
We could replace the SELECT *
with DELETE
to perform a de-duping operation on the table.
Note that we could have used the DogId
column (our primary key) instead of the rowid
if we wanted. That said, the rowid
can be useful if you can’t use the primary key column for some reason, or if the table doesn’t have a primary key.
Option 5
Here’s another query that uses the rowid
:
SELECT * FROM Dogs
WHERE rowid > (
SELECT MIN(rowid) FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
);
Result:
DogId FirstName LastName ----- --------- -------- 2 Bark Smith 6 Wag Johnson 7 Wag Johnson
As with the previous example, we could replace the SELECT *
with DELETE
to delete the duplicate rows.
Option 6
The two rowid
options above are great if you must completely ignore the primary key in your query (or if you don’t have a primary key column at all). However as mentioned, there is still the option of replacing rowid
with the primary key column – in our case the DogId
column:
SELECT * FROM Dogs
WHERE EXISTS (
SELECT 1 FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
AND Dogs.DogId > d2.DogId
);
Result:
DogId FirstName LastName ----- --------- -------- 2 Bark Smith 6 Wag Johnson 7 Wag Johnson
Option 7
And here’s the other query with the rowid
replaced by the DogId
column:
SELECT * FROM Dogs
WHERE DogId > (
SELECT MIN(DogId) FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
);
Result:
DogId FirstName LastName ----- --------- -------- 2 Bark Smith 6 Wag Johnson 7 Wag Johnson
Option 8
Another way to do it is to use the ROW_NUMBER()
window function:
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY FirstName, LastName
ORDER BY FirstName, LastName
) AS Row_Number
FROM Dogs;
Result:
DogId FirstName LastName Row_Number ----- --------- -------- ---------- 1 Bark Smith 1 2 Bark Smith 2 4 Ruff Robinson 1 5 Wag Johnson 1 6 Wag Johnson 2 7 Wag Johnson 3 3 Woof Jones 1
Using the PARTITION BY
clause results in a new column being added, with a row number that increments each time there’s a duplicate, but resets again when there’s a unique row.
In this case we don’t group the results, which means we can see each duplicate row, including its unique identifier column.
Option 9
We can also use the previous example as a common table expression in a larger query:
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY FirstName, LastName
ORDER BY FirstName, LastName
) AS Row_Number
FROM Dogs
)
SELECT * FROM cte WHERE Row_Number <> 1;
Result:
DogId FirstName LastName Row_Number ----- --------- -------- ---------- 2 Bark Smith 2 6 Wag Johnson 2 7 Wag Johnson 3
That excludes non-duplicates from the output, and it excludes one row of each duplicate from the output.
Option 10
Here’s another way to get the same output as the previous example:
SELECT * FROM Dogs
WHERE DogId IN (
SELECT DogId FROM Dogs
EXCEPT SELECT MIN(DogId) FROM Dogs
GROUP BY FirstName, LastName
);
Result:
DogId FirstName LastName ----- --------- -------- 2 Bark Smith 6 Wag Johnson 7 Wag Johnson
Option 11
Here’s yet another option to select duplicates from our table:
SELECT *
FROM Dogs d1, Dogs d2
WHERE d1.FirstName = d2.FirstName
AND d1.LastName = d2.LastName
AND d1.DogId <> d2.DogId
AND d1.DogId = (
SELECT MAX(DogId)
FROM Dogs d3
WHERE d3.FirstName = d1.FirstName
AND d3.LastName = d1.LastName
);
Result:
DogId FirstName LastName DogId FirstName LastName ----- --------- -------- ----- --------- -------- 2 Bark Smith 1 Bark Smith 7 Wag Johnson 5 Wag Johnson 7 Wag Johnson 6 Wag Johnson