11 Ways to Find Duplicate Rows that have a Primary Key in Oracle

Here are eleven options for returning duplicate rows in Oracle Database when those rows have a primary key or some other unique identifier column and you want to ignore it.

Sample Data

We’ll use the following data for our examples:

SELECT * FROM Dogs;

Result:

DOGID	FIRSTNAME	LASTNAME
1	Bark	Smith
2	Bark	Smith
3	Woof	Jones
4	Ruff	Robinson
5	Wag	Johnson
6	Wag	Johnson
7	Wag	Johnson

The first two rows are duplicates and the last three rows are duplicates. The duplicate rows share exactly the same values across all columns with the exception of their primary key/unique ID column.

The primary key column ensures that there are no duplicate rows, which is good practice in RDBMSs, because primary keys help to enforce data integrity. But the fact that primary keys contain unique values means that we need to ignore that column when searching for duplicates.

In our table above, the primary key column is an incrementing number, and its value carries no meaning and is not significant. We can therefore ignore that column’s data when searching for duplicates.

Option 1

Here’s our first option for returning duplicates:

SELECT 
    FirstName, 
    LastName, 
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName
ORDER BY Count DESC;

Result:

FIRSTNAME	LASTNAME	COUNT
Wag	Johnson	3
Bark	Smith	2
Ruff	Robinson	1
Woof	Jones	1

Here we constructed our query with the GROUP BY clause so that the output is grouped by the relevant columns. We also used the COUNT() function to return the number of identical rows. And we ordered it by count in descending order so that the duplicates appear first.

The result tells us that there are three rows containing Wag Johnson and two rows containing Bark Smith. These are duplicates (or triplicates in the case of Wag Johnson). The other two rows do not have any duplicates.

Option 2

We can add the HAVING clause to our previous example to exclude non-duplicates from the output:

SELECT 
    FirstName, 
    LastName, 
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1
ORDER BY Count DESC;

Result:

FIRSTNAME	LASTNAME	COUNT
Wag	Johnson	3
Bark	Smith	2

Option 3

We can also check for duplicates on concatenated columns. In this case we use the DISTINCT keyword to get distinct values, then use the COUNT() function to return the count:

SELECT
    DISTINCT FirstName || ' ' || LastName AS DogName,
    COUNT(*) AS Count
FROM Dogs
GROUP BY FirstName || ' ' || LastName
ORDER BY Count DESC;

Result:

DOGNAME	COUNT
Wag Johnson	3
Bark Smith	2
Ruff Robinson	1
Woof Jones	1

Option 4

Each row in Oracle has a rowid pseudocolumn that returns the address of the row. The rowid is a unique identifier for rows in the table, and usually its value uniquely identifies a row in the database (although it’s important to note that rows in different tables that are stored together in the same cluster can have the same rowid).

Anyway, we can construct a query that uses the rowid if we want:

SELECT * FROM Dogs
WHERE EXISTS (
  SELECT 1 FROM Dogs d2 
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
  AND Dogs.rowid > d2.rowid
);

Result:

DOGID	FIRSTNAME	LASTNAME
2	Bark	Smith
6	Wag	Johnson
7	Wag	Johnson

We could replace the SELECT * with DELETE to perform a de-duping operation on the table.

Note that we could have used the DogId column (our primary key) instead of the rowid if we wanted. That said, the rowid can be useful if you can’t use the primary key column for some reason, or if the table doesn’t have a primary key.

Option 5

Here’s another query that uses the rowid:

SELECT * FROM Dogs
WHERE rowid > (
  SELECT MIN(rowid) FROM Dogs d2  
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
);

Result:

DOGID	FIRSTNAME	LASTNAME
2	Bark	Smith
6	Wag	Johnson
7	Wag	Johnson

As with the previous example, we could replace the SELECT * with DELETE to delete the duplicate rows.

Option 6

The two rowid options above are great if you must completely ignore the primary key in your query (or if you don’t have a primary key column at all). However as mentioned, there is still the option of replacing rowid with the primary key column – in our case the DogId column:

SELECT * FROM Dogs
WHERE EXISTS (
  SELECT 1 FROM Dogs d2 
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
  AND Dogs.DogId > d2.DogId
);

Result:

DOGID	FIRSTNAME	LASTNAME
2	Bark	Smith
6	Wag	Johnson
7	Wag	Johnson

Option 7

And here’s the other query with the rowid replaced by the DogId column:

SELECT * FROM Dogs
WHERE DogId > (
  SELECT MIN(DogId) FROM Dogs d2  
  WHERE Dogs.FirstName = d2.FirstName
  AND Dogs.LastName = d2.LastName
);

Result:

DOGID	FIRSTNAME	LASTNAME
2	Bark	Smith
6	Wag	Johnson
7	Wag	Johnson

Option 8

Another way to find duplicates is to use the ROW_NUMBER() window function:

SELECT 
    DogId,
    FirstName,
    LastName,
    ROW_NUMBER() OVER ( 
        PARTITION BY FirstName, LastName 
        ORDER BY FirstName, LastName
        ) AS row_num
FROM Dogs;

Result:

DOGID	FIRSTNAME	LASTNAME	ROW_NUM
1	Bark	Smith	1
2	Bark	Smith	2
4	Ruff	Robinson	1
7	Wag	Johnson	1
5	Wag	Johnson	2
6	Wag	Johnson	3
3	Woof	Jones	1

Using the PARTITION BY clause results in a new column being added, with a row number that increments each time there’s a duplicate, but resets again when there’s a unique row.

In this case we don’t group the results, which means we can see each duplicate row, including its unique identifier column.

Option 9

We can also use the previous example as a common table expression in a larger query:

WITH cte AS 
    (
        SELECT 
            DogId,
            FirstName,
            LastName,
            ROW_NUMBER() OVER ( 
                PARTITION BY FirstName, LastName 
                ORDER BY FirstName, LastName
                ) AS row_num
        FROM Dogs
    )
SELECT * FROM cte WHERE row_num <> 1;

Result:

DOGID	FIRSTNAME	LASTNAME	ROW_NUM
2	Bark	Smith	2
5	Wag	Johnson	2
6	Wag	Johnson	3

That query excludes non-duplicates from the output, and it excludes one row of each duplicate from the output.

Option 10

Here’s another way to get the same output as the previous example:

SELECT * FROM Dogs 
WHERE DogId IN (
    SELECT DogId FROM Dogs 
    MINUS SELECT MIN(DogId) FROM Dogs 
    GROUP BY FirstName, LastName
    );

Result:

DOGID	FIRSTNAME	LASTNAME
2	Bark	Smith
6	Wag	Johnson
7	Wag	Johnson

This example uses Oracle’s MINUS operator, which returns only unique rows returned by the first query but not by the second.

The MINUS operator is similar to the EXCEPT operator in other DBMSs, such as SQL Server, MariaDB, PostgreSQL, and SQLite.

Option 11

Here’s yet another option to select duplicates from our table:

SELECT * 
FROM Dogs d1, Dogs d2 
WHERE d1.FirstName = d2.FirstName 
AND d1.LastName = d2.LastName
AND d1.DogId <> d2.DogId 
AND d1.DogId = (
    SELECT MAX(DogId) 
    FROM Dogs d3 
    WHERE d3.FirstName = d1.FirstName 
    AND d3.LastName = d1.LastName
);

Result:

DOGID	FIRSTNAME	LASTNAME	DOGID	FIRSTNAME	LASTNAME
2	Bark	Smith	1	Bark	Smith
7	Wag	Johnson	5	Wag	Johnson
7	Wag	Johnson	6	Wag	Johnson