Here are some options for deleting duplicate rows from a table in Oracle Database when those rows have a primary key or unique identifier column.
In such cases, the primary key must be ignored when comparing duplicate rows (due to the fact that primary keys hold unique values).
Sample Data
Our examples use the following data:
SELECT * FROM Dogs;
Result:
DOGID | FIRSTNAME | LASTNAME |
---|---|---|
1 | Bark | Smith |
2 | Bark | Smith |
3 | Woof | Jones |
4 | Ruff | Robinson |
5 | Wag | Johnson |
6 | Wag | Johnson |
7 | Wag | Johnson |
We can see that the first two rows are duplicates, as are the last three rows.
The DogId
column holds unique values (because it’s the table’s primary key), but we’re ignoring that column when comparing duplicates. You may often find yourself needing to de-dupe tables that contain primary keys, and so the following examples can be used to do just that.
Option 1
Here’s our first option for de-duping the above table:
DELETE FROM Dogs
WHERE DogId IN (
SELECT DogId FROM Dogs
MINUS SELECT MIN(DogId) FROM Dogs
GROUP BY FirstName, LastName
);
SELECT * FROM Dogs;
Result:
DOGID | FIRSTNAME | LASTNAME |
---|---|---|
1 | Bark | Smith |
3 | Woof | Jones |
4 | Ruff | Robinson |
5 | Wag | Johnson |
The duplicates have been removed (but one row of each duplicate remains).
We can alternatively use the MAX()
function instead of the MIN()
function to change which rows are deleted.
Option 2
In this example (and the following examples) we’ll assume that the table has been restored to its original state (with the duplicates).
Here’s another example that de-dupes the table and then selects the remaining rows:
DELETE FROM Dogs WHERE DogId IN (
SELECT d2.DogId
FROM Dogs d1, Dogs d2
WHERE d1.FirstName = d2.FirstName
AND d1.LastName = d2.LastName
AND d1.DogId <> d2.DogId
AND d1.DogId=(
SELECT MAX(DogId)
FROM Dogs d3
WHERE d3.FirstName = d1.FirstName
AND d3.LastName = d1.LastName
)
);
SELECT * FROM Dogs;
Result:
DOGID | FIRSTNAME | LASTNAME |
---|---|---|
2 | Bark | Smith |
3 | Woof | Jones |
4 | Ruff | Robinson |
7 | Wag | Johnson |
Notice that I used the MAX()
function instead of MIN()
which I used in the previous example. We can see the affect this has on the de-duping operation. It deleted different rows from the table.
Option 3
Here’s an option that doesn’t require the use of MIN()
or MAX()
:
DELETE FROM Dogs
WHERE EXISTS (
SELECT 1 FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
AND Dogs.DogId > d2.DogId
);
SELECT * FROM Dogs;
Result:
DOGID | FIRSTNAME | LASTNAME |
---|---|---|
1 | Bark | Smith |
3 | Woof | Jones |
4 | Ruff | Robinson |
5 | Wag | Johnson |
Option 4
Here’s another option:
DELETE FROM Dogs
WHERE DogId > (
SELECT MIN(DogId) FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
);
SELECT * FROM Dogs;
Result:
DOGID | FIRSTNAME | LASTNAME |
---|---|---|
1 | Bark | Smith |
3 | Woof | Jones |
4 | Ruff | Robinson |
5 | Wag | Johnson |
Option 5
Each row in Oracle has a rowid
pseudocolumn that returns the address of the row. The rowid
is a unique identifier for rows in the table, and usually its value uniquely identifies a row in the database (although it’s important to note that rows in different tables that are stored together in the same cluster can have the same rowid
).
We can therefore use the rowid
in our query instead of the DogId
column:
DELETE FROM Dogs
WHERE EXISTS (
SELECT 1 FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
AND Dogs.rowid > d2.rowid
);
SELECT * FROM Dogs;
Result:
DOGID | FIRSTNAME | LASTNAME |
---|---|---|
1 | Bark | Smith |
3 | Woof | Jones |
4 | Ruff | Robinson |
5 | Wag | Johnson |
While this example may seem slightly superfluous, given we already have a primary key column, there may be instances where you prefer to use the rowid
. The rowid
can be useful if you can’t use the primary key column for some reason, or if the table doesn’t have a primary key. Also, Oracle’s documentation mentions that rowid
values are the fastest way to access a single row.
Option 6
And here’s the other example, but with rowid
instead of the primary key:
DELETE FROM Dogs
WHERE rowid > (
SELECT MIN(rowid) FROM Dogs d2
WHERE Dogs.FirstName = d2.FirstName
AND Dogs.LastName = d2.LastName
);
SELECT * FROM Dogs;
Result:
DOGID | FIRSTNAME | LASTNAME |
---|---|---|
1 | Bark | Smith |
3 | Woof | Jones |
4 | Ruff | Robinson |
5 | Wag | Johnson |