The following options can be used to delete duplicate rows in Oracle Database.
These examples delete duplicate rows but keep one. So if there are three identical rows for example, it deletes two of them and keeps one. This is often referred to as de-duping the table.
Sample Data
Suppose we have a table with the following data:
SELECT * FROM Pets;
Result:
PETID PETNAME PETTYPE 1 Wag Dog 1 Wag Dog 2 Scratch Cat 3 Tweet Bird 4 Bark Dog 4 Bark Dog 4 Bark Dog
We can see that the first two rows are duplicates, as are the last three rows.
In this case, all columns are duplicates. There is no primary key column. Normally, one might expect the PetId
column to be a primary key, but it contains duplicate values and therefore cannot be a primary key.
If it was a primary key, it would contain unique values across all rows, and there would be no duplicates.
Regardless, below are two options for finding and deleting duplicate rows even when there’s no primary key.
Option 1
Here’s one option for removing duplicate rows from the above table:
DELETE FROM Pets
WHERE EXISTS (
SELECT 1 FROM Pets p2
WHERE Pets.PetName = p2.PetName
AND Pets.PetType = p2.PetType
AND Pets.rowid > p2.rowid
);
SELECT * FROM Pets;
Result:
3 row(s) deleted. PETID PETNAME PETTYPE 1 Wag Dog 2 Scratch Cat 3 Tweet Bird 4 Bark Dog
The table has now been de-duped. Three rows were deleted, and four remain.
One of the duplicate rows for our dog “Wag” has been deleted and the other one remains. Two of the duplicate rows for “Bark” have also been deleted.
Oracle’s rowid
pseudocolumn allowed us to perform this de-dupe operation. We were able to reference it in our query to determine which rows to delete.
The way this works is, each row in an Oracle database has a rowid
pseudocolumn that returns the address of the row. The rowid
is a unique identifier for rows in the table, and usually its value uniquely identifies a row in the database. Therefore, we can identify each row even when we don’t have a primary key or some other unique ID field.
However, it’s important to note that rows in different tables that are stored together in the same cluster can have the same rowid
.
Option 2
Assuming the table has been restored with its original data (including duplicate rows), here’s another option for deleting duplicate rows.
DELETE FROM Pets
WHERE rowid > (
SELECT MIN(rowid) FROM Pets p2
WHERE Pets.PetName = p2.PetName
AND Pets.PetType = p2.PetType
);
SELECT * FROM Pets;
Result:
PETID PETNAME PETTYPE 1 Wag Dog 2 Scratch Cat 3 Tweet Bird 4 Bark Dog
Same result as the previous example.