The following examples return duplicate rows from an Oracle Database table.
Sample Data
Suppose we have a table with the following data:
SELECT * FROM Pets;
Result:
PetId PetName PetType ----- ------- ------- 1 Wag Dog 1 Wag Dog 2 Scratch Cat 3 Tweet Bird 4 Bark Dog 4 Bark Dog 4 Bark Dog
The first two rows are duplicates, as are the last three rows. In this case the duplicate rows contain duplicate values across all columns, including the ID column.
Option 1
We can use the following query to see how many rows are duplicates:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
ORDER BY PetId;
Result:
PETID PETNAME PETTYPE Count 1 Wag Dog 2 2 Scratch Cat 1 3 Tweet Bird 1 4 Bark Dog 3
We grouped the rows by all columns, and returned the row count of each group. Any row with a count greater than 1 is a duplicate.
We can order it by count in descending order, so that the rows with the most duplicates appear first:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
ORDER BY Count(*) DESC;
Result:
PETID PETNAME PETTYPE Count 4 Bark Dog 3 1 Wag Dog 2 2 Scratch Cat 1 3 Tweet Bird 1
Option 2
If we only want the duplicate rows listed, we can use the the HAVING
clause to return only rows with a count of greater than 1:
SELECT
PetId,
PetName,
PetType,
COUNT(*) AS "Count"
FROM Pets
GROUP BY
PetId,
PetName,
PetType
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
Result:
PETID PETNAME PETTYPE Count 4 Bark Dog 3 1 Wag Dog 2
Option 3
Another option is to use the ROW_NUMBER()
window function:
SELECT
PetId,
PetName,
PetType,
ROW_NUMBER() OVER (
PARTITION BY PetId, PetName, PetType
ORDER BY PetId, PetName, PetType
) AS rn
FROM Pets;
Result:
PETID PETNAME PETTYPE RN 1 Wag Dog 1 1 Wag Dog 2 2 Scratch Cat 1 3 Tweet Bird 1 4 Bark Dog 1 4 Bark Dog 2 4 Bark Dog 3
The PARTITION BY
clause divides the result set produced by the FROM
clause into partitions to which the function is applied. When we specify partitions for the result set, each partition causes the numbering to start over again (i.e. the numbering will start at 1 for the first row in each partition).
Option 4
We can use the above query as a common table expression:
WITH cte AS
(
SELECT
PetId,
PetName,
PetType,
ROW_NUMBER() OVER (
PARTITION BY PetId, PetName, PetType
ORDER BY PetId, PetName, PetType
) AS Row_Number
FROM Pets
)
SELECT * FROM cte WHERE Row_Number <> 1;
Result:
PETID PETNAME PETTYPE ROW_NUMBER 1 Wag Dog 2 4 Bark Dog 2 4 Bark Dog 3
This returns just the excess rows from the matching duplicates. So if there are two identical rows, it returns one of them. If there are three identical rows, it returns two, and so on.
Option 5
Given our table doesn’t contain a primary key column we can take advantage of Oracle’s rowid
pseudocolumn:
SELECT * FROM Pets
WHERE EXISTS (
SELECT 1 FROM Pets p2
WHERE Pets.PetName = p2.PetName
AND Pets.PetType = p2.PetType
AND Pets.rowid > p2.rowid
);
Result:
PETID PETNAME PETTYPE 1 Wag Dog 4 Bark Dog 4 Bark Dog
The way this works is, each row in an Oracle database has a rowid
pseudocolumn that returns the address of the row. The rowid
is a unique identifier for rows in the table, and usually its value uniquely identifies a row in the database. However, it’s important to note that rows in different tables that are stored together in the same cluster can have the same rowid
.
One benefit of the above example is that we can replace SELECT *
with DELETE
in order to de-dupe the table.
Option 6
And finally, here’a another option that uses the rowid
pseudocolumn:
SELECT * FROM Pets
WHERE rowid > (
SELECT MIN(rowid) FROM Pets p2
WHERE Pets.PetName = p2.PetName
AND Pets.PetType = p2.PetType
);
Result:
PETID PETNAME PETTYPE 1 Wag Dog 4 Bark Dog 4 Bark Dog
Same result as the previous example.
As with the previous example, we can replace SELECT *
with DELETE
in order to remove duplicate rows from the table.