When you’re working with real-world data, duplicates can sometimes sneak in more often than we’d like. Maybe an import process didn’t filter things properly, or users managed to submit the same information twice. Whatever the reason, finding and handling duplicates is a common task.
A neat way to tackle this problem in SQL Server is by using window functions. Specifically, we can use ROW_NUMBER() or RANK() to find duplicate rows in a table. These functions let you assign a sequence number to rows within a group, which makes it easy to spot duplicates.
Sample Data
Let’s create a sample table with duplicate data. Imagine you’re storing customer orders, but due to a bug in the ordering system, some rows got duplicated:
CREATE TABLE Orders (
OrderID INT IDENTITY PRIMARY KEY,
CustomerName NVARCHAR(100),
Product NVARCHAR(100),
OrderDate DATE
);
INSERT INTO Orders (CustomerName, Product, OrderDate) VALUES
('Marisol Vega', 'Laptop Pro 15', '2025-09-01'),
('Marisol Vega', 'Laptop Pro 15', '2025-09-01'), -- duplicate
('Hiro Tanaka', 'NoiseCancel Headphones', '2025-09-02'),
('Edda Sørensen', 'Smartwatch Series 7', '2025-09-03'),
('Hiro Tanaka', 'NoiseCancel Headphones', '2025-09-02'), -- duplicate
('Kwame Boateng', 'Mechanical Keyboard X', '2025-09-04'),
('Marisol Vega', 'Laptop Pro 15', '2025-09-01'); -- duplicate
This table has three duplicates:
- Marisol’s laptop order appears three times.
- Hiro’s headphone order appears twice.
Finding Duplicates with ROW_NUMBER()
The ROW_NUMBER() function is perfect for identifying duplicates because it assigns a sequential number to each row within a partition (group).
Here’s how you can use it:
WITH DuplicateCheck AS (
SELECT
OrderID,
CustomerName,
Product,
OrderDate,
ROW_NUMBER() OVER (
PARTITION BY CustomerName, Product, OrderDate
ORDER BY OrderID
) AS RowNum
FROM Orders
)
SELECT *
FROM DuplicateCheck
WHERE RowNum > 1;
Output:
OrderID CustomerName Product OrderDate RowNum
------- ------------ ---------------------- ------------------------ ------
5 Hiro Tanaka NoiseCancel Headphones 2025-09-02T00:00:00.000Z 2
2 Marisol Vega Laptop Pro 15 2025-09-01T00:00:00.000Z 2
7 Marisol Vega Laptop Pro 15 2025-09-01T00:00:00.000Z 3
This query returns all duplicate rows. These are the rows that can be deleted (if your plan is to delete them). It doesn’t return the “original”. The original will still exist after these duplicates have been deleted.
Let’s slowly walk through the above query:
PARTITION BY: This defines what counts as a “duplicate.” In our case, it’s rows with the sameCustomerName,Product, andOrderDate.ORDER BY: Determines which row getsRowNum = 1. The first row (byOrderID) is treated as the “original,” and any others getRowNum = 2, 3...(which make them the duplicates).- The outer query filters for rows where
RowNum > 1, which are the duplicates. For example, if theRowNumis2, then we know it’s a duplicate.
Next Steps
Once you’ve identified the duplicates, you can:
- Delete them (keeping only the
RowNum = 1rows). - Export them for auditing.
- Or just report on them.
For example, deleting duplicates while keeping the first record is straightforward – we can simply switch the outer SELECT statement to a DELETE statement:
WITH DuplicateCheck AS (
SELECT
OrderID,
ROW_NUMBER() OVER (
PARTITION BY CustomerName, Product, OrderDate
ORDER BY OrderID
) AS RowNum
FROM Orders
)
DELETE FROM DuplicateCheck
WHERE RowNum > 1;
Output:
(3 rows affected)
And that’s it. With a simple window function, you’ve got a flexible and efficient way to detect and manage duplicates in SQL Server.