How to Use Window Functions to Find Duplicates in SQL Server

When you’re working with real-world data, duplicates can sometimes sneak in more often than we’d like. Maybe an import process didn’t filter things properly, or users managed to submit the same information twice. Whatever the reason, finding and handling duplicates is a common task.

A neat way to tackle this problem in SQL Server is by using window functions. Specifically, we can use ROW_NUMBER() or RANK() to find duplicate rows in a table. These functions let you assign a sequence number to rows within a group, which makes it easy to spot duplicates.

Sample Data

Let’s create a sample table with duplicate data. Imagine you’re storing customer orders, but due to a bug in the ordering system, some rows got duplicated:

CREATE TABLE Orders (
    OrderID INT IDENTITY PRIMARY KEY,
    CustomerName NVARCHAR(100),
    Product NVARCHAR(100),
    OrderDate DATE
);

INSERT INTO Orders (CustomerName, Product, OrderDate) VALUES
('Marisol Vega', 'Laptop Pro 15', '2025-09-01'),
('Marisol Vega', 'Laptop Pro 15', '2025-09-01'), -- duplicate
('Hiro Tanaka', 'NoiseCancel Headphones', '2025-09-02'),
('Edda Sørensen', 'Smartwatch Series 7', '2025-09-03'),
('Hiro Tanaka', 'NoiseCancel Headphones', '2025-09-02'), -- duplicate
('Kwame Boateng', 'Mechanical Keyboard X', '2025-09-04'),
('Marisol Vega', 'Laptop Pro 15', '2025-09-01'); -- duplicate

This table has three duplicates:

Marisol’s laptop order appears three times.
Hiro’s headphone order appears twice.

Finding Duplicates with `ROW_NUMBER()`

The ROW_NUMBER() function is perfect for identifying duplicates because it assigns a sequential number to each row within a partition (group).

Here’s how you can use it:

WITH DuplicateCheck AS (
    SELECT 
        OrderID,
        CustomerName,
        Product,
        OrderDate,
        ROW_NUMBER() OVER (
            PARTITION BY CustomerName, Product, OrderDate
            ORDER BY OrderID
        ) AS RowNum
    FROM Orders
)
SELECT *
FROM DuplicateCheck
WHERE RowNum > 1;

Output:

OrderID  CustomerName  Product                 OrderDate                 RowNum
-------  ------------  ----------------------  ------------------------  ------
5        Hiro Tanaka   NoiseCancel Headphones  2025-09-02T00:00:00.000Z  2     
2        Marisol Vega  Laptop Pro 15           2025-09-01T00:00:00.000Z  2     
7        Marisol Vega  Laptop Pro 15           2025-09-01T00:00:00.000Z  3

This query returns all duplicate rows. These are the rows that can be deleted (if your plan is to delete them). It doesn’t return the “original”. The original will still exist after these duplicates have been deleted.

Let’s slowly walk through the above query:

PARTITION BY: This defines what counts as a “duplicate.” In our case, it’s rows with the same CustomerName, Product, and OrderDate.
ORDER BY: Determines which row gets RowNum = 1. The first row (by OrderID) is treated as the “original,” and any others get RowNum = 2, 3... (which make them the duplicates).
The outer query filters for rows where RowNum > 1, which are the duplicates. For example, if the RowNum is 2, then we know it’s a duplicate.

Next Steps

Once you’ve identified the duplicates, you can:

Delete them (keeping only the RowNum = 1 rows).
Export them for auditing.
Or just report on them.

For example, deleting duplicates while keeping the first record is straightforward – we can simply switch the outer SELECT statement to a DELETE statement:

WITH DuplicateCheck AS (
    SELECT 
        OrderID,
        ROW_NUMBER() OVER (
            PARTITION BY CustomerName, Product, OrderDate
            ORDER BY OrderID
        ) AS RowNum
    FROM Orders
)
DELETE FROM DuplicateCheck
WHERE RowNum > 1;

Output:

(3 rows affected)

And that’s it. With a simple window function, you’ve got a flexible and efficient way to detect and manage duplicates in SQL Server.

Sample Data

Finding Duplicates with ROW_NUMBER()

Next Steps

Finding Duplicates with `ROW_NUMBER()`