3 Ways to Remove Duplicate Rows from Query Results in SQL

Sometimes when we run a SQL query, we only want to see unique rows. But depending on the columns we’re selecting, we might end up with duplicate rows. And this could happen without even knowing it, especially with large data sets.

But it doesn’t have to be this way.

Fortunately most SQL databases provide us with an easy way to remove duplicates.

The `DISTINCT` Clause

The most common way to remove duplicate rows from our query results is to use the DISTINCT clause. The simplest way to use this is with the DISTINCT keyword at the start of the SELECT list.

Suppose we have a table like this:

SELECT * FROM Dogs;

Result:

+-------+-----------+----------+-----+
| DogId | FirstName | LastName | Age |
+-------+-----------+----------+-----+
|     1 | Bark      | Smith    |  10 |
|     2 | Wag       | Frazier  |   7 |
|     3 | Bark      | Jones    |  10 |
+-------+-----------+----------+-----+

This table contains two dogs named Bark, and coincidentally, they are both aged 10.

Now let’s run a query that selects just two columns from that table:

SELECT FirstName, Age
FROM Dogs
ORDER BY FirstName;

Result:

+-----------+-----+
| FirstName | Age |
+-----------+-----+
| Bark      |  10 |
| Bark      |  10 |
| Wag       |   7 |
+-----------+-----+

Our query has resulted in duplicate rows; the first two rows have the same values in both columns (Bark, aged 10).

Having duplicate rows isn’t necessarily a bad thing. However, if we don’t want them, we can use the DISTINCT clause to return just unique rows. In other words, it will remove any redundant duplicate rows, and return just one row, no matter how many duplicates there are.

Here’s how to use the DISTINCT clause to return unique rows:

SELECT DISTINCT FirstName, Age
FROM Dogs
ORDER BY FirstName;

Result:

+-----------+-----+
| FirstName | Age |
+-----------+-----+
| Bark      |  10 |
| Wag       |   7 |
+-----------+-----+

So, all we did was prefix the SELECT list with the DISTINCT keyword. Now only unique rows are returned by our query (we only get one row that contains Bark aged 10).

The `GROUP BY` Clause

Another method we can use to eliminate duplicates is to use the GROUP BY clause. The GROUP BY clause enables us to group our results by a given column or columns.

Example:

SELECT FirstName, Age
FROM Dogs
GROUP BY FirstName, Age
ORDER BY FirstName, Age;

Result:

+-----------+-----+
| FirstName | Age |
+-----------+-----+
| Bark      |  10 |
| Wag       |   7 |
+-----------+-----+

Same result as the previous example. The GROUP BY clause had the effect of removing the redundant duplicate row from our query results.

One benefit of using this method is that we can incorporate the COUNT() function into our SELECT list to return the number of duplicate rows, if any:

SELECT FirstName, Age, COUNT(*)
FROM Dogs
GROUP BY FirstName, Age
ORDER BY FirstName, Age;

Result:

+-----------+-----+----------+
| FirstName | Age | COUNT(*) |
+-----------+-----+----------+
| Bark      |  10 |        2 |
| Wag       |   7 |        1 |
+-----------+-----+----------+

The `DISTINCT ON()` Option

At least one RDBMS (PostgreSQL) allows us to use an ON option with our DISTINCT clause. The ON option allows us to specify which rows to evaluate when removing duplicates. It keeps only the first row of each set of rows where the given expressions evaluate to equal.

Here’s an example:

SELECT DISTINCT ON(FirstName) FirstName, LastName, Age
FROM Dogs
ORDER BY FirstName;

Result:

 firstname | lastname | age 
-----------+----------+-----
 Bark      | Smith    |  10
 Wag       | Frazier  |   7

In this example, we used ON(FirstName) to dedupe rows that have the same value in the FirstName column. This allowed us to return columns that don’t necessarily contain duplicate values. Specifically, the we could include the LastName column in our query, even though that column contains unique values, while still deduping the rows based on the FirstName column.

To illustrate what I mean, let’s remove ON(FirstName) from the query:

SELECT DISTINCT FirstName, LastName, Age
FROM Dogs
ORDER BY FirstName;

Result:

 firstname | lastname | age 
-----------+----------+-----
 Bark      | Smith    |  10
 Bark      | Jones    |  10
 Wag       | Frazier  |   7

This time we got all three rows, because they are all unique when all columns are taken into account. When using this syntax, the DISTINCT clause takes into account all three rows.

The DISTINCT Clause

The GROUP BY Clause

The DISTINCT ON() Option

The `DISTINCT` Clause

The `GROUP BY` Clause

The `DISTINCT ON()` Option