How to Use the DISTINCT Clause in SQL

Most of the major relational database management systems (RDBMSs) allow for a DISTINCT clause to be included in our SQL queries.

We use the DISTINCT keyword to return only unique rows. It eliminates duplicates from the results. If we have two or more rows with exactly the same data, we’ll only see one row in the results.

The DISTINCT keyword is typically used in the SELECT list of a query, but some RDBMSs allow for it to be placed in other parts of the query, such as the GROUP BY clause and the WINDOW clause.

Below are some basic examples that demonstrate how the SQL DISTINCT clause works.

Sample Data

The examples on this page use the following data:

SELECT 
    VendorId,
    ProductName,
    ProductPrice
FROM Products
ORDER BY VendorId;

Result:

+----------+---------------------------------+--------------+
| VendorId | ProductName                     | ProductPrice |
+----------+---------------------------------+--------------+
|     1001 | Left handed screwdriver         |        25.99 |
|     1001 | Right handed screwdriver        |        25.99 |
|     1001 | Long Weight (blue)              |        14.75 |
|     1001 | Long Weight (green)             |        11.99 |
|     1002 | Sledge Hammer                   |        33.49 |
|     1003 | Chainsaw                        |       245.00 |
|     1003 | Straw Dog Box                   |        55.99 |
|     1003 | Hammock                         |        10.00 |
|     1004 | Bottomless Coffee Mugs (4 Pack) |        10.00 |
|     1004 | Tea Pot                         |        12.45 |
+----------+---------------------------------+--------------+
10 rows in set (0.00 sec)

Simple DISTINCT Example

We can see from the above table that the VendorId column contains several duplicate values. For example, 1001 appears four times, 1003 appears three times, and 1004 appears twice.

If we wanted a list of unique vendor IDs, without duplicates, we could do this:

SELECT
    DISTINCT VendorId
FROM Products
ORDER BY VendorId;

Result:

+----------+
| VendorId |
+----------+
|     1001 |
|     1002 |
|     1003 |
|     1004 |
+----------+
4 rows in set (0.00 sec)

This time we only get four rows instead of the ten that we got in the previous query. This query returned just the unique vendor IDs, but no more.

Here’s what happens if we remove the DISTINCT keyword:

SELECT
    VendorId
FROM Products
ORDER BY VendorId;

Result:

+----------+
| VendorId |
+----------+
|     1001 |
|     1001 |
|     1001 |
|     1001 |
|     1002 |
|     1003 |
|     1003 |
|     1003 |
|     1004 |
|     1004 |
+----------+
10 rows in set (0.00 sec)

This time we got ten rows. That’s because some vendor IDs were listed multiple times, due to the fact that we didn’t “de-dupe” them.

Adding Columns

In the above example we selected just one column, which meant that the DISTINCT modifier was applied to just that column. If we add additional columns to the SELECT list, then the DISTINCT modifier will apply to all of those columns.

Example:

SELECT
    DISTINCT VendorId, ProductPrice
FROM Products
ORDER BY VendorId;

Result:

+----------+--------------+
| VendorId | ProductPrice |
+----------+--------------+
|     1001 |        11.99 |
|     1001 |        14.75 |
|     1001 |        25.99 |
|     1002 |        33.49 |
|     1003 |        10.00 |
|     1003 |        55.99 |
|     1003 |       245.00 |
|     1004 |        10.00 |
|     1004 |        12.45 |
+----------+--------------+
9 rows in set (0.00 sec)

This time nine rows are returned. Only one row was de-duped. That’s because there was only one instance of a duplicate. In this case, vendor 1001 has two products with a price of 25.99. Therefore, that combination is shown just once in the result.

Here’s what I mean:

SELECT
   VendorId, ProductName, ProductPrice
FROM Products
WHERE ProductPrice = 25.99;

Result:

+----------+--------------------------+--------------+
| VendorId | ProductName              | ProductPrice |
+----------+--------------------------+--------------+
|     1001 | Left handed screwdriver  |        25.99 |
|     1001 | Right handed screwdriver |        25.99 |
+----------+--------------------------+--------------+
2 rows in set (0.00 sec)

We can see that vendor 1001 has two different products priced at 25.99.

The DISTINCT ON Option

PostgreSQL allows us to add an ON to our DISTINCT clause so that we can specify which column/s to apply the DISTINCT clause to.

When we use this option, we use parentheses to specify the expression/s for which to apply the modifier to. It then keeps only the first row of each set of rows where the given expressions evaluate to equal.

Example:

SELECT 
    DISTINCT ON (VendorId) VendorId, ProductPrice
FROM Products
ORDER BY VendorId, ProductPrice;

Result:

 vendorid | productprice 
----------+--------------
     1001 |        11.99
     1002 |        33.49
     1003 |        10.00
     1004 |        10.00
(4 rows)

We can change which values are returned within each vendor by modifying the ORDER BY clause.

For example:

SELECT 
    DISTINCT ON (VendorId) VendorId, ProductPrice
FROM Products
ORDER BY VendorId, ProductPrice;

Result:

 vendorid | productprice 
----------+--------------
     1001 |        25.99
     1002 |        33.49
     1003 |       245.00
     1004 |        12.45
(4 rows)

Changing the Column Order

Here’s the same example, but with the columns in a different order:

SELECT
    DISTINCT ProductPrice, VendorId
FROM Products
ORDER BY ProductPrice;

Result:

+--------------+----------+
| ProductPrice | VendorId |
+--------------+----------+
|        10.00 |     1003 |
|        10.00 |     1004 |
|        11.99 |     1001 |
|        12.45 |     1004 |
|        14.75 |     1001 |
|        25.99 |     1001 |
|        33.49 |     1002 |
|        55.99 |     1003 |
|       245.00 |     1003 |
+--------------+----------+
9 rows in set (0.00 sec)

I also sorted the results by a different column. Basically, changing the order of the columns didn’t change which rows were returned. We still eliminated duplicate rows, based on the columns that were being returned.

Using a Different Column

Whether or not any rows are de-duped depends entirely on the columns being returned and whether or they result in duplicate data in two or more rows.

Here’s what happens when I swap the ProductPrice column for the ProductName column:

SELECT
    DISTINCT VendorId, ProductName
FROM Products
ORDER BY VendorId;

Result:

+----------+---------------------------------+
| VendorId | ProductName                     |
+----------+---------------------------------+
|     1001 | Left handed screwdriver         |
|     1001 | Long Weight (blue)              |
|     1001 | Long Weight (green)             |
|     1001 | Right handed screwdriver        |
|     1002 | Sledge Hammer                   |
|     1003 | Chainsaw                        |
|     1003 | Hammock                         |
|     1003 | Straw Dog Box                   |
|     1004 | Bottomless Coffee Mugs (4 Pack) |
|     1004 | Tea Pot                         |
+----------+---------------------------------+
10 rows in set (0.00 sec)

This time nothing was de-duped. All ten rows were returned. That’s because all of the product names are unique. This uniqueness meant that no two rows could contain exactly the same data. Therefore, the DISTINCT keyword had no effect – we got the same results that we’d get if we removed it altogether.

Let’s remove the DISTINCT keyword to demonstrate this:

SELECT
    VendorId, ProductName
FROM Products
ORDER BY VendorId;

Result:

+----------+---------------------------------+
| VendorId | ProductName                     |
+----------+---------------------------------+
|     1001 | Left handed screwdriver         |
|     1001 | Right handed screwdriver        |
|     1001 | Long Weight (blue)              |
|     1001 | Long Weight (green)             |
|     1002 | Sledge Hammer                   |
|     1003 | Chainsaw                        |
|     1003 | Straw Dog Box                   |
|     1003 | Hammock                         |
|     1004 | Bottomless Coffee Mugs (4 Pack) |
|     1004 | Tea Pot                         |
+----------+---------------------------------+
10 rows in set (0.00 sec)

As expected, same result.

Counting Unique Values

We can use the DISTINCT keyword inside the COUNT() function in order to get a count of the unique values, without having to return the actual data itself.

Example:

SELECT
    COUNT(VendorId) AS "Total",
    COUNT(DISTINCT VendorId) AS "Unique"
FROM Products;

Result:

+-------+--------+
| Total | Unique |
+-------+--------+
|    10 |      4 |
+-------+--------+

Here, I used UNIQUE in the first column but removed it in the second column. Therefore we can see both the total number of rows, and the number of unique rows.

The ALL Keyword

When we use the DISTINCT keyword, we’re overriding the default behaviour of the SQL query, which is to return all rows, regardless of duplicates.

This is because SQL queries implicitly use the ALL keyword by default.

We can explicitly use ALL instead of DISTINCT in order to return all rows. Therefore we could modify our previous query as follows:

SELECT
    COUNT(ALL VendorId) AS "Total",
    COUNT(DISTINCT VendorId) AS "Unique"
FROM Products;

Result:

+-------+--------+
| Total | Unique |
+-------+--------+
|    10 |      4 |
+-------+--------+

So all I did was add the ALL keyword to the first COUNT() function. We got the same result that we got without this keyword. That’s because ALL is the default behaviour.

Synonyms

Some RDBMSs allow for synonyms to be used instead of the DISTINCT keyword.

For example, in MySQL we can use DISTINCTROW instead of DISTINCT if we prefer. And in Oracle Database we can use UNIQUE as an alternative to DISTINCT.

Here’s an example of using DISTINCTROW in MySQL:

SELECT
    DISTINCTROW VendorId
FROM Products
ORDER BY VendorId;

Result:

+----------+
| VendorId |
+----------+
|     1001 |
|     1002 |
|     1003 |
|     1004 |
+----------+
4 rows in set (0.00 sec)

As alluded to, it returns the same result as DISTINCT, because it’s a synonym for DISTINCT. That said, it’s probably a good idea to stick with using DISTINCT unless you have reason not to.