Removing Duplicates from a List in DuckDB with the LIST_DISTINCT() Function

DuckDB has a function called list_distinct() that enables us to remove duplicate values from a list. Any value that appears more than once is “deduped” and is returned once in the resulting list.

The function also removes any NULL values.

Example

All we need to do is pass the list to the function:

SELECT list_distinct(['Dog', 'Cat', 'Dog', 'Bird']);

Result:

[Cat, Bird, Dog]

One thing to bear in mind is that the list_distinct() function doesn’t preserve the original order, as witnessed in this example.

NULL Values

As mentioned, list_distinct() removes any NULL values from the list:

SELECT list_distinct(['Dog', NULL, 'Dog', 'Bird']);

Result:

[Bird, Dog]

The array_distinct() Function

There’s also a function called array_distinct(), which is an alias of list_distinct(). Therefore, we can use array_distinct() to do the exact same thing as list_distinct():

SELECT 
    list_distinct(['Dog', NULL, 'Dog', 'Bird']) AS list_distinct,
    array_distinct(['Dog', NULL, 'Dog', 'Bird']) AS array_distinct;

Result:

+---------------+----------------+
| list_distinct | array_distinct |
+---------------+----------------+
| [Bird, Dog] | [Bird, Dog] |
+---------------+----------------+

Using list_distinct() When Concatenating Lists

The list_distinct() function could come in handy when concatenating multiple lists into a single list. In such cases, the same value/s could be in multiple lists, but you might want those values to appear only once in the resulting list. That’s where list_distinct() comes in.

SELECT 
    list_concat(
            ['Dog', NULL, 'Cat'],
            ['Cat', 'Bird']
        ) AS including_duplicates,
    list_distinct(
        list_concat(
                ['Dog', NULL, 'Cat'],
                ['Cat', 'Bird']
            )
        ) AS deduped;

Result:

+-----------------------------+------------------+
| including_duplicates | deduped |
+-----------------------------+------------------+
| [Dog, NULL, Cat, Cat, Bird] | [Cat, Bird, Dog] |
+-----------------------------+------------------+

It’s the same when using the flatten() function to concatenate a list of lists into a flattened list:

SELECT 
    flatten([
            ['Dog', NULL, 'Cat'],
            ['Cat', 'Bird']
        ]) AS including_duplicates,
    list_distinct(
        flatten([
                ['Dog', NULL, 'Cat'],
                ['Cat', 'Bird']
            ])
        ) AS deduped;

Result:

+-----------------------------+------------------+
| including_duplicates | deduped |
+-----------------------------+------------------+
| [Dog, NULL, Cat, Cat, Bird] | [Cat, Bird, Dog] |
+-----------------------------+------------------+