DuckDB has a function called list_distinct()
that enables us to remove duplicate values from a list. Any value that appears more than once is “deduped” and is returned once in the resulting list.
The function also removes any NULL values.
Example
All we need to do is pass the list to the function:
SELECT list_distinct(['Dog', 'Cat', 'Dog', 'Bird']);
Result:
[Cat, Bird, Dog]
One thing to bear in mind is that the list_distinct()
function doesn’t preserve the original order, as witnessed in this example.
NULL Values
As mentioned, list_distinct()
removes any NULL values from the list:
SELECT list_distinct(['Dog', NULL, 'Dog', 'Bird']);
Result:
[Bird, Dog]
The array_distinct()
Function
There’s also a function called array_distinct()
, which is an alias of list_distinct()
. Therefore, we can use array_distinct()
to do the exact same thing as list_distinct()
:
SELECT
list_distinct(['Dog', NULL, 'Dog', 'Bird']) AS list_distinct,
array_distinct(['Dog', NULL, 'Dog', 'Bird']) AS array_distinct;
Result:
+---------------+----------------+
| list_distinct | array_distinct |
+---------------+----------------+
| [Bird, Dog] | [Bird, Dog] |
+---------------+----------------+
Using list_distinct()
When Concatenating Lists
The list_distinct()
function could come in handy when concatenating multiple lists into a single list. In such cases, the same value/s could be in multiple lists, but you might want those values to appear only once in the resulting list. That’s where list_distinct()
comes in.
SELECT
list_concat(
['Dog', NULL, 'Cat'],
['Cat', 'Bird']
) AS including_duplicates,
list_distinct(
list_concat(
['Dog', NULL, 'Cat'],
['Cat', 'Bird']
)
) AS deduped;
Result:
+-----------------------------+------------------+
| including_duplicates | deduped |
+-----------------------------+------------------+
| [Dog, NULL, Cat, Cat, Bird] | [Cat, Bird, Dog] |
+-----------------------------+------------------+
It’s the same when using the flatten()
function to concatenate a list of lists into a flattened list:
SELECT
flatten([
['Dog', NULL, 'Cat'],
['Cat', 'Bird']
]) AS including_duplicates,
list_distinct(
flatten([
['Dog', NULL, 'Cat'],
['Cat', 'Bird']
])
) AS deduped;
Result:
+-----------------------------+------------------+
| including_duplicates | deduped |
+-----------------------------+------------------+
| [Dog, NULL, Cat, Cat, Bird] | [Cat, Bird, Dog] |
+-----------------------------+------------------+