If you ever find yourself in the situation where you need to count up the number of unique elements in a list in DuckDB, you’ll be happy to know that there’s a list_unique()
function that does exactly that.
Example
The function accepts only one argument; the list. So we can use it like this:
SELECT list_unique(['Sky', 'Box', 'Run']);
Result:
3
In this case, all elements in the list are unique and so we get the number of elements in the list, which is three.
Here’s an example where the list contains a duplicate value:
SELECT list_unique(['Sky', 'Box', 'Run', 'Sky']);
Result:
3
We still get three, because that’s how many unique values are in the list, once duplicates are ignored.
NULL Values
NULL values are ignored:
SELECT list_unique(['Sky', 'Box', NULL, 'Run', 'Sky']);
Result:
3
Once again, we get three.
Nested Lists
It works on nested lists too:
SELECT list_unique([
['Cloud', 'Ant', 'Ant'],
['Cloud', 'Ant', 'Ant'],
['Ant', 'Cloud', 'Ant'],
['Ant', 'Sky', 'Bee']
]);
Result:
3
When we do this, it compares each list for uniqueness. The first two lists are identical, and so they’re counted as one, which leaves us with three unique lists.
Unnesting the list or flattening it can produce different results.
Unnested Lists
Let’s unnest the list with the unnest()
function:
SELECT list_unique(
unnest([
['Cloud', 'Ant', 'Ant'],
['Cloud', 'Ant', 'Ant'],
['Ant', 'Cloud', 'Ant'],
['Ant', 'Sky', 'Bee']
])
) AS unnested;
Result:
+----------+
| unnested |
+----------+
| 2 |
| 2 |
| 2 |
| 3 |
+----------+
Flattened Lists
We can also use the flatten()
function to flatten the list:
SELECT list_unique(
flatten([
['Cloud', 'Ant', 'Ant'],
['Cloud', 'Ant', 'Ant'],
['Ant', 'Cloud', 'Ant'],
['Ant', 'Sky', 'Bee']
])
);
Result:
4
To understand how it got to these results, we can look at the output of the flatten()
function (without the list_unique()
function):
SELECT flatten([
['Cloud', 'Ant', 'Ant'],
['Cloud', 'Ant', 'Ant'],
['Ant', 'Cloud', 'Ant'],
['Ant', 'Sky', 'Bee']
]);
Result:
[Cloud, Ant, Ant, Cloud, Ant, Ant, Ant, Cloud, Ant, Ant, Sky, Bee]
So that’s the list that list_unique()
operated on when it came up with four unique values.
An Alias: The array_unique()
Function
There’s also an array_unique()
function, which is an alias for list_unique()
. That means we can use it to do exactly what list_unique()
does.
Example:
SELECT
list_unique(['Sky', 'Box', NULL, 'Run', 'Sky']) AS list_unique,
array_unique(['Sky', 'Box', NULL, 'Run', 'Sky']) AS array_unique;
Result:
+-------------+--------------+
| list_unique | array_unique |
+-------------+--------------+
| 3 | 3 |
+-------------+--------------+