DuckDB’s list_aggregate()
function is a handy tool for performing grouped aggregations over lists. It allows us to apply any aggregate function (like sum
, avg
, min
, max
, count
, etc.) to a list column as if each list item were a row.
The way it works is that we pass the list as the first argument, followed by the name of an aggregate function we want to apply to that list. The function will return its result as if it were the named aggregate function.
Syntax
The syntax goes something like this:
list_aggregate(list, aggregate_function [, extra_parameter ...])
list
: A list or a column containing lists (e.g.,[1, 2, 3]
).aggregate_function
: A string naming the function to apply (e.g.,'sum'
,'avg'
,'count'
).extra_parameter
: The third (and further) parameters let you pass extra arguments to the aggregate function you’re using. This is useful for aggregate functions that accept more than one argument. (such asstring_agg
,listagg
,quantile
, etc).
Here’s a more detailed depiction of the syntax that shows the data types:
list_aggregate(ANY[], VARCHAR, [ANY...]) -> ANY
The function also has a bunch of aliases:
list_aggr()
aggregate()
array_aggregate()
array_aggr()
Any of these can be used to return the same output as list_aggregate()
.
Basic Examples
Here’s a simple example to demonstrate:
SELECT list_aggregate([10, 20, 35], 'sum');
Output:
65
In this case I passed sum
as the second argument. This resulted in the numbers in the list (in the first argument) being added up.
It works with fractions too:
SELECT list_aggregate([1.0, 2.0, 3.5], 'sum');
Output:
6.5
Here are examples of more functions we can pass:
SET VARIABLE numbers = [10, 20, 35, 270, 360, 465, 54.37];
SELECT
list_aggregate(getvariable('numbers'), 'count') AS count,
list_aggregate(getvariable('numbers'), 'avg') AS avg,
list_aggregate(getvariable('numbers'), 'min') AS min,
list_aggregate(getvariable('numbers'), 'max') AS max,
list_aggregate(getvariable('numbers'), 'stddev_samp') AS stddev_samp;
Output:
+-------+--------------------+-------+--------+--------------------+
| count | avg | min | max | stddev_samp |
+-------+--------------------+-------+--------+--------------------+
| 7 | 173.48142857142858 | 10.00 | 465.00 | 188.29739531112048 |
+-------+--------------------+-------+--------+--------------------+
String Concatenation
We can pass string_agg
to have the list items concatenated into a string:
SELECT list_aggregate(['Okra', 'Avocado', 'Eggplant'], 'string_agg');
Output:
Okra,Avocado,Eggplant
Passing a list of numbers performs the same operation:
SELECT list_aggregate([1,2,3], 'string_agg');
Output:
1,2,3
Passing a Third Argument
Some aggregate functions (such as string_agg
, quantile
, etc) accept more than one argument. The list_aggregate()
function caters for this by allowing us to provide additional arguments.
For example, we can pass a third argument to our previous example in order to specify the separator to use:
SELECT list_aggregate(['Okra', 'Avocado', 'Eggplant'], 'string_agg', '|');
Output:
Okra|Avocado|Eggplant
Specifying an Invalid Function
Specifying an invalid function results in an error. For example:
SELECT list_aggregate([1,2,3], 'oops');
Output:
Catalog Error:
Aggregate Function with name oops does not exist!
Did you mean "cos"?
Mismatching the list elements data type with the specified function can also result in an error:
SELECT list_aggregate(['Okra', 'Avocado', 'Eggplant'], 'sum');
Output:
Binder Error:
No matching aggregate function
Binder Error: No function matches the given name and argument types 'sum(VARCHAR)'. You might need to add explicit type casts.
Candidate functions:
sum(DECIMAL) -> DECIMAL
sum(BOOLEAN) -> HUGEINT
sum(SMALLINT) -> HUGEINT
sum(INTEGER) -> HUGEINT
sum(BIGINT) -> HUGEINT
sum(HUGEINT) -> HUGEINT
sum(DOUBLE) -> DOUBLE
In this case I tried to calculate the sum of the values in the list. Only problem is that the list doesn’t contain numbers. It contains strings. Therefore an error was returned.
Aliases
As mentioned, list_aggregate()
has a bunch of aliases. That means we can use either of them to achieve the same result.
Here they all are together:
SELECT
list_aggregate([10, 20, 35], 'sum') AS list_aggregate,
list_aggr([10, 20, 35], 'sum') AS list_aggr,
aggregate([10, 20, 35], 'sum') AS aggregate,
array_aggregate([10, 20, 35], 'sum') AS array_aggregate,
array_aggr([10, 20, 35], 'sum') AS array_aggr;
Result:
+----------------+-----------+-----------+-----------------+------------+
| list_aggregate | list_aggr | aggregate | array_aggregate | array_aggr |
+----------------+-----------+-----------+-----------------+------------+
| 65 | 65 | 65 | 65 | 65 |
+----------------+-----------+-----------+-----------------+------------+