A Quick Look at the LIST() Function in DuckDB

DuckDB, an in-memory analytical SQL database management system, provides a good selection of functions for data manipulation and analysis. One of the these functions is list().

The list() function enables users to group multiple values into a single list, allowing for more advanced data aggregation and operations.

In this article, we’ll explore how the list() function works, along with some simple examples.

What is the list() Function in DuckDB?

DuckDB’s list() function creates a list containing the values from a specified column within a group of rows. It can be used as an aggregation function in SQL queries, similar to other functions like sum(), avg(), or string_agg(). This feature can be useful when performing data transformations and creating complex structures within queries.

The array_agg() function is an alias for list(), and so we can use either one interchangeably.

Syntax of the list() Function

The basic syntax for using the list() function in DuckDB is as follows:

list(arg)

Where arg is the column whose values you want to aggregate into a list.

As mentioned, array_agg() is an alias for list(), so we can also use the following syntax:

array_agg(arg)

Sample Data for Examples

Let’s create and populate a table called sales_data so we can run some examples:

-- Create the sales_data table
CREATE TABLE sales_data (
    customer_id INTEGER,
    product TEXT,
    sales_amount FLOAT
);

-- Insert sample data
INSERT INTO sales_data (customer_id, product, sales_amount) VALUES
(1, 'Laptop', 1200.00),
(1, 'Phone', 800.00),
(2, 'Tablet', 600.00),
(2, 'Laptop', 1200.00),
(3, 'Phone', 750.00);
  
-- Check the data inserted
SELECT * FROM sales_data;

Output:

+-------------+---------+--------------+
| customer_id | product | sales_amount |
+-------------+---------+--------------+
| 1 | Laptop | 1200.0 |
| 1 | Phone | 800.0 |
| 2 | Tablet | 600.0 |
| 2 | Laptop | 1200.0 |
| 3 | Phone | 750.0 |
+-------------+---------+--------------+

Basic Example

We can use the list() function like this:

SELECT 
    list(product) AS products_list  
FROM sales_data;

Result:

+----------------------------------------+
| products_list |
+----------------------------------------+
| [Laptop, Phone, Tablet, Laptop, Phone] |
+----------------------------------------+

Grouped Query

To create a list of products purchased by each customer, we can use the following query:

SELECT 
    customer_id, 
    list(product) AS products_list  
FROM sales_data  
GROUP BY customer_id;

The result will look like this:

+-------------+------------------+
| customer_id | products_list |
+-------------+------------------+
| 1 | [Laptop, Phone] |
| 2 | [Tablet, Laptop] |
| 3 | [Phone] |
+-------------+------------------+

Combining list() with Other Aggregations

We can combine list() with other aggregate functions for more advanced operations. For example, if we want to retrieve the total sales amount and the list of products for each customer, the query could look like this:

SELECT 
    customer_id, 
    list(product) AS products_list, 
    sum(sales_amount) AS total_sales  
FROM sales_data  
GROUP BY customer_id;

Result:

+-------------+------------------+-------------+
| customer_id | products_list | total_sales |
+-------------+------------------+-------------+
| 1 | [Laptop, Phone] | 2000.0 |
| 2 | [Tablet, Laptop] | 1800.0 |
| 3 | [Phone] | 750.0 |
+-------------+------------------+-------------+

More Examples

I’ve also written an article about the array_agg() function that contains some examples. Given array_agg() is an alias for list(), the examples in that article also apply to the list() function.