DuckDB, an in-memory analytical SQL database management system, provides a good selection of functions for data manipulation and analysis. One of the these functions is list()
.
The list()
function enables users to group multiple values into a single list, allowing for more advanced data aggregation and operations.
In this article, we’ll explore how the list()
function works, along with some simple examples.
What is the list()
Function in DuckDB?
DuckDB’s list()
function creates a list containing the values from a specified column within a group of rows. It can be used as an aggregation function in SQL queries, similar to other functions like sum()
, avg()
, or string_agg()
. This feature can be useful when performing data transformations and creating complex structures within queries.
The array_agg()
function is an alias for list()
, and so we can use either one interchangeably.
Syntax of the list()
Function
The basic syntax for using the list()
function in DuckDB is as follows:
list(arg)
Where arg
is the column whose values you want to aggregate into a list.
As mentioned, array_agg()
is an alias for list()
, so we can also use the following syntax:
array_agg(arg)
Sample Data for Examples
Let’s create and populate a table called sales_data
so we can run some examples:
-- Create the sales_data table
CREATE TABLE sales_data (
customer_id INTEGER,
product TEXT,
sales_amount FLOAT
);
-- Insert sample data
INSERT INTO sales_data (customer_id, product, sales_amount) VALUES
(1, 'Laptop', 1200.00),
(1, 'Phone', 800.00),
(2, 'Tablet', 600.00),
(2, 'Laptop', 1200.00),
(3, 'Phone', 750.00);
-- Check the data inserted
SELECT * FROM sales_data;
Output:
+-------------+---------+--------------+
| customer_id | product | sales_amount |
+-------------+---------+--------------+
| 1 | Laptop | 1200.0 |
| 1 | Phone | 800.0 |
| 2 | Tablet | 600.0 |
| 2 | Laptop | 1200.0 |
| 3 | Phone | 750.0 |
+-------------+---------+--------------+
Basic Example
We can use the list()
function like this:
SELECT
list(product) AS products_list
FROM sales_data;
Result:
+----------------------------------------+
| products_list |
+----------------------------------------+
| [Laptop, Phone, Tablet, Laptop, Phone] |
+----------------------------------------+
Grouped Query
To create a list of products purchased by each customer, we can use the following query:
SELECT
customer_id,
list(product) AS products_list
FROM sales_data
GROUP BY customer_id;
The result will look like this:
+-------------+------------------+
| customer_id | products_list |
+-------------+------------------+
| 1 | [Laptop, Phone] |
| 2 | [Tablet, Laptop] |
| 3 | [Phone] |
+-------------+------------------+
Combining list()
with Other Aggregations
We can combine list()
with other aggregate functions for more advanced operations. For example, if we want to retrieve the total sales amount and the list of products for each customer, the query could look like this:
SELECT
customer_id,
list(product) AS products_list,
sum(sales_amount) AS total_sales
FROM sales_data
GROUP BY customer_id;
Result:
+-------------+------------------+-------------+
| customer_id | products_list | total_sales |
+-------------+------------------+-------------+
| 1 | [Laptop, Phone] | 2000.0 |
| 2 | [Tablet, Laptop] | 1800.0 |
| 3 | [Phone] | 750.0 |
+-------------+------------------+-------------+
More Examples
I’ve also written an article about the array_agg()
function that contains some examples. Given array_agg()
is an alias for list()
, the examples in that article also apply to the list()
function.