DuckDB has a string_agg()
function, which allows us to concatenate strings from a group of rows into a single string. This function can be useful when we need to aggregate text data in a meaningful way.
In this article, we’ll explore how the string_agg()
function works, along with some simple examples to demonstrate its usage.
What is the string_agg()
Function?
The string_agg()
function is an aggregate function that concatenates values from multiple rows into a single string. It is similar to the group_concat()
function in other RDBMSs, such as MySQL.
The syntax for the string_agg()
function is as follows:
string_agg(expression, delimiter)
So the function takes two arguments:
- The
expression
to concatenate (usually a column name). - A
delimiter
that separates the concatenated values.
Sample Data
To demonstrate how string_agg()
works, let’s create a sample table and populate it with some data. We’ll use a table called employees
that stores information about employees and their departments.
CREATE TABLE employees (
id INTEGER,
name VARCHAR,
department VARCHAR
);
INSERT INTO employees (id, name, department) VALUES
(1, 'Alvin', 'HR'),
(2, 'Barbara', 'HR'),
(3, 'Hesham', 'Engineering'),
(4, 'Cuong', 'Engineering'),
(5, 'Axel', 'Marketing'),
(6, 'Frady', 'Marketing'),
(7, 'Blanche', 'Marketing');
Example 1: Basic Usage of string_agg()
Let’s start with a simple example. Suppose we want to concatenate the names of all employees in the employees
table, separated by a comma.
SELECT
string_agg(name, ', ') AS all_employee_names
FROM employees;
This query will produce the following result:
+-----------------------------------------------------+
| all_employee_names |
+-----------------------------------------------------+
| Alvin, Barbara, Hesham, Cuong, Axel, Frady, Blanche |
+-----------------------------------------------------+
Here, the string_agg()
function concatenates all the name
values from the employees
table, separating them with a comma and a space.
Example 2: Grouping Data with string_agg()
We can combine string_agg()
with the GROUP BY
clause to get separate concatenated strings, based on their grouping. For example, we can concatenate the names of employees within each department:
SELECT
department,
string_agg(name, ', ') AS employee_names
FROM employees
GROUP BY department;
This query produces the following result:
+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| Engineering | Hesham, Cuong |
| HR | Alvin, Barbara |
| Marketing | Axel, Frady, Blanche |
+-------------+----------------------+
In this example, the string_agg()
function concatenates the names of employees within each department, and the GROUP BY
clause ensures that the results are grouped by department.
Example 3: Custom Delimiters
The delimiter used in string_agg()
can be any string. Let’s use a custom delimiter, such as a semicolon followed by a space, to separate the names.
SELECT
department,
string_agg(name, '; ') AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| HR | Alvin; Barbara |
| Engineering | Hesham; Cuong |
| Marketing | Axel; Frady; Blanche |
+-------------+----------------------+
And here’s one that uses a hyphen as the separator:
SELECT
department,
string_agg(name, '-') AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+--------------------+
| department | employee_names |
+-------------+--------------------+
| Engineering | Hesham-Cuong |
| Marketing | Axel-Frady-Blanche |
| HR | Alvin-Barbara |
+-------------+--------------------+
Example 4: Ordering the Concatenated Values
You can also control the order in which the values are concatenated by using the ORDER BY
clause within the string_agg()
function.
For instance, let’s concatenate the names in alphabetical order within each department.
SELECT
department,
string_agg(name, ', ' ORDER BY name) AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| Marketing | Axel, Blanche, Frady |
| Engineering | Cuong, Hesham |
| HR | Alvin, Barbara |
+-------------+----------------------+
In this case, the names are concatenated in alphabetical order within each department.
Example 5: Handling NULL
Values
By default, string_agg()
ignores NULL
values. However, if you want to include a placeholder for NULL
values, you can use the coalesce()
function to replace NULL
with a default value before concatenation.
Let’s modify our employees
table to include a NULL
value:
INSERT INTO employees (id, name, department) VALUES
(8, NULL, 'HR');
First, we’ll see how string_agg()
handles it by default:
-- Set a string for null values so we can see if it appears
.nullvalue 'null'
-- Run the query
SELECT
department,
string_agg(name, ', ') AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| Engineering | Hesham, Cuong |
| HR | Alvin, Barbara |
| Marketing | Axel, Frady, Blanche |
+-------------+----------------------+
We can see that the NULL
value doesn’t turn up in the result. This is because string_agg()
skipped it.
In this case, I specifically specified .nullvalue 'null'
so that the string null
would appear if the null value was in fact going to appear. I did this to eliminate the possibility that an empty string appears for null
values (which is the default).
Now let’s use the coalesce()
function to output a value in place of the NULL
value:
SELECT
department,
string_agg(coalesce(name, 'Unknown'), ', ') AS employee_names
FROM employees
GROUP BY department;
Result:
+-------------+-------------------------+
| department | employee_names |
+-------------+-------------------------+
| Engineering | Hesham, Cuong |
| HR | Alvin, Barbara, Unknown |
| Marketing | Axel, Frady, Blanche |
+-------------+-------------------------+
This time the NULL
value in the name
column is replaced with 'Unknown'
before concatenation.
Conclusion
The string_agg()
function in DuckDB is a handy tool for concatenating strings from multiple rows into a single string.
Whether you need to aggregate names, generate comma-separated lists, or handle NULL
values, string_agg()
provides a simple and efficient way to achieve your goals. By combining it with GROUP BY
and ORDER BY
, we can create useful queries that transform our data into meaningful, aggregated strings.