Looking at DuckDB’s STRING_AGG() Function

DuckDB has a string_agg() function, which allows us to concatenate strings from a group of rows into a single string. This function can be useful when we need to aggregate text data in a meaningful way.

In this article, we’ll explore how the string_agg() function works, along with some simple examples to demonstrate its usage.

What is the string_agg() Function?

The string_agg() function is an aggregate function that concatenates values from multiple rows into a single string. It is similar to the group_concat() function in other RDBMSs, such as MySQL.

The syntax for the string_agg() function is as follows:

string_agg(expression, delimiter)

So the function takes two arguments:

  1. The expression to concatenate (usually a column name).
  2. A delimiter that separates the concatenated values.

Sample Data

To demonstrate how string_agg() works, let’s create a sample table and populate it with some data. We’ll use a table called employees that stores information about employees and their departments.

CREATE TABLE employees (
    id INTEGER,
    name VARCHAR,
    department VARCHAR
);

INSERT INTO employees (id, name, department) VALUES
(1, 'Alvin', 'HR'),
(2, 'Barbara', 'HR'),
(3, 'Hesham', 'Engineering'),
(4, 'Cuong', 'Engineering'),
(5, 'Axel', 'Marketing'),
(6, 'Frady', 'Marketing'),
(7, 'Blanche', 'Marketing');

Example 1: Basic Usage of string_agg()

Let’s start with a simple example. Suppose we want to concatenate the names of all employees in the employees table, separated by a comma.

SELECT 
    string_agg(name, ', ') AS all_employee_names
FROM employees;

This query will produce the following result:

+-----------------------------------------------------+
| all_employee_names |
+-----------------------------------------------------+
| Alvin, Barbara, Hesham, Cuong, Axel, Frady, Blanche |
+-----------------------------------------------------+

Here, the string_agg() function concatenates all the name values from the employees table, separating them with a comma and a space.

Example 2: Grouping Data with string_agg()

We can combine string_agg() with the GROUP BY clause to get separate concatenated strings, based on their grouping. For example, we can concatenate the names of employees within each department:

SELECT 
    department, 
    string_agg(name, ', ') AS employee_names
FROM employees
GROUP BY department;

This query produces the following result:

+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| Engineering | Hesham, Cuong |
| HR | Alvin, Barbara |
| Marketing | Axel, Frady, Blanche |
+-------------+----------------------+

In this example, the string_agg() function concatenates the names of employees within each department, and the GROUP BY clause ensures that the results are grouped by department.

Example 3: Custom Delimiters

The delimiter used in string_agg() can be any string. Let’s use a custom delimiter, such as a semicolon followed by a space, to separate the names.

SELECT 
    department, 
    string_agg(name, '; ') AS employee_names
FROM employees
GROUP BY department;

Result:

+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| HR | Alvin; Barbara |
| Engineering | Hesham; Cuong |
| Marketing | Axel; Frady; Blanche |
+-------------+----------------------+

And here’s one that uses a hyphen as the separator:

SELECT 
    department, 
    string_agg(name, '-') AS employee_names
FROM employees
GROUP BY department;

Result:

+-------------+--------------------+
| department | employee_names |
+-------------+--------------------+
| Engineering | Hesham-Cuong |
| Marketing | Axel-Frady-Blanche |
| HR | Alvin-Barbara |
+-------------+--------------------+

Example 4: Ordering the Concatenated Values

You can also control the order in which the values are concatenated by using the ORDER BY clause within the string_agg() function.

For instance, let’s concatenate the names in alphabetical order within each department.

SELECT 
    department, 
    string_agg(name, ', ' ORDER BY name) AS employee_names
FROM employees
GROUP BY department;

Result:

+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| Marketing | Axel, Blanche, Frady |
| Engineering | Cuong, Hesham |
| HR | Alvin, Barbara |
+-------------+----------------------+

In this case, the names are concatenated in alphabetical order within each department.

Example 5: Handling NULL Values

By default, string_agg() ignores NULL values. However, if you want to include a placeholder for NULL values, you can use the coalesce() function to replace NULL with a default value before concatenation.

Let’s modify our employees table to include a NULL value:

INSERT INTO employees (id, name, department) VALUES
(8, NULL, 'HR');

First, we’ll see how string_agg() handles it by default:

-- Set a string for null values so we can see if it appears
.nullvalue 'null'

-- Run the query
SELECT 
    department, 
    string_agg(name, ', ') AS employee_names
FROM employees
GROUP BY department;

Result:

+-------------+----------------------+
| department | employee_names |
+-------------+----------------------+
| Engineering | Hesham, Cuong |
| HR | Alvin, Barbara |
| Marketing | Axel, Frady, Blanche |
+-------------+----------------------+

We can see that the NULL value doesn’t turn up in the result. This is because string_agg() skipped it.

In this case, I specifically specified .nullvalue 'null' so that the string null would appear if the null value was in fact going to appear. I did this to eliminate the possibility that an empty string appears for null values (which is the default).

Now let’s use the coalesce() function to output a value in place of the NULL value:

SELECT 
    department, 
    string_agg(coalesce(name, 'Unknown'), ', ') AS employee_names
FROM employees
GROUP BY department;

Result:

+-------------+-------------------------+
| department | employee_names |
+-------------+-------------------------+
| Engineering | Hesham, Cuong |
| HR | Alvin, Barbara, Unknown |
| Marketing | Axel, Frady, Blanche |
+-------------+-------------------------+

This time the NULL value in the name column is replaced with 'Unknown' before concatenation.

Conclusion

The string_agg() function in DuckDB is a handy tool for concatenating strings from multiple rows into a single string.

Whether you need to aggregate names, generate comma-separated lists, or handle NULL values, string_agg() provides a simple and efficient way to achieve your goals. By combining it with GROUP BY and ORDER BY, we can create useful queries that transform our data into meaningful, aggregated strings.