Concatenating Strings in DuckDB

String concatenation is a common operation when running database queries. It simply involves joining two strings together, end to end. DuckDB provides multiple methods for combining strings, each with its own use cases and advantages.

This article explores the various ways to concatenate strings in DuckDB.

Basic String Concatenation

Using the || Operator

The concatenation operator (||) is used to concatenate the string to its left with the string on its right. In other words, we put || in between the two strings to concatenate.

Suppose we have the following table:

-- Create a sample table
CREATE TABLE employees (
    id INTEGER,
    first_name VARCHAR,
    last_name VARCHAR,
    department VARCHAR,
    email VARCHAR
);

-- Insert sample data
INSERT INTO employees VALUES
    (1, 'Hector', 'Ajit', 'Engineering', NULL),
    (2, 'Maria', 'Swoosh', 'Marketing', '[email protected]'),
    (3, 'Muhammad', NULL, 'Sales', '[email protected]'),
    (4, 'Sarah', 'Jensen', NULL, '[email protected]'),
    (5, NULL, 'Brattley', 'HR', '[email protected]');

A common thing to do would be to concatenate the first name and last names:

SELECT 
    first_name || ' ' || last_name AS full_name
FROM employees;

Result:

+--------------+
| full_name |
+--------------+
| Hector Ajit |
| Maria Swoosh |
| null |
| Sarah Jensen |
| null |
+--------------+

In this case I included a space as a string literal. Without this space, the first and last names would be pushed up against each other, without a space in between.

We can include other string literals if required:

SELECT 
    'Employee: ' || first_name || ' ' || last_name AS employee_info
FROM employees;

Result:

+------------------------+
| employee_info |
+------------------------+
| Employee: Hector Ajit |
| Employee: Maria Swoosh |
| null |
| Employee: Sarah Jensen |
| null |
+------------------------+

Using the CONCAT() Function

The concat() function is another option for concatenating strings in DuckDB:

SELECT 
    CONCAT(first_name, ' ', last_name) AS full_name
FROM employees;

Result:

+--------------+
| full_name |
+--------------+
| Hector Ajit |
| Maria Swoosh |
| Muhammad |
| Sarah Jensen |
| Brattley |
+--------------+

Notice that there’s a difference in how the NULL values were handled when compared with the || operator example above.

Here’s an example that concatenates more values:

SELECT 
    CONCAT('Dept: ', department, ' - ', first_name, ' ', last_name) AS employee_info
FROM employees;

Result:

+---------------------------------+
| employee_info |
+---------------------------------+
| Dept: Engineering - Hector Ajit |
| Dept: Marketing - Maria Swoosh |
| Dept: Sales - Muhammad |
| Dept: - Sarah Jensen |
| Dept: HR - Brattley |
+---------------------------------+

Handling NULL Values

As alluded to, there’s a subtle difference between how the || operator and the concat() function handle NULL values. Let’s look at this in more detail.

NULL Behavior with the || Operator

When we used the || operator above, any time we tried to concatenate a NULL value with another value, the result was NULL. Here it is again:

SELECT 
    first_name,
    last_name,
    first_name || ' ' || last_name AS full_name
FROM employees;

Result:

+------------+-----------+--------------+
| first_name | last_name | full_name |
+------------+-----------+--------------+
| Hector | Ajit | Hector Ajit |
| Maria | Swoosh | Maria Swoosh |
| Muhammad | null | null |
| Sarah | Jensen | Sarah Jensen |
| null | Brattley | null |
+------------+-----------+--------------+

We can see that the third and fifth rows are NULL in the full_name column, even though these employees do actually have either a first name or last name (but not both).

Here’s one way to deal with this issue:

SELECT
    COALESCE(first_name || ' ' || last_name, 'Unknown') AS handled_null
FROM employees;

Result:

+--------------+
| handled_null |
+--------------+
| Hector Ajit |
| Maria Swoosh |
| Unknown |
| Sarah Jensen |
| Unknown |
+--------------+

In this case we used the coalesce() function to replace NULL with the string Unknown. This function returns the first non-NULL argument, and so Unknown is only returned whenever there’s a NULL value, otherwise the value itself is returned.

Here’s another way to deal with the issue:

SELECT 
    COALESCE(first_name, '') || ' ' || COALESCE(last_name, '') AS safe_full_name
FROM employees;

Result:

+----------------+
| safe_full_name |
+----------------+
| Hector Ajit |
| Maria Swoosh |
| Muhammad |
| Sarah Jensen |
| Brattley |
+----------------+

This time the result is the same as if we’d used the concat() function. When there’s a non-NULL value, that value is returned, even if it’s being concatenated with a NULL value. Therefore in this case, the third employee’s first name becomes the full name, and the fifth employee’s last name becomes the full name.

NULL Behavior with CONCAT()

When we use the concat() function to concatenate values, any NULLs are ignored, and non-NULL values are returned unchanged (or concatenated with another non-NULL value if there are multiple concatenations).

Here’s a comparison of using the concat() function vs the concatenation operator:

SELECT 
    CONCAT(first_name, ' ', last_name) AS concat_version,
    first_name || ' ' || last_name AS operator_version
FROM employees
WHERE first_name IS NULL OR last_name IS NULL;

Output:

+----------------+------------------+
| concat_version | operator_version |
+----------------+------------------+
| Brattley | null |
| Muhammad | null |
+----------------+------------------+

With concat(), any non-NULL values are returned, while with the concatenation operator, a single NULL value results in NULL being returned, even when the other value is not NULL.

However, as we saw in the previous example, we can use the coalesce() function with the concatenation operator in order to produce a result that’s more like what concat() would produce.

Using the CONCAT_WS() Function

The concat_ws() function can be handy whenever you need to concatenate multiple values with a separator in between. The way it works is that we pass the separator as the first argument, then all subsequent arguments will be concatenated using that separator.

For example, to use a hyphen as the separator:

SELECT 
    CONCAT_WS(' - ', department, first_name, last_name) AS employee_info
FROM employees;

Output:

+-----------------------------+
| employee_info |
+-----------------------------+
| Engineering - Hector - Ajit |
| Marketing - Maria - Swoosh |
| Sales - Muhammad |
| Sarah - Jensen |
| HR - Brattley |
+-----------------------------+

Here’s one that creates a comma separated list:

SELECT 
    CONCAT_WS(',', id::VARCHAR, first_name, last_name, department) AS csv_row
FROM employees;

Output:

+---------------------------+
| csv_row |
+---------------------------+
| 1,Hector,Ajit,Engineering |
| 2,Maria,Swoosh,Marketing |
| 3,Muhammad,Sales |
| 4,Sarah,Jensen |
| 5,Brattley,HR |
+---------------------------+

Some More Examples

Building Email Addresses

Here’s an example of using concatenation to build email addresses based on the employees’ full names:

SELECT 
    first_name,
    last_name,
    LOWER(
        CONCAT(
            COALESCE(first_name, ''),
            '.',
            COALESCE(last_name, ''),
            '@company.com'
        )
    ) AS generated_email
FROM employees;

Result:

+------------+-----------+--------------------------+
| first_name | last_name | generated_email |
+------------+-----------+--------------------------+
| Hector | Ajit | [email protected] |
| Maria | Swoosh | [email protected] |
| Muhammad | null | [email protected] |
| Sarah | Jensen | [email protected] |
| null | Brattley | [email protected] |
+------------+-----------+--------------------------+

Creating Full Names with Titles

SELECT 
    CASE 
        WHEN department = 'Engineering' THEN 'Eng.'
        WHEN department = 'Sales' THEN 'Sales Rep.'
        ELSE 'Staff'
    END || ' ' || COALESCE(first_name, '') || ' ' || COALESCE(last_name, '') AS titled_name
FROM employees;

Result:

+----------------------+
| titled_name |
+----------------------+
| Eng. Hector Ajit |
| Staff Maria Swoosh |
| Sales Rep. Muhammad |
| Staff Sarah Jensen |
| Staff Brattley |
+----------------------+

Building Dynamic SQL

Here’s an example that generates a dynamic SELECT statement:

-- Create a dynamic WHERE clause
SELECT 
    'SELECT * FROM employees WHERE ' || 
    STRING_AGG(
        column_name || ' IS NOT NULL',
        ' AND '
    ) AS dynamic_query
FROM information_schema.columns
WHERE table_name = 'employees'
LIMIT 3;

Output:

+----------------------------------------------------------------------------------------------------------------------------------------------------+
| dynamic_query |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT * FROM employees WHERE id IS NOT NULL AND first_name IS NOT NULL AND last_name IS NOT NULL AND department IS NOT NULL AND email IS NOT NULL |
+----------------------------------------------------------------------------------------------------------------------------------------------------+

In this case we queried the information schema to return data about the columns in the employees table.

Performance Considerations

If you’re dealing with large datasets, you’ll likely be mindful of performance. In such cases, you may find that performance changes depending on how you construct the query/concatenation operation.

For example, you might find that the following query:

SELECT 
    CONCAT(
        COALESCE(first_name, ''),
        ' ',
        COALESCE(last_name, ''),
        ' - ',
        COALESCE(department, '')
    ) AS efficient_concat
FROM employees;

Is more efficient than the following:

SELECT 
    COALESCE(first_name, '') || ' ' || 
    COALESCE(last_name, '') || ' - ' || 
    COALESCE(department, '') AS less_efficient
FROM employees;

Tips

Use concat() when dealing with NULL values:

-- Preferred
SELECT CONCAT(first_name, ' ', last_name) FROM employees;

-- Instead of
SELECT COALESCE(first_name, '') || ' ' || COALESCE(last_name, '') FROM employees;

Use concat_ws() for consistent separators:

-- Preferred
SELECT CONCAT_WS(', ', first_name, last_name, department) FROM employees;

-- Instead of
SELECT first_name || ', ' || last_name || ', ' || department FROM employees;

Use string_agg() for group concatenation:

SELECT 
    department,
    STRING_AGG(CONCAT(first_name, ' ', last_name), ', ') as team
FROM employees
GROUP BY department;

Summary

  • Use || for simple concatenations without NULLs
  • Use concat() when dealing with potential NULL values
  • Use concat_ws() for consistent separator-based concatenation
  • Use string_agg() for group-based concatenation
  • Always handle type conversions explicitly
  • Consider performance implications for large datasets

Remember to test your concatenation operations with NULL values and different data types to ensure robust behavior in your queries.