How to Use COALESCE() to Handle NULL Values in DuckDB

NULL values in databases can be tricky to handle and often lead to unexpected results. DuckDB’s COALESCE() function provides an easy way to manage NULL values effectively.

In this article, we’ll explore the COALESCE() function in DuckDB with some practical examples.

How COALESCE() Works

COALESCE() evaluates arguments in order and returns the first non-NULL value. If all arguments are NULL, COALESCE() returns NULL.

The syntax goes like this:

COALESCE(value1, value2, ..., valueN)

So basically, we can pass multiple values to the function, and it will return the first non-NULL value.

Let’s run some examples to demonstrate this.

Quick Example

Here’s a quick example to demonstrate how it works:

SELECT COALESCE( null, null, 'Cashflow', null );

Result:

+----------------------------------------+
| COALESCE(NULL, NULL, 'Cashflow', NULL) |
+----------------------------------------+
| Cashflow |
+----------------------------------------+

Here, we passed four arguments to COALESCE(), and it returned the first one that wasn’t NULL.

When All Values are NULL

If all values are NULL, then NULL is returned:

SELECT COALESCE( null, null, null, null );

Result:

+----------------------------------+
| COALESCE(NULL, NULL, NULL, NULL) |
+----------------------------------+
| |
+----------------------------------+

By default, NULL values are returned as an empty string. We can use the .nullvalue dot command to specify another value.

Let’s do that:

.nullvalue 'null'
SELECT COALESCE( null, null, null, null );

Result:

+----------------------------------+
| COALESCE(NULL, NULL, NULL, NULL) |
+----------------------------------+
| null |
+----------------------------------+

Database Examples

Let’s take it a step further and use the COALESCE() function against some database queries.

Setting Up Example Data

First, let’s create a table with some NULL values:

-- Create a table for an online store's orders
CREATE TABLE orders (
    order_id INTEGER,
    customer_id INTEGER,
    product_name VARCHAR,
    quantity INTEGER,
    unit_price DECIMAL(10,2),
    discount DECIMAL(10,2),
    notes VARCHAR
);

-- Insert sample data with NULL values
INSERT INTO orders VALUES
    (1, 101, 'Laptop', 1, 999.99, NULL, 'Priority shipping'),
    (2, 102, 'Mouse', 2, 29.99, 5.00, NULL),
    (3, 103, 'Keyboard', NULL, 89.99, 10.00, 'Back-ordered'),
    (4, 104, 'Monitor', 1, NULL, NULL, NULL),
    (5, 105, 'Headphones', 2, 159.99, NULL, 'Gift wrap');

Now that we’ve got a table with sample data, let’s run some queries against it. We’ll incorporate COALESCE() into the queries in order to deal with any NULL values.

Handling Missing Discounts

Replace NULL discounts with 0:

SELECT 
    order_id,
    product_name,
    unit_price,
    discount AS original_discount,
    COALESCE(discount, 0) AS applied_discount
FROM orders;

Result:

+----------+--------------+------------+-------------------+------------------+
| order_id | product_name | unit_price | original_discount | applied_discount |
+----------+--------------+------------+-------------------+------------------+
| 1 | Laptop | 999.99 | null | 0.00 |
| 2 | Mouse | 29.99 | 5.00 | 5.00 |
| 3 | Keyboard | 89.99 | 10.00 | 10.00 |
| 4 | Monitor | null | null | 0.00 |
| 5 | Headphones | 159.99 | null | 0.00 |
+----------+--------------+------------+-------------------+------------------+

In this example, the original_discount column contains the raw value (including NULL values), and the applied_discount column uses the same column, but with COALESCE() applied in order to change the NULL values into zero.

Multiple Fallback Values

Here’s an example that uses three arguments (with two fallback arguments):

SELECT 
    order_id,
    product_name,
    notes AS original_notes,
    COALESCE(notes, 'Standard shipping', 'No special instructions') as order_notes
FROM orders;

Result:

+----------+--------------+-------------------+-------------------+
| order_id | product_name | original_notes | order_notes |
+----------+--------------+-------------------+-------------------+
| 1 | Laptop | Priority shipping | Priority shipping |
| 2 | Mouse | null | Standard shipping |
| 3 | Keyboard | Back-ordered | Back-ordered |
| 4 | Monitor | null | Standard shipping |
| 5 | Headphones | Gift wrap | Gift wrap |
+----------+--------------+-------------------+-------------------+

Here, although we’ve got a third argument (No special instructions), that one is never used, due to the fact that the second argument is a non-NULL string literal. This causes the second argument to always be used whenever the first argument contains a NULL value. However, if our second argument was a column name, then perhaps the third argument would be warranted, due to the possibility that the second argument may contain NULL values.

Calculating Total Price

Calculate final price considering NULL quantities and discounts:

SELECT 
    order_id,
    product_name,
    COALESCE(quantity, 1) as adjusted_quantity,
    unit_price,
    COALESCE(discount, 0) as applied_discount,
    COALESCE(quantity, 1) * unit_price - COALESCE(discount, 0) as final_price
FROM orders
WHERE unit_price IS NOT NULL;

Result:

+----------+--------------+-------------------+------------+------------------+-------------+
| order_id | product_name | adjusted_quantity | unit_price | applied_discount | final_price |
+----------+--------------+-------------------+------------+------------------+-------------+
| 1 | Laptop | 1 | 999.99 | 0.00 | 999.99 |
| 2 | Mouse | 2 | 29.99 | 5.00 | 54.98 |
| 3 | Keyboard | 1 | 89.99 | 10.00 | 79.99 |
| 5 | Headphones | 2 | 159.99 | 0.00 | 319.98 |
+----------+--------------+-------------------+------------+------------------+-------------+

Combining Multiple Columns

Create a description using multiple columns:

SELECT 
    order_id,
    COALESCE(
        notes,
        product_name || ' - ' || CAST(quantity as VARCHAR),
        'Order #' || CAST(order_id as VARCHAR)
    ) as order_description
FROM orders;

Result:

+----------+-------------------+
| order_id | order_description |
+----------+-------------------+
| 1 | Priority shipping |
| 2 | Mouse - 2 |
| 3 | Back-ordered |
| 4 | Monitor - 1 |
| 5 | Gift wrap |
+----------+-------------------+

COALESCE() in Aggregate Functions

Calculate average discount, treating NULL as 0:

SELECT 
    product_name,
    AVG(COALESCE(discount, 0)) as avg_discount,
    COUNT(*) as total_orders,
    COUNT(discount) as orders_with_discount
FROM orders
GROUP BY product_name;

Result:

+--------------+--------------+--------------+----------------------+
| product_name | avg_discount | total_orders | orders_with_discount |
+--------------+--------------+--------------+----------------------+
| Keyboard | 10.0 | 1 | 1 |
| Mouse | 5.0 | 1 | 1 |
| Headphones | 0.0 | 1 | 0 |
| Laptop | 0.0 | 1 | 0 |
| Monitor | 0.0 | 1 | 0 |
+--------------+--------------+--------------+----------------------+

Conditional Logic with COALESCE()

Create status messages based on various fields:

SELECT 
    order_id,
    product_name,
    CASE 
        WHEN quantity IS NULL THEN 'Quantity not specified'
        WHEN unit_price IS NULL THEN 'Price pending'
        ELSE COALESCE(notes, 'Processing')
    END as order_status
FROM orders;

Result:

+----------+--------------+------------------------+
| order_id | product_name | order_status |
+----------+--------------+------------------------+
| 1 | Laptop | Priority shipping |
| 2 | Mouse | Processing |
| 3 | Keyboard | Quantity not specified |
| 4 | Monitor | Price pending |
| 5 | Headphones | Gift wrap |
+----------+--------------+------------------------+

Common Pitfalls to Avoid

While COALESCE() provides a simple way to deal with NULL values, we need to be careful not to allow this simplicity to steer us into a false sense of security. We might be tempted to quickly apply COALESCE() to a value, without really looking at the whole expression.

In particular, below are two situations to be mindful of.

Null Propagation

If we’re not careful, we can inadvertently allow NULL values to propagate when performing calculations.

Incorrect (NULL will propagate):

SELECT 
    order_id, 
    quantity * COALESCE(unit_price, 0) as total
FROM orders;

Output:

+----------+--------+
| order_id | total |
+----------+--------+
| 1 | 999.99 |
| 2 | 59.98 |
| 3 | null |
| 4 | 0.00 |
| 5 | 319.98 |
+----------+--------+

We can see that a NULL value continues to appear in the total column, even though we tried to deal with it with COALESCE(). This is because we only used COALESCE() against one column. We should’ve applied it to both columns in the calculation.

Correct (handles both NULL cases):

SELECT 
    order_id, 
    COALESCE(quantity, 0) * COALESCE(unit_price, 0) as total
FROM orders;

Result:

+----------+--------+
| order_id | total |
+----------+--------+
| 1 | 999.99 |
| 2 | 59.98 |
| 3 | 0.00 |
| 4 | 0.00 |
| 5 | 319.98 |
+----------+--------+

This time we get the correct result because we applied the COALESCE() function to both columns (in order to handle NULL values in both columns).

Type Mismatch

Another potential issue you might encounter is a type mismatch error. This can occur when the values we pass to COALESCE() are incompatible.

Incorrect (type mismatch):

SELECT COALESCE(discount, 'none') FROM orders;

Result:

Conversion Error: Could not convert string "none" to DECIMAL(10,2)
LINE 1: SELECT COALESCE(discount, 'none') FROM orders;
^

This error occurred because the first column is a DECIMAL type, but our second value is a string. When DuckDB tried to use the second value, it couldn’t convert it to DECIMAL(10,2).

Correct (consistent types):

SELECT COALESCE(CAST(discount as VARCHAR), 'none') FROM orders;

Result:

+---------------------------------------------+
| COALESCE(CAST(discount AS VARCHAR), 'none') |
+---------------------------------------------+
| none |
| 5.00 |
| 10.00 |
| none |
| none |
+---------------------------------------------+

If we must use a string as a fallback value for a DECIMAL column, then we can convert the DECIMAL column to a string value. We did that in this example and we no longer get an error.

Summary

COALESCE() is a handy function for handling NULL values in DuckDB. Key takeaways:

  • Always consider type consistency when using COALESCE()
  • Use meaningful default values that make sense for your data
  • Consider performance implications when nesting COALESCE() functions
  • Remember that COALESCE() evaluates arguments in order
  • Use COALESCE() with aggregate functions for more accurate analysis

Remember to test your COALESCE() implementations thoroughly, especially when dealing with calculations or concatenations that might propagate NULL values unexpectedly.