NULL values in databases can be tricky to handle and often lead to unexpected results. DuckDB’s COALESCE()
function provides an easy way to manage NULL values effectively.
In this article, we’ll explore the COALESCE()
function in DuckDB with some practical examples.
How COALESCE()
Works
COALESCE()
evaluates arguments in order and returns the first non-NULL value. If all arguments are NULL, COALESCE()
returns NULL.
The syntax goes like this:
COALESCE(value1, value2, ..., valueN)
So basically, we can pass multiple values to the function, and it will return the first non-NULL value.
Let’s run some examples to demonstrate this.
Quick Example
Here’s a quick example to demonstrate how it works:
SELECT COALESCE( null, null, 'Cashflow', null );
Result:
+----------------------------------------+
| COALESCE(NULL, NULL, 'Cashflow', NULL) |
+----------------------------------------+
| Cashflow |
+----------------------------------------+
Here, we passed four arguments to COALESCE()
, and it returned the first one that wasn’t NULL.
When All Values are NULL
If all values are NULL, then NULL is returned:
SELECT COALESCE( null, null, null, null );
Result:
+----------------------------------+
| COALESCE(NULL, NULL, NULL, NULL) |
+----------------------------------+
| |
+----------------------------------+
By default, NULL values are returned as an empty string. We can use the .nullvalue
dot command to specify another value.
Let’s do that:
.nullvalue 'null'
SELECT COALESCE( null, null, null, null );
Result:
+----------------------------------+
| COALESCE(NULL, NULL, NULL, NULL) |
+----------------------------------+
| null |
+----------------------------------+
Database Examples
Let’s take it a step further and use the COALESCE()
function against some database queries.
Setting Up Example Data
First, let’s create a table with some NULL values:
-- Create a table for an online store's orders
CREATE TABLE orders (
order_id INTEGER,
customer_id INTEGER,
product_name VARCHAR,
quantity INTEGER,
unit_price DECIMAL(10,2),
discount DECIMAL(10,2),
notes VARCHAR
);
-- Insert sample data with NULL values
INSERT INTO orders VALUES
(1, 101, 'Laptop', 1, 999.99, NULL, 'Priority shipping'),
(2, 102, 'Mouse', 2, 29.99, 5.00, NULL),
(3, 103, 'Keyboard', NULL, 89.99, 10.00, 'Back-ordered'),
(4, 104, 'Monitor', 1, NULL, NULL, NULL),
(5, 105, 'Headphones', 2, 159.99, NULL, 'Gift wrap');
Now that we’ve got a table with sample data, let’s run some queries against it. We’ll incorporate COALESCE()
into the queries in order to deal with any NULL values.
Handling Missing Discounts
Replace NULL discounts with 0
:
SELECT
order_id,
product_name,
unit_price,
discount AS original_discount,
COALESCE(discount, 0) AS applied_discount
FROM orders;
Result:
+----------+--------------+------------+-------------------+------------------+
| order_id | product_name | unit_price | original_discount | applied_discount |
+----------+--------------+------------+-------------------+------------------+
| 1 | Laptop | 999.99 | null | 0.00 |
| 2 | Mouse | 29.99 | 5.00 | 5.00 |
| 3 | Keyboard | 89.99 | 10.00 | 10.00 |
| 4 | Monitor | null | null | 0.00 |
| 5 | Headphones | 159.99 | null | 0.00 |
+----------+--------------+------------+-------------------+------------------+
In this example, the original_discount
column contains the raw value (including NULL values), and the applied_discount
column uses the same column, but with COALESCE()
applied in order to change the NULL values into zero.
Multiple Fallback Values
Here’s an example that uses three arguments (with two fallback arguments):
SELECT
order_id,
product_name,
notes AS original_notes,
COALESCE(notes, 'Standard shipping', 'No special instructions') as order_notes
FROM orders;
Result:
+----------+--------------+-------------------+-------------------+
| order_id | product_name | original_notes | order_notes |
+----------+--------------+-------------------+-------------------+
| 1 | Laptop | Priority shipping | Priority shipping |
| 2 | Mouse | null | Standard shipping |
| 3 | Keyboard | Back-ordered | Back-ordered |
| 4 | Monitor | null | Standard shipping |
| 5 | Headphones | Gift wrap | Gift wrap |
+----------+--------------+-------------------+-------------------+
Here, although we’ve got a third argument (No special instructions
), that one is never used, due to the fact that the second argument is a non-NULL string literal. This causes the second argument to always be used whenever the first argument contains a NULL value. However, if our second argument was a column name, then perhaps the third argument would be warranted, due to the possibility that the second argument may contain NULL values.
Calculating Total Price
Calculate final price considering NULL quantities and discounts:
SELECT
order_id,
product_name,
COALESCE(quantity, 1) as adjusted_quantity,
unit_price,
COALESCE(discount, 0) as applied_discount,
COALESCE(quantity, 1) * unit_price - COALESCE(discount, 0) as final_price
FROM orders
WHERE unit_price IS NOT NULL;
Result:
+----------+--------------+-------------------+------------+------------------+-------------+
| order_id | product_name | adjusted_quantity | unit_price | applied_discount | final_price |
+----------+--------------+-------------------+------------+------------------+-------------+
| 1 | Laptop | 1 | 999.99 | 0.00 | 999.99 |
| 2 | Mouse | 2 | 29.99 | 5.00 | 54.98 |
| 3 | Keyboard | 1 | 89.99 | 10.00 | 79.99 |
| 5 | Headphones | 2 | 159.99 | 0.00 | 319.98 |
+----------+--------------+-------------------+------------+------------------+-------------+
Combining Multiple Columns
Create a description using multiple columns:
SELECT
order_id,
COALESCE(
notes,
product_name || ' - ' || CAST(quantity as VARCHAR),
'Order #' || CAST(order_id as VARCHAR)
) as order_description
FROM orders;
Result:
+----------+-------------------+
| order_id | order_description |
+----------+-------------------+
| 1 | Priority shipping |
| 2 | Mouse - 2 |
| 3 | Back-ordered |
| 4 | Monitor - 1 |
| 5 | Gift wrap |
+----------+-------------------+
COALESCE()
in Aggregate Functions
Calculate average discount, treating NULL as 0
:
SELECT
product_name,
AVG(COALESCE(discount, 0)) as avg_discount,
COUNT(*) as total_orders,
COUNT(discount) as orders_with_discount
FROM orders
GROUP BY product_name;
Result:
+--------------+--------------+--------------+----------------------+
| product_name | avg_discount | total_orders | orders_with_discount |
+--------------+--------------+--------------+----------------------+
| Keyboard | 10.0 | 1 | 1 |
| Mouse | 5.0 | 1 | 1 |
| Headphones | 0.0 | 1 | 0 |
| Laptop | 0.0 | 1 | 0 |
| Monitor | 0.0 | 1 | 0 |
+--------------+--------------+--------------+----------------------+
Conditional Logic with COALESCE()
Create status messages based on various fields:
SELECT
order_id,
product_name,
CASE
WHEN quantity IS NULL THEN 'Quantity not specified'
WHEN unit_price IS NULL THEN 'Price pending'
ELSE COALESCE(notes, 'Processing')
END as order_status
FROM orders;
Result:
+----------+--------------+------------------------+
| order_id | product_name | order_status |
+----------+--------------+------------------------+
| 1 | Laptop | Priority shipping |
| 2 | Mouse | Processing |
| 3 | Keyboard | Quantity not specified |
| 4 | Monitor | Price pending |
| 5 | Headphones | Gift wrap |
+----------+--------------+------------------------+
Common Pitfalls to Avoid
While COALESCE()
provides a simple way to deal with NULL values, we need to be careful not to allow this simplicity to steer us into a false sense of security. We might be tempted to quickly apply COALESCE()
to a value, without really looking at the whole expression.
In particular, below are two situations to be mindful of.
Null Propagation
If we’re not careful, we can inadvertently allow NULL values to propagate when performing calculations.
Incorrect (NULL will propagate):
SELECT
order_id,
quantity * COALESCE(unit_price, 0) as total
FROM orders;
Output:
+----------+--------+
| order_id | total |
+----------+--------+
| 1 | 999.99 |
| 2 | 59.98 |
| 3 | null |
| 4 | 0.00 |
| 5 | 319.98 |
+----------+--------+
We can see that a NULL value continues to appear in the total column, even though we tried to deal with it with COALESCE()
. This is because we only used COALESCE()
against one column. We should’ve applied it to both columns in the calculation.
Correct (handles both NULL cases):
SELECT
order_id,
COALESCE(quantity, 0) * COALESCE(unit_price, 0) as total
FROM orders;
Result:
+----------+--------+
| order_id | total |
+----------+--------+
| 1 | 999.99 |
| 2 | 59.98 |
| 3 | 0.00 |
| 4 | 0.00 |
| 5 | 319.98 |
+----------+--------+
This time we get the correct result because we applied the COALESCE()
function to both columns (in order to handle NULL values in both columns).
Type Mismatch
Another potential issue you might encounter is a type mismatch error. This can occur when the values we pass to COALESCE()
are incompatible.
Incorrect (type mismatch):
SELECT COALESCE(discount, 'none') FROM orders;
Result:
Conversion Error: Could not convert string "none" to DECIMAL(10,2)
LINE 1: SELECT COALESCE(discount, 'none') FROM orders;
^
This error occurred because the first column is a DECIMAL
type, but our second value is a string. When DuckDB tried to use the second value, it couldn’t convert it to DECIMAL(10,2)
.
Correct (consistent types):
SELECT COALESCE(CAST(discount as VARCHAR), 'none') FROM orders;
Result:
+---------------------------------------------+
| COALESCE(CAST(discount AS VARCHAR), 'none') |
+---------------------------------------------+
| none |
| 5.00 |
| 10.00 |
| none |
| none |
+---------------------------------------------+
If we must use a string as a fallback value for a DECIMAL
column, then we can convert the DECIMAL
column to a string value. We did that in this example and we no longer get an error.
Summary
COALESCE()
is a handy function for handling NULL values in DuckDB. Key takeaways:
- Always consider type consistency when using
COALESCE()
- Use meaningful default values that make sense for your data
- Consider performance implications when nesting
COALESCE()
functions - Remember that
COALESCE()
evaluates arguments in order - Use
COALESCE()
with aggregate functions for more accurate analysis
Remember to test your COALESCE()
implementations thoroughly, especially when dealing with calculations or concatenations that might propagate NULL values unexpectedly.