Exploring the ANY_VALUE() Function in DuckDB

DuckDB is a fast and reliable analytical database that offers a bunch of aggregate functions. One such function is ANY_VALUE().

In this post, we’ll explore the ANY_VALUE() function, along with examples to demonstrate its usage.

What is ANY_VALUE()?

The ANY_VALUE() function in DuckDB is an aggregate function that returns the first non-null value from a set of rows in a group. The result of ANY_VALUE() is influenced by the ordering within the group.

ANY_VALUE() can be useful when you want to fetch a representative value for each group without concern for which specific value it is, as long as it is non-null.

Syntax

ANY_VALUE(expression)

Where expression is the column or expression from which the first non-null value is returned.

Example Scenarios

Below are some examples that demonstrate how the ANY_VALUE() function works in DuckDB.

Sample Data

The examples use the following data:

-- Create the employees table
CREATE TABLE employees (
    department_id INTEGER,
    employee_name TEXT,
    salary INTEGER
);

-- Insert data into the employees table
INSERT INTO employees (department_id, employee_name, salary) VALUES
(1, 'Amber', 70000),
(1, 'Blake', 80000),
(2, 'Fitch', 90000),
(2, 'Sasha', NULL),
(3, 'Eve', NULL),
(3, 'Hector', 60000),
(4, NULL, 95000),
(4, 'Abe', 55000);

-- Select all data from the employees table
SELECT * FROM employees;

Output:

+---------------+---------------+--------+
| department_id | employee_name | salary |
+---------------+---------------+--------+
| 1 | Amber | 70000 |
| 1 | Blake | 80000 |
| 2 | Fitch | 90000 |
| 2 | Sasha | null |
| 3 | Eve | null |
| 3 | Hector | 60000 |
| 4 | null | 95000 |
| 4 | Abe | 55000 |
+---------------+---------------+--------+

This data contains some null values in order to demonstrate how ANY_VALUE() works.

Basic Example

We want to retrieve a non-null employee name from each department:

SELECT 
    department_id, 
    ANY_VALUE(employee_name) AS example_employee
FROM employees
GROUP BY department_id;

Output:

+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 2 | Fitch |
| 3 | Eve |
| 4 | Abe |
+---------------+------------------+

We can see that it returned the first non-null employee from each group. Given the first employee name for department 4 is null, it skipped that value and returned Abe, which is the first non-null value for that department.

With Ordering

If we introduce an explicit ordering, ANY_VALUE() will return the first non-null value according to that order:

SELECT 
    department_id, 
    ANY_VALUE(employee_name ORDER BY salary DESC) AS example_employee
FROM employees
GROUP BY department_id;

Output:

+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Blake |
| 2 | Fitch |
| 3 | Hector |
| 4 | Abe |
+---------------+------------------+

Example on the Salary Column

Let’s run the ANY_VALUE() function on the salary column:

SELECT 
    department_id, 
    ANY_VALUE(salary) AS example_salary
FROM employees
GROUP BY department_id;

Output:

+---------------+----------------+
| department_id | example_salary |
+---------------+----------------+
| 1 | 70000 |
| 2 | 90000 |
| 3 | 60000 |
| 4 | 95000 |
+---------------+----------------+

As expected, it returned the first non-null salary (i.e. it skipped the null values).

Adding a WHERE Clause

If we apply a WHERE clause against a column that’s not being grouped, we can end up with null values from the ANY_VALUE() function:

SELECT 
    department_id, 
    ANY_VALUE(employee_name) AS example_employee
FROM employees
WHERE salary > 60000
GROUP BY department_id;

Output:

+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 2 | Fitch |
| 4 | null |
+---------------+------------------+

If we apply the WHERE clause to the grouped column and there are no matches, then we don’t get any results:

SELECT 
    department_id, 
    ANY_VALUE(employee_name) AS example_employee
FROM employees
WHERE department_id = 5
GROUP BY department_id;

Output:





That’s intentionally blank, due to the fact that no results were returned.