DuckDB is a fast and reliable analytical database that offers a bunch of aggregate functions. One such function is ANY_VALUE()
.
In this post, we’ll explore the ANY_VALUE()
function, along with examples to demonstrate its usage.
What is ANY_VALUE()
?
The ANY_VALUE()
function in DuckDB is an aggregate function that returns the first non-null value from a set of rows in a group. The result of ANY_VALUE()
is influenced by the ordering within the group.
ANY_VALUE()
can be useful when you want to fetch a representative value for each group without concern for which specific value it is, as long as it is non-null.
Syntax
ANY_VALUE(expression)
Where expression
is the column or expression from which the first non-null value is returned.
Example Scenarios
Below are some examples that demonstrate how the ANY_VALUE()
function works in DuckDB.
Sample Data
The examples use the following data:
-- Create the employees table
CREATE TABLE employees (
department_id INTEGER,
employee_name TEXT,
salary INTEGER
);
-- Insert data into the employees table
INSERT INTO employees (department_id, employee_name, salary) VALUES
(1, 'Amber', 70000),
(1, 'Blake', 80000),
(2, 'Fitch', 90000),
(2, 'Sasha', NULL),
(3, 'Eve', NULL),
(3, 'Hector', 60000),
(4, NULL, 95000),
(4, 'Abe', 55000);
-- Select all data from the employees table
SELECT * FROM employees;
Output:
+---------------+---------------+--------+
| department_id | employee_name | salary |
+---------------+---------------+--------+
| 1 | Amber | 70000 |
| 1 | Blake | 80000 |
| 2 | Fitch | 90000 |
| 2 | Sasha | null |
| 3 | Eve | null |
| 3 | Hector | 60000 |
| 4 | null | 95000 |
| 4 | Abe | 55000 |
+---------------+---------------+--------+
This data contains some null values in order to demonstrate how ANY_VALUE()
works.
Basic Example
We want to retrieve a non-null employee name from each department:
SELECT
department_id,
ANY_VALUE(employee_name) AS example_employee
FROM employees
GROUP BY department_id;
Output:
+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 2 | Fitch |
| 3 | Eve |
| 4 | Abe |
+---------------+------------------+
We can see that it returned the first non-null employee from each group. Given the first employee name for department 4 is null, it skipped that value and returned Abe, which is the first non-null value for that department.
With Ordering
If we introduce an explicit ordering, ANY_VALUE()
will return the first non-null value according to that order:
SELECT
department_id,
ANY_VALUE(employee_name ORDER BY salary DESC) AS example_employee
FROM employees
GROUP BY department_id;
Output:
+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Blake |
| 2 | Fitch |
| 3 | Hector |
| 4 | Abe |
+---------------+------------------+
Example on the Salary Column
Let’s run the ANY_VALUE()
function on the salary column:
SELECT
department_id,
ANY_VALUE(salary) AS example_salary
FROM employees
GROUP BY department_id;
Output:
+---------------+----------------+
| department_id | example_salary |
+---------------+----------------+
| 1 | 70000 |
| 2 | 90000 |
| 3 | 60000 |
| 4 | 95000 |
+---------------+----------------+
As expected, it returned the first non-null salary (i.e. it skipped the null values).
Adding a WHERE
Clause
If we apply a WHERE
clause against a column that’s not being grouped, we can end up with null values from the ANY_VALUE()
function:
SELECT
department_id,
ANY_VALUE(employee_name) AS example_employee
FROM employees
WHERE salary > 60000
GROUP BY department_id;
Output:
+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 2 | Fitch |
| 4 | null |
+---------------+------------------+
If we apply the WHERE
clause to the grouped column and there are no matches, then we don’t get any results:
SELECT
department_id,
ANY_VALUE(employee_name) AS example_employee
FROM employees
WHERE department_id = 5
GROUP BY department_id;
Output:
That’s intentionally blank, due to the fact that no results were returned.