Understanding the ARBITRARY() Function in DuckDB

DuckDB is a fast and reliable analytical database that provides us with a wide range of aggregate functions that can help us perform analytical queries. One of these aggregate functions is the ARBITRARY() function, which returns the first value from the group.

In this post, we’ll take a look at how the ARBITRARY() function works in DuckDB

What is the ARBITRARY() Function?

The ARBITRARY() function in DuckDB is an aggregate function that returns the first value from a set of rows in a group. This includes null and non-null values. The result of ARBITRARY() is influenced by the ordering within the group.

ARBITRARY() can be useful when you want to fetch a representative value for each group without concern for which specific value it is or whether it’s null or not.

Syntax

ARBITRARY(expression)

Where expression is the column or expression from which the first value is returned.

Example Scenarios

Below are some examples that demonstrate how the ARBITRARY() function works in DuckDB.

Sample Data

The examples use the following data:

-- Create the employees table
CREATE TABLE employees (
    department_id INTEGER,
    employee_name TEXT,
    salary INTEGER
);

-- Insert data into the employees table
INSERT INTO employees (department_id, employee_name, salary) VALUES
(1, 'Amber', 70000),
(1, 'Blake', 80000),
(2, 'Fitch', 90000),
(2, 'Sasha', NULL),
(3, 'Eve', NULL),
(3, 'Hector', 60000),
(4, NULL, 95000),
(4, 'Abe', 55000);

-- Select all data from the employees table
SELECT * FROM employees;

Output:

+---------------+---------------+--------+
| department_id | employee_name | salary |
+---------------+---------------+--------+
| 1 | Amber | 70000 |
| 1 | Blake | 80000 |
| 2 | Fitch | 90000 |
| 2 | Sasha | null |
| 3 | Eve | null |
| 3 | Hector | 60000 |
| 4 | null | 95000 |
| 4 | Abe | 55000 |
+---------------+---------------+--------+

This data contains some null values in order to demonstrate how ARBITRARY() works with null values.

Basic Example

We want to retrieve an employee name from each department:

SELECT 
    department_id, 
    ARBITRARY(employee_name) AS example_employee
FROM employees
GROUP BY department_id;

Output:

+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 2 | Fitch |
| 3 | Eve |
| 4 | null |
+---------------+------------------+

We can see that it returned the first employee from each group, even if it’s a null value. If you only want to return non-null values, use ANY_VALUE() instead.

With Ordering

If we introduce an explicit ordering, ARBITRARY() will return the first value according to that order:

SELECT 
    department_id, 
    ARBITRARY(employee_name ORDER BY salary DESC) AS example_employee
FROM employees
GROUP BY department_id;

Output:

+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Blake |
| 2 | Fitch |
| 3 | Hector |
| 4 | null |
+---------------+------------------+

Example on the Salary Column

Let’s run the ARBITRARY() function on the salary column:

SELECT 
    department_id, 
    ARBITRARY(salary) AS example_salary
FROM employees
GROUP BY department_id;

Output:

+---------------+----------------+
| department_id | example_salary |
+---------------+----------------+
| 1 | 70000 |
| 2 | 90000 |
| 3 | null |
| 4 | 95000 |
+---------------+----------------+

As expected, it returned the first salary, including the null one.

Adding a WHERE Clause

Here’s an example of a WHERE clause applied against a column that’s not being grouped:

SELECT 
    department_id, 
    ARBITRARY(employee_name) AS example_employee
FROM employees
WHERE salary > 60000
GROUP BY department_id;

Output:

+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 2 | Fitch |
| 4 | null |
+---------------+------------------+

And here’s a WHERE clause applied to the grouped column:

SELECT 
    department_id, 
    ARBITRARY(employee_name) AS example_employee
FROM employees
WHERE department_id IN (1, 3)
GROUP BY department_id;

Output:

+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 3 | Eve |
+---------------+------------------+

If no rows match the WHERE clause, then no rows are returned:

SELECT 
    department_id, 
    ARBITRARY(employee_name) AS example_employee
FROM employees
WHERE department_id = 5
GROUP BY department_id;

Output:





That’s intentionally blank, due to the fact that no results were returned.

Synonym

ARBITRARY() is a synonym for FIRST(), so you can use either one to achieve the same outcome.