DuckDB is a fast and reliable analytical database that provides us with a wide range of aggregate functions that can help us perform analytical queries. One of these aggregate functions is the ARBITRARY()
function, which returns the first value from the group.
In this post, we’ll take a look at how the ARBITRARY()
function works in DuckDB
What is the ARBITRARY()
Function?
The ARBITRARY()
function in DuckDB is an aggregate function that returns the first value from a set of rows in a group. This includes null and non-null values. The result of ARBITRARY()
is influenced by the ordering within the group.
ARBITRARY()
can be useful when you want to fetch a representative value for each group without concern for which specific value it is or whether it’s null or not.
Syntax
ARBITRARY(expression)
Where expression
is the column or expression from which the first value is returned.
Example Scenarios
Below are some examples that demonstrate how the ARBITRARY()
function works in DuckDB.
Sample Data
The examples use the following data:
-- Create the employees table
CREATE TABLE employees (
department_id INTEGER,
employee_name TEXT,
salary INTEGER
);
-- Insert data into the employees table
INSERT INTO employees (department_id, employee_name, salary) VALUES
(1, 'Amber', 70000),
(1, 'Blake', 80000),
(2, 'Fitch', 90000),
(2, 'Sasha', NULL),
(3, 'Eve', NULL),
(3, 'Hector', 60000),
(4, NULL, 95000),
(4, 'Abe', 55000);
-- Select all data from the employees table
SELECT * FROM employees;
Output:
+---------------+---------------+--------+
| department_id | employee_name | salary |
+---------------+---------------+--------+
| 1 | Amber | 70000 |
| 1 | Blake | 80000 |
| 2 | Fitch | 90000 |
| 2 | Sasha | null |
| 3 | Eve | null |
| 3 | Hector | 60000 |
| 4 | null | 95000 |
| 4 | Abe | 55000 |
+---------------+---------------+--------+
This data contains some null values in order to demonstrate how ARBITRARY()
works with null values.
Basic Example
We want to retrieve an employee name from each department:
SELECT
department_id,
ARBITRARY(employee_name) AS example_employee
FROM employees
GROUP BY department_id;
Output:
+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 2 | Fitch |
| 3 | Eve |
| 4 | null |
+---------------+------------------+
We can see that it returned the first employee from each group, even if it’s a null value. If you only want to return non-null values, use ANY_VALUE()
instead.
With Ordering
If we introduce an explicit ordering, ARBITRARY()
will return the first value according to that order:
SELECT
department_id,
ARBITRARY(employee_name ORDER BY salary DESC) AS example_employee
FROM employees
GROUP BY department_id;
Output:
+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Blake |
| 2 | Fitch |
| 3 | Hector |
| 4 | null |
+---------------+------------------+
Example on the Salary Column
Let’s run the ARBITRARY()
function on the salary
column:
SELECT
department_id,
ARBITRARY(salary) AS example_salary
FROM employees
GROUP BY department_id;
Output:
+---------------+----------------+
| department_id | example_salary |
+---------------+----------------+
| 1 | 70000 |
| 2 | 90000 |
| 3 | null |
| 4 | 95000 |
+---------------+----------------+
As expected, it returned the first salary, including the null one.
Adding a WHERE
Clause
Here’s an example of a WHERE
clause applied against a column that’s not being grouped:
SELECT
department_id,
ARBITRARY(employee_name) AS example_employee
FROM employees
WHERE salary > 60000
GROUP BY department_id;
Output:
+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 2 | Fitch |
| 4 | null |
+---------------+------------------+
And here’s a WHERE
clause applied to the grouped column:
SELECT
department_id,
ARBITRARY(employee_name) AS example_employee
FROM employees
WHERE department_id IN (1, 3)
GROUP BY department_id;
Output:
+---------------+------------------+
| department_id | example_employee |
+---------------+------------------+
| 1 | Amber |
| 3 | Eve |
+---------------+------------------+
If no rows match the WHERE
clause, then no rows are returned:
SELECT
department_id,
ARBITRARY(employee_name) AS example_employee
FROM employees
WHERE department_id = 5
GROUP BY department_id;
Output:
That’s intentionally blank, due to the fact that no results were returned.
Synonym
ARBITRARY()
is a synonym for FIRST()
, so you can use either one to achieve the same outcome.