Exploring the FIRST() Function in DuckDB

One of the aggregate functions available in DuckDB is the FIRST() function, which returns the first value from each group in a query.

Let’s take a look at some examples of using the FIRST() function in DuckDB

What is the FIRST() Function?

In DuckDB, FIRST() is an aggregate function that returns the first value from a set of rows in a group. This includes null and non-null values. The result of FIRST() is influenced by the ordering within the group.

FIRST() can be useful when you want to fetch a representative value for each group without concern for which specific value it is or whether it’s null or not.

FIRST() is a synonym for ARBITRARY(), so we can use either function to achieve the same outcome.

Syntax

FIRST(expression)

Where expression is the column or expression from which the first value is returned.

Example Scenarios

Below are some examples that demonstrate how the FIRST() function works in DuckDB.

Sample Data

The examples use the following data:

-- Create the sales table
CREATE TABLE sales (
    region_id INTEGER,
    product_name TEXT,
    revenue INTEGER,
    discount INTEGER
);

-- Insert data into the sales table
INSERT INTO sales (region_id, product_name, revenue, discount) VALUES
(1, 'Widget A', 1000, 50),
(1, 'Widget B', 1500, NULL),
(2, 'Gadget C', 2000, 100),
(2, 'Gadget D', NULL, NULL),
(3, 'Thingamajig', 1200, 30),
(3, 'Widget E', 1300, 40),
(4, 'Gizmo F', NULL, 20),
(4, 'Gizmo G', 1100, NULL);

-- Select everything from the sales table
SELECT * FROM sales;

Output:

+-----------+--------------+---------+----------+
| region_id | product_name | revenue | discount |
+-----------+--------------+---------+----------+
| 1 | Widget A | 1000 | 50 |
| 1 | Widget B | 1500 | null |
| 2 | Gadget C | 2000 | 100 |
| 2 | Gadget D | null | null |
| 3 | Thingamajig | 1200 | 30 |
| 3 | Widget E | 1300 | 40 |
| 4 | Gizmo F | null | 20 |
| 4 | Gizmo G | 1100 | null |
+-----------+--------------+---------+----------+

This data contains some null values in order to demonstrate how FIRST() works with null values.

Basic Example

We want to retrieve an employee name from each department:

SELECT 
    region_id, 
    FIRST(product_name) AS first_product
FROM sales
GROUP BY region_id;

Output:

+-----------+---------------+
| region_id | first_product |
+-----------+---------------+
| 1 | Widget A |
| 2 | Gadget C |
| 3 | Thingamajig |
| 4 | Gizmo F |
+-----------+---------------+

We can see that it returned the first employee from each group, even if it’s a null value. If you only want to return non-null values, use ANY_VALUE() instead.

With Ordering

When we add an ORDER BY clause to the function, the FIRST() function returns the first value according to that order:

SELECT 
    region_id, 
    FIRST(product_name ORDER BY revenue DESC) AS first_product
FROM sales
GROUP BY region_id;

Output:

+-----------+---------------+
| region_id | first_product |
+-----------+---------------+
| 1 | Widget B |
| 2 | Gadget C |
| 3 | Widget E |
| 4 | Gizmo G |
+-----------+---------------+

Handling NULL Values

As mentioned, FIRST() returns the first value whether it’s null or not.

Let’s run FIRST() on the revenue column:

SELECT 
    region_id, 
    FIRST(revenue)
FROM sales
GROUP BY region_id;

Output:

+-----------+------------------+
| region_id | "first"(revenue) |
+-----------+------------------+
| 1 | 1000 |
| 2 | 2000 |
| 3 | 1200 |
| 4 | null |
+-----------+------------------+

We can see that null was returned in the fourth column. This is in contrast to what the ANY_VALUE() function would return, as it only returns non-null values. If you don’t want null values returned, use that function instead.

Adding a WHERE Clause

Here’s an example of a WHERE clause applied against a column that’s not being grouped:

SELECT 
    region_id, 
    FIRST(product_name) AS first_product
FROM sales
WHERE revenue > 1100
GROUP BY region_id;

Output:

+-----------+---------------+
| region_id | first_product |
+-----------+---------------+
| 1 | Widget B |
| 2 | Gadget C |
| 3 | Thingamajig |
+-----------+---------------+

And here’s a WHERE clause applied to the grouped column:

SELECT 
    region_id, 
    FIRST(product_name) AS first_product
FROM sales
WHERE region_id IN (1, 3)
GROUP BY region_id;

Output:

+-----------+---------------+
| region_id | first_product |
+-----------+---------------+
| 1 | Widget A |
| 3 | Thingamajig |
+-----------+---------------+

If no rows match the WHERE clause, then no rows are returned:

SELECT 
    region_id, 
    FIRST(product_name) AS first_product
FROM sales
WHERE region_id IN (11, 13)
GROUP BY region_id;

Output:





That’s intentionally blank, due to the fact that no results were returned.

Synonym

As mentioned, FIRST() is a synonym for ARBITRARY(), so you can use either one to achieve the same outcome.