One of the aggregate functions available in DuckDB is the FIRST()
function, which returns the first value from each group in a query.
Let’s take a look at some examples of using the FIRST()
function in DuckDB
What is the FIRST()
Function?
In DuckDB, FIRST()
is an aggregate function that returns the first value from a set of rows in a group. This includes null and non-null values. The result of FIRST()
is influenced by the ordering within the group.
FIRST()
can be useful when you want to fetch a representative value for each group without concern for which specific value it is or whether it’s null or not.
FIRST()
is a synonym for ARBITRARY()
, so we can use either function to achieve the same outcome.
Syntax
FIRST(expression)
Where expression
is the column or expression from which the first value is returned.
Example Scenarios
Below are some examples that demonstrate how the FIRST()
function works in DuckDB.
Sample Data
The examples use the following data:
-- Create the sales table
CREATE TABLE sales (
region_id INTEGER,
product_name TEXT,
revenue INTEGER,
discount INTEGER
);
-- Insert data into the sales table
INSERT INTO sales (region_id, product_name, revenue, discount) VALUES
(1, 'Widget A', 1000, 50),
(1, 'Widget B', 1500, NULL),
(2, 'Gadget C', 2000, 100),
(2, 'Gadget D', NULL, NULL),
(3, 'Thingamajig', 1200, 30),
(3, 'Widget E', 1300, 40),
(4, 'Gizmo F', NULL, 20),
(4, 'Gizmo G', 1100, NULL);
-- Select everything from the sales table
SELECT * FROM sales;
Output:
+-----------+--------------+---------+----------+
| region_id | product_name | revenue | discount |
+-----------+--------------+---------+----------+
| 1 | Widget A | 1000 | 50 |
| 1 | Widget B | 1500 | null |
| 2 | Gadget C | 2000 | 100 |
| 2 | Gadget D | null | null |
| 3 | Thingamajig | 1200 | 30 |
| 3 | Widget E | 1300 | 40 |
| 4 | Gizmo F | null | 20 |
| 4 | Gizmo G | 1100 | null |
+-----------+--------------+---------+----------+
This data contains some null values in order to demonstrate how FIRST()
works with null values.
Basic Example
We want to retrieve an employee name from each department:
SELECT
region_id,
FIRST(product_name) AS first_product
FROM sales
GROUP BY region_id;
Output:
+-----------+---------------+
| region_id | first_product |
+-----------+---------------+
| 1 | Widget A |
| 2 | Gadget C |
| 3 | Thingamajig |
| 4 | Gizmo F |
+-----------+---------------+
We can see that it returned the first employee from each group, even if it’s a null value. If you only want to return non-null values, use ANY_VALUE()
instead.
With Ordering
When we add an ORDER BY
clause to the function, the FIRST()
function returns the first value according to that order:
SELECT
region_id,
FIRST(product_name ORDER BY revenue DESC) AS first_product
FROM sales
GROUP BY region_id;
Output:
+-----------+---------------+
| region_id | first_product |
+-----------+---------------+
| 1 | Widget B |
| 2 | Gadget C |
| 3 | Widget E |
| 4 | Gizmo G |
+-----------+---------------+
Handling NULL Values
As mentioned, FIRST()
returns the first value whether it’s null or not.
Let’s run FIRST()
on the revenue
column:
SELECT
region_id,
FIRST(revenue)
FROM sales
GROUP BY region_id;
Output:
+-----------+------------------+
| region_id | "first"(revenue) |
+-----------+------------------+
| 1 | 1000 |
| 2 | 2000 |
| 3 | 1200 |
| 4 | null |
+-----------+------------------+
We can see that null was returned in the fourth column. This is in contrast to what the ANY_VALUE()
function would return, as it only returns non-null values. If you don’t want null values returned, use that function instead.
Adding a WHERE
Clause
Here’s an example of a WHERE
clause applied against a column that’s not being grouped:
SELECT
region_id,
FIRST(product_name) AS first_product
FROM sales
WHERE revenue > 1100
GROUP BY region_id;
Output:
+-----------+---------------+
| region_id | first_product |
+-----------+---------------+
| 1 | Widget B |
| 2 | Gadget C |
| 3 | Thingamajig |
+-----------+---------------+
And here’s a WHERE
clause applied to the grouped column:
SELECT
region_id,
FIRST(product_name) AS first_product
FROM sales
WHERE region_id IN (1, 3)
GROUP BY region_id;
Output:
+-----------+---------------+
| region_id | first_product |
+-----------+---------------+
| 1 | Widget A |
| 3 | Thingamajig |
+-----------+---------------+
If no rows match the WHERE
clause, then no rows are returned:
SELECT
region_id,
FIRST(product_name) AS first_product
FROM sales
WHERE region_id IN (11, 13)
GROUP BY region_id;
Output:
That’s intentionally blank, due to the fact that no results were returned.
Synonym
As mentioned, FIRST()
is a synonym for ARBITRARY()
, so you can use either one to achieve the same outcome.