DuckDB has a arg_min_null()
function that works in a similar way to the arg_min()
function. That is, it finds the row with the minimum value in one column and returns the corresponding value from another column at that row.
But there’s also a difference between these two functions. The main difference is in the way they deal with NULL values. Also, arg_min_null()
only accepts two arguments, whereas arg_min()
accepts an optional third argument. Additionally, there aren’t any aliases for arg_min_null()
at the time of writing (arg_min()
has a couple of aliases).
In this article we’ll look at how arg_min_null()
works, and we’ll compare it with arg_min()
to see how each function handles NULL values.
Syntax
The syntax for arg_min_null()
goes like this:
arg_min_null(arg, val)
Where:
arg
: The value to return from the row with minimumval
val
: The value to find the minimum of
arg_min_null()
finds the row with the minimum val
and calculates the arg
expression at that row. Rows where the val
expression evaluates to NULL
are ignored. By comparison, the the arg_min()
function ignores rows that are NULL in either arg
or val
.
Example 1: Basic Usage
Let’s run a simple example to see how arg_min_null()
works, and how it differs to arg_min()
.
We’ll create a table and insert data:
-- Create table
CREATE TABLE Person (
Personid int,
FirstName varchar,
LastName varchar,
Age int
);
-- Insert data
INSERT INTO Person VALUES
(1, 'Beavis', 'Blakefield', NULL),
(2, NULL, 'Reacher', 35),
(3, 'Veralda', NULL, 45);
-- Select all data
SELECT * FROM Person;
Output:
+----------+-----------+------------+------+
| Personid | FirstName | LastName | Age |
+----------+-----------+------------+------+
| 1 | Beavis | Blakefield | null |
| 2 | null | Reacher | 35 |
| 3 | Veralda | null | 45 |
+----------+-----------+------------+------+
That data contains NULL values across its various columns, which will help us to see the difference between the two functions.
Now let’s run a query against that data:
SELECT
arg_min(FirstName, Age) AS arg_min,
arg_min_null(FirstName, Age) AS arg_min_null
FROM Person;
Result:
+---------+--------------+
| arg_min | arg_min_null |
+---------+--------------+
| Veralda | null |
+---------+--------------+
So in this example, both functions return the first name of the youngest person. However, some rows contain NULL in the FirstName
column, which causes the functions to return different results.
Specifically, the arg_min()
function ignores rows that contain NULL in that column whereas arg_min_null()
doesn’t ignore them.
So we can use this logic to deduce that arg_min_null()
returned the first name of the 35 year old person. It just so happened that the first name is NULL
. The arg_min()
function on the other hand ignored that row and returned the youngest person that has a non-NULL
first name (which is Veralda at 45 years old).
Let’s remove NULL values from the query to see how that changes things:
SELECT
arg_min(FirstName, Age) AS arg_min,
arg_min_null(FirstName, Age) AS arg_min_null
FROM Person
WHERE FirstName IS NOT NULL;
Result:
+---------+--------------+
| arg_min | arg_min_null |
+---------+--------------+
| Veralda | Veralda |
+---------+--------------+
This time both functions returned the same result. We eliminated all rows that contain NULL
in the FirstName
column and so both functions searched only those rows that aren’t NULL
in that column and the Age
column (both functions ignore NULL values in the second argument).
Let’s get the last name instead of the first name:
SELECT
arg_min(LastName, Age) AS arg_min,
arg_min_null(LastName, Age) AS arg_min_null
FROM Person;
Result:
+---------+--------------+
| arg_min | arg_min_null |
+---------+--------------+
| Reacher | Reacher |
+---------+--------------+
This time both functions returned the same value. That’s because the youngest person has a last name, and so arg_min_null()
didn’t skip that row. There is a person with a NULL last name, but that person isn’t the youngest, and so it had no impact on the result.
Example 2: Using arg_min_null()
with GROUP BY
Let’s do an arg_min_null()
example that uses the GROUP BY
clause. This time we’ll use employee data:
-- Create an employees table
CREATE TABLE employees (
emp_id INT,
emp_name VARCHAR,
department VARCHAR,
salary DECIMAL(10,2),
hire_date DATE
);
-- Insert sample data
INSERT INTO employees VALUES
(1, 'Beavis Blakefield', 'Sales', 75000.00, '2022-01-15'),
(2, 'Mahavishnu Mars', 'Sales', 82000.00, NULL),
(3, 'Veralda Vett', 'Engineering', NULL, '2020-03-10'),
(4, NULL, 'Engineering', 70000.00, '2021-07-11'),
(5, 'Slash Slater', 'Engineering', 98000.00, '2019-11-25'),
(6, 'Hank Heckler', 'Engineering', 87000.00, '2019-12-14'),
(7, 'Bon Butler', 'Marketing', 70000.00, '2023-02-01'),
(8, NULL, 'Marketing', 72000.00, '2022-09-15'),
(9, 'Dave Danker', 'Marketing', 91000.00, '2017-10-07');
-- Select all data from the table
SELECT * FROM employees;
Output:
+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | null |
| 3 | Veralda Vett | Engineering | null | 2020-03-10 |
| 4 | null | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Hank Heckler | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | null | Marketing | 72000.00 | 2022-09-15 |
| 9 | Dave Danker | Marketing | 91000.00 | 2017-10-07 |
+--------+-------------------+-------------+----------+------------+
Now let’s find the name of the lowest-paid employee in each department, along with their salary:
SELECT
department,
arg_min(emp_name, salary) as arg_min,
arg_min_null(emp_name, salary) as arg_min_null,
min(salary) as min_salary
FROM employees
GROUP BY department;
Result:
+-------------+-------------------+-------------------+------------+
| department | arg_min | arg_min_null | min_salary |
+-------------+-------------------+-------------------+------------+
| Engineering | Hank Heckler | null | 70000.00 |
| Marketing | Bon Butler | Bon Butler | 70000.00 |
| Sales | Beavis Blakefield | Beavis Blakefield | 75000.00 |
+-------------+-------------------+-------------------+------------+
Here, arg_min()
excluded the row with the NULL value, whereas arg_min_null()
included it.
I also included a column that uses the min()
function so that we could see the lowest salary for each department, regardless of whether the employee had a NULL name or not.
And if we exclude NULL values from the query:
SELECT
department,
arg_min(emp_name, salary) as arg_min,
arg_min_null(emp_name, salary) as arg_min_null,
min(salary) as min_salary
FROM employees
WHERE emp_name IS NOT NULL
GROUP BY department;
Result:
+-------------+-------------------+-------------------+------------+
| department | arg_min | arg_min_null | min_salary |
+-------------+-------------------+-------------------+------------+
| Engineering | Hank Heckler | Hank Heckler | 87000.00 |
| Marketing | Bon Butler | Bon Butler | 70000.00 |
| Sales | Beavis Blakefield | Beavis Blakefield | 75000.00 |
+-------------+-------------------+-------------------+------------+
We get the same result for both functions (and the min()
function also reflects that employee’s salary).
Using arg_min_null()
with Dates
We can use arg_min_null()
with dates to get the value with the earliest date.
For example, we can use it to return the employee with the earliest hire date in each department:
SELECT
department,
arg_min(emp_name, hire_date) as arg_min,
arg_min_null(emp_name, hire_date) as arg_min_null
FROM employees
GROUP BY department;
Result:
+-------------+-------------------+-------------------+
| department | arg_min | arg_min_null |
+-------------+-------------------+-------------------+
| Engineering | Slash Slater | Slash Slater |
| Marketing | Dave Danker | Dave Danker |
| Sales | Beavis Blakefield | Beavis Blakefield |
+-------------+-------------------+-------------------+
In this example, both functions returned the same result. That’s because none of the earliest hires had NULL values as their first name.
However, using arg_max()
and arg_max_null()
(to get the latest hires) would show different results for the Engineering department:
SELECT
department,
arg_max(emp_name, hire_date) as arg_max,
arg_max_null(emp_name, hire_date) as arg_max_null
FROM employees
GROUP BY department;
Result:
+-------------+-------------------+-------------------+
| department | arg_max | arg_max_null |
+-------------+-------------------+-------------------+
| Engineering | Veralda Vett | null |
| Sales | Beavis Blakefield | Beavis Blakefield |
| Marketing | Bon Butler | Bon Butler |
+-------------+-------------------+-------------------+