In DuckDB, the arg_max_null()
function works in a similar way to the arg_max()
function, in that it finds the row with the maximum value in one column and returns the corresponding value from another column at that row.
But where it differs from arg_max()
is in the way it deals with NULL values. Also, arg_max_null()
only accepts two arguments, whereas arg_max()
accepts an optional third argument. Additionally, there aren’t any aliases for arg_max_null()
at the time of writing (arg_max()
has a couple of aliases).
In this article we’ll look at how arg_max_null()
works, and we’ll compare it with arg_max()
to see how each function handles NULL values.
Syntax
The syntax for arg_max_null()
goes like this:
arg_max_null(arg, val)
Where:
arg
: The value to return from the row with maximumval
val
: The value to find the maximum of
arg_max_null()
finds the row with the maximum val
and calculates the arg
expression at that row. Rows where the val
expression evaluates to NULL
are ignored. By comparison, the the arg_max()
function ignores rows that are NULL in either arg
or val
.
Example 1: Basic Usage
Let’s run a simple example to see how arg_max_null()
works, and how it differs to arg_max()
.
We’ll create a table and insert data:
-- Create table
CREATE TABLE Person (
Personid int,
FirstName varchar,
LastName varchar,
Age int
);
-- Insert data
INSERT INTO Person VALUES
(1, NULL, 'Blakefield', 95),
(2, 'Mahavishnu', NULL, 85),
(3, 'Veralda', 'Vett', NULL);
-- Select all data
SELECT * FROM Person;
Output:
+----------+------------+------------+------+
| Personid | FirstName | LastName | Age |
+----------+------------+------------+------+
| 1 | null | Blakefield | 95 |
| 2 | Mahavishnu | null | 85 |
| 3 | Veralda | Vett | null |
+----------+------------+------------+------+
That data contains NULL values across its various columns, which will help us to see the difference between the two functions.
Now let’s run a query against that data:
SELECT
arg_max(FirstName, Age) AS arg_max,
arg_max_null(FirstName, Age) AS arg_max_null
FROM Person;
Result:
+------------+--------------+
| arg_max | arg_max_null |
+------------+--------------+
| Mahavishnu | null |
+------------+--------------+
So in this example, both functions return the first name of the oldest person. However, some rows contain NULL in the FirstName
column, which causes the functions to return different results.
Specifically, the arg_max()
function ignores rows that contain NULL in that column whereas arg_max_null()
doesn’t ignore them.
So we can use this logic to deduce that arg_max_null()
returned the first name of the 95 year old person. It just so happened that the first name is NULL
. The arg_max()
function on the other hand ignored that row and returned the oldest person that has a non-NULL
first name (which is Mahavishnu at 85 years old).
Let’s remove NULL values from the query to see how that changes things:
SELECT
arg_max(FirstName, Age) AS arg_max,
arg_max_null(FirstName, Age) AS arg_max_null
FROM Person
WHERE FirstName IS NOT NULL;
Result:
+------------+--------------+
| arg_max | arg_max_null |
+------------+--------------+
| Mahavishnu | Mahavishnu |
+------------+--------------+
This time both functions returned the same result. We eliminated all rows that contain NULL
in the FirstName
column and so both functions searched only those rows that aren’t NULL
in that column and the Age
column (both functions ignore NULL values in the second argument).
Let’s get the last name instead of the first name:
SELECT
arg_max(LastName, Age) AS arg_max,
arg_max_null(LastName, Age) AS arg_max_null
FROM Person;
Result:
+------------+--------------+
| arg_max | arg_max_null |
+------------+--------------+
| Blakefield | Blakefield |
+------------+--------------+
This time both functions returned the same value. That’s because the oldest person has a last name, and so arg_max_null()
didn’t skip that row. There is a person with a NULL last name, but that person isn’t the oldest, and so it had no impact on the result.
Example 2: Using arg_max_null()
with GROUP BY
Let’s do an arg_max_null()
example that uses the GROUP BY
clause. This time we’ll use employee data:
-- Create an employees table
CREATE TABLE employees (
emp_id INT,
emp_name VARCHAR,
department VARCHAR,
salary DECIMAL(10,2),
hire_date DATE
);
-- Insert sample data
INSERT INTO employees VALUES
(1, 'Beavis Blakefield', 'Sales', 75000.00, '2022-01-15'),
(2, 'Mahavishnu Mars', 'Sales', 82000.00, NULL),
(3, 'Veralda Vett', 'Engineering', NULL, '2020-03-10'),
(4, 'Hank Heckler', 'Engineering', 70000.00, '2021-07-11'),
(5, 'Slash Slater', 'Engineering', 98000.00, '2019-11-25'),
(6, NULL, 'Engineering', 87000.00, '2019-12-14'),
(7, 'Bon Butler', 'Marketing', 70000.00, '2023-02-01'),
(8, 'Dave Danker', 'Marketing', 72000.00, '2022-09-15'),
(9, NULL, 'Marketing', 91000.00, '2017-10-07');
-- Select all data from the table
SELECT * FROM employees;
Output:
+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | null |
| 3 | Veralda Vett | Engineering | null | 2020-03-10 |
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | null | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | Dave Danker | Marketing | 72000.00 | 2022-09-15 |
| 9 | null | Marketing | 91000.00 | 2017-10-07 |
+--------+-------------------+-------------+----------+------------+
Now let’s find the name of the highest-paid employee in each department, along with their salary:
SELECT
department,
arg_max(emp_name, salary) as arg_max,
arg_max_null(emp_name, salary) as arg_max_null,
max(salary) as max_salary
FROM employees
GROUP BY department;
Result:
+-------------+-----------------+-----------------+------------+
| department | arg_max | arg_max_null | max_salary |
+-------------+-----------------+-----------------+------------+
| Engineering | Slash Slater | Slash Slater | 98000.00 |
| Sales | Mahavishnu Mars | Mahavishnu Mars | 82000.00 |
| Marketing | Dave Danker | null | 91000.00 |
+-------------+-----------------+-----------------+------------+
Here, arg_max()
excluded the row with the NULL value, whereas arg_max_null()
included it.
I also included a column that uses the max()
function so that we could see the highest salary for each department, regardless of whether the employee had a NULL name or not.
And if we exclude NULL values from the query:
SELECT
department,
arg_max(emp_name, salary) as arg_max,
arg_max_null(emp_name, salary) as arg_max_null,
max(salary) as max_salary
FROM employees
WHERE emp_name IS NOT NULL
GROUP BY department;
Result:
+-------------+-----------------+-----------------+------------+
| department | arg_max | arg_max_null | max_salary |
+-------------+-----------------+-----------------+------------+
| Marketing | Dave Danker | Dave Danker | 72000.00 |
| Engineering | Slash Slater | Slash Slater | 98000.00 |
| Sales | Mahavishnu Mars | Mahavishnu Mars | 82000.00 |
+-------------+-----------------+-----------------+------------+
We get the same result for both functions (and the max()
function also reflects that employee’s salary).
Using arg_max_null()
with Dates
We can use arg_max_null()
with dates to get the value with the most recent date.
For example, we can use it to return the employee with the most recent hire date in each department:
SELECT
department,
arg_max(emp_name, hire_date) as arg_max,
arg_max_null(emp_name, hire_date) as arg_max_null
FROM employees
GROUP BY department;
Result:
+-------------+-------------------+-------------------+
| department | arg_max | arg_max_null |
+-------------+-------------------+-------------------+
| Engineering | Hank Heckler | Hank Heckler |
| Sales | Beavis Blakefield | Beavis Blakefield |
| Marketing | Bon Butler | Bon Butler |
+-------------+-------------------+-------------------+
In this example, both functions returned the same result. That’s because none of the most recent hires had NULL values as their first name.
It would be a different story if we were to use arg_min_null()
though (to get the earliest hires):
SELECT
department,
arg_min(emp_name, hire_date) as arg_min,
arg_min_null(emp_name, hire_date) as arg_min_null
FROM employees
GROUP BY department;
Result:
+-------------+-------------------+-------------------+
| department | arg_min | arg_min_null |
+-------------+-------------------+-------------------+
| Engineering | Slash Slater | Slash Slater |
| Sales | Beavis Blakefield | Beavis Blakefield |
| Marketing | Dave Danker | null |
+-------------+-------------------+-------------------+