In DuckDB, arg_max()
is an aggregate function that finds the row with the maximum value in one column and returns the corresponding value from another column at that row. Rows where either of the first two arguments is NULL
are ignored.
Let’s take a look at the arg_max()
with some straightforward examples.
Syntax
First up, we should understand the syntax. The arg_max()
function has two forms:
- Two-argument form:
arg_max(arg, val)
- Three-argument form:
arg_max(arg, val, n)
Where:
arg
: The value to return from the row with maximumval
val
: The value to find the maximum ofn
: (Optional) Number of values to return per group
Function Aliases/Synonyms
The function also has some aliases, which results in three equivalent names:
arg_max()
max_by()
argMax()
All forms are identical in functionality and can be used interchangeably. In other words, we can use any of these three with syntax listed above, and they will produce the same result.
Example 1: Basic Usage
Let’s see how arg_max()
works with a simple example.
We’ll create a table and insert data:
-- Create a sales table
CREATE TABLE sales (
sale_date DATE,
product_name VARCHAR,
quantity INT,
revenue DECIMAL(10,2)
);
-- Insert sample data
INSERT INTO sales VALUES
('2024-01-01', 'Widget A', 10, 150.00),
('2024-01-02', 'Widget B', 5, 200.00),
('2024-01-03', 'Widget A', 15, 225.00),
('2024-01-04', 'Widget B', 8, 180.00),
('2024-01-05', 'Widget C', 12, 180.00);
-- Select all data from the table
SELECT * FROM sales;
Output:
+------------+--------------+----------+---------+
| sale_date | product_name | quantity | revenue |
+------------+--------------+----------+---------+
| 2024-01-01 | Widget A | 10 | 150.00 |
| 2024-01-02 | Widget B | 5 | 200.00 |
| 2024-01-03 | Widget A | 15 | 225.00 |
| 2024-01-04 | Widget B | 8 | 180.00 |
| 2024-01-05 | Widget C | 12 | 180.00 |
+------------+--------------+----------+---------+
Now we’ll find the product name from the row with the highest single-sale revenue:
SELECT
arg_max(product_name, revenue) as highest_revenue_product
FROM sales;
Result:
+-------------------------+
| highest_revenue_product |
+-------------------------+
| Widget A |
+-------------------------+
Compared with a Regular Aggregation
If you were expecting the function to use the aggregated prices, that’s not how it works. The above result didn’t aggregate all prices for the products. It simply found the product row with the highest price, then returned the value of the first argument.
To illustrate the difference between arg_max()
and regular aggregation, let’s compare the two.
Using arg_max()
: Finds product from the row with highest revenue:
SELECT arg_max(product_name, revenue) as product_with_highest_single_sale,
max(revenue) as highest_single_sale
FROM sales;
Result:
+----------------------------------+---------------------+
| product_with_highest_single_sale | highest_single_sale |
+----------------------------------+---------------------+
| Widget A | 225.00 |
+----------------------------------+---------------------+
As alluded to, Widget A is returned because it has the highest single sale. This is the row with the maximum value in the revenue
column.
Using GROUP BY
: Finds total revenue per product:
SELECT product_name,
sum(revenue) as total_revenue
FROM sales
GROUP BY product_name
ORDER BY total_revenue DESC;
Result:
+--------------+---------------+
| product_name | total_revenue |
+--------------+---------------+
| Widget B | 380.00 |
| Widget A | 375.00 |
| Widget C | 180.00 |
+--------------+---------------+
We can see that Widget B has the maximum total revenue when we add up all revenue for each product. So in this case, we might expect Widget B to be returned if we were looking for the product with the maximum total revenue.
Another aggregation similar to the above arg_max()
example might be done with the max()
function. We could use this function to return a list of products, along with their maximum single sale:
SELECT
product_name,
max(revenue) as highest_single_sale
FROM sales GROUP BY product_name
ORDER BY highest_single_sale DESC;
Result:
+--------------+---------------------+
| product_name | highest_single_sale |
+--------------+---------------------+
| Widget A | 225.00 |
| Widget B | 200.00 |
| Widget C | 180.00 |
+--------------+---------------------+
And we could limit this to the top sale to achieve a similar outcome to the one we got with arg_max()
:
SELECT
product_name,
max(revenue) as highest_single_sale
FROM sales GROUP BY product_name
ORDER BY highest_single_sale DESC
LIMIT 1;
Result:
+--------------+---------------------+
| product_name | highest_single_sale |
+--------------+---------------------+
| Widget A | 225.00 |
+--------------+---------------------+
Example 2: Using arg_max()
with GROUP BY
Let’s do an arg_max()
example that uses the GROUP BY
clause. This time we’ll use employee data:
-- Create an employees table
CREATE TABLE employees (
emp_id INT,
emp_name VARCHAR,
department VARCHAR,
salary DECIMAL(10,2),
hire_date DATE
);
-- Insert sample data
INSERT INTO employees VALUES
(1, 'Beavis Blakefield', 'Sales', 75000.00, '2022-01-15'),
(2, 'Mahavishnu Mars', 'Sales', 82000.00, '2021-06-20'),
(3, 'Veralda Vett', 'Engineering', 95000.00, '2020-03-10'),
(4, 'Hank Heckler', 'Engineering', 70000.00, '2021-07-11'),
(5, 'Slash Slater', 'Engineering', 98000.00, '2019-11-25'),
(6, 'Amy Amish', 'Engineering', 87000.00, '2019-12-14'),
(7, 'Bon Butler', 'Marketing', 70000.00, '2023-02-01'),
(8, 'Dave Danker', 'Marketing', 72000.00, '2022-09-15'),
(9, 'Sally Schmit', 'Marketing', 82000.00, '2017-10-07');
-- Select all data from the table
SELECT * FROM employees;
Output:
+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | Dave Danker | Marketing | 72000.00 | 2022-09-15 |
| 9 | Sally Schmit | Marketing | 82000.00 | 2017-10-07 |
+--------+-------------------+-------------+----------+------------+
Now let’s find the name of the highest-paid employee in each department, along with their salary:
SELECT
department,
arg_max(emp_name, salary) as highest_paid_employee,
max(salary) as max_salary
FROM employees
GROUP BY department;
Result:
+-------------+-----------------------+------------+
| department | highest_paid_employee | max_salary |
+-------------+-----------------------+------------+
| Engineering | Slash Slater | 98000.00 |
| Sales | Mahavishnu Mars | 82000.00 |
| Marketing | Sally Schmit | 82000.00 |
+-------------+-----------------------+------------+
Example 3: Multiple Values per Group
The arg_max()
syntax allows us to provide an optional third argument. When we provide this argument, it tells DuckDB to return that many rows. It returns the top n rows ordered by the second argument in descending order (with n being the third argument).
With that in mind, let’s now find the names of the two highest-paid employees in each department:
SELECT
department,
arg_max(emp_name, salary, 2) as top_employees
FROM employees
GROUP BY department;
Result:
+-------------+--------------------------------------+
| department | top_employees |
+-------------+--------------------------------------+
| Engineering | [Slash Slater, Veralda Vett] |
| Marketing | [Sally Schmit, Dave Danker] |
| Sales | [Mahavishnu Mars, Beavis Blakefield] |
+-------------+--------------------------------------+
We can see that the results are returned as a LIST
, and only the top two are returned for each department. It’s the top two because I specified 2 as the third argument. We could use 3 to get the top three, 4 for the top four, and so on.
Using arg_max()
with Dates
We can use arg_max()
with dates to get the value with the most recent date.
For example, we can use it to return the employee with the most recent hire date in each department:
SELECT
department,
arg_max(emp_name, hire_date) as most_recent_hire
FROM employees
GROUP BY department;
Result:
+-------------+-------------------+
| department | most_recent_hire |
+-------------+-------------------+
| Engineering | Hank Heckler |
| Sales | Beavis Blakefield |
| Marketing | Bon Butler |
+-------------+-------------------+
Synonyms/Aliases
As mentioned, arg_max()
has two synonyms: argMax()
and max_by()
. Therefore, we can replace any of the above examples with these aliases:
SELECT
department,
arg_max(emp_name, hire_date) as arg_max,
argMax(emp_name, hire_date) as argMax,
max_by(emp_name, hire_date) as max_by
FROM employees
GROUP BY department;
Result:
+-------------+-------------------+-------------------+-------------------+
| department | arg_max | argMax | max_by |
+-------------+-------------------+-------------------+-------------------+
| Engineering | Hank Heckler | Hank Heckler | Hank Heckler |
| Marketing | Bon Butler | Bon Butler | Bon Butler |
| Sales | Beavis Blakefield | Beavis Blakefield | Beavis Blakefield |
+-------------+-------------------+-------------------+-------------------+