Exploring the ARG_MAX() Function in DuckDB

In DuckDB, arg_max() is an aggregate function that finds the row with the maximum value in one column and returns the corresponding value from another column at that row. Rows where either of the first two arguments is NULL are ignored.

Let’s take a look at the arg_max() with some straightforward examples.

Syntax

First up, we should understand the syntax. The arg_max() function has two forms:

  1. Two-argument form:
arg_max(arg, val)
  1. Three-argument form:
arg_max(arg, val, n)

Where:

  • arg: The value to return from the row with maximum val
  • val: The value to find the maximum of
  • n: (Optional) Number of values to return per group

Function Aliases/Synonyms

The function also has some aliases, which results in three equivalent names:

  • arg_max()
  • max_by()
  • argMax()

All forms are identical in functionality and can be used interchangeably. In other words, we can use any of these three with syntax listed above, and they will produce the same result.

Example 1: Basic Usage

Let’s see how arg_max() works with a simple example.

We’ll create a table and insert data:

-- Create a sales table
CREATE TABLE sales (
    sale_date DATE,
    product_name VARCHAR,
    quantity INT,
    revenue DECIMAL(10,2)
);

-- Insert sample data
INSERT INTO sales VALUES
    ('2024-01-01', 'Widget A', 10, 150.00),
    ('2024-01-02', 'Widget B', 5, 200.00),
    ('2024-01-03', 'Widget A', 15, 225.00),
    ('2024-01-04', 'Widget B', 8, 180.00),
    ('2024-01-05', 'Widget C', 12, 180.00);

-- Select all data from the table
SELECT * FROM sales;

Output:

+------------+--------------+----------+---------+
| sale_date | product_name | quantity | revenue |
+------------+--------------+----------+---------+
| 2024-01-01 | Widget A | 10 | 150.00 |
| 2024-01-02 | Widget B | 5 | 200.00 |
| 2024-01-03 | Widget A | 15 | 225.00 |
| 2024-01-04 | Widget B | 8 | 180.00 |
| 2024-01-05 | Widget C | 12 | 180.00 |
+------------+--------------+----------+---------+

Now we’ll find the product name from the row with the highest single-sale revenue:

SELECT 
    arg_max(product_name, revenue) as highest_revenue_product
FROM sales;

Result:

+-------------------------+
| highest_revenue_product |
+-------------------------+
| Widget A |
+-------------------------+

Compared with a Regular Aggregation

If you were expecting the function to use the aggregated prices, that’s not how it works. The above result didn’t aggregate all prices for the products. It simply found the product row with the highest price, then returned the value of the first argument.

To illustrate the difference between arg_max() and regular aggregation, let’s compare the two.

Using arg_max(): Finds product from the row with highest revenue:

SELECT arg_max(product_name, revenue) as product_with_highest_single_sale,
       max(revenue) as highest_single_sale
FROM sales;

Result:

+----------------------------------+---------------------+
| product_with_highest_single_sale | highest_single_sale |
+----------------------------------+---------------------+
| Widget A | 225.00 |
+----------------------------------+---------------------+

As alluded to, Widget A is returned because it has the highest single sale. This is the row with the maximum value in the revenue column.

Using GROUP BY: Finds total revenue per product:

SELECT product_name,
       sum(revenue) as total_revenue
FROM sales
GROUP BY product_name
ORDER BY total_revenue DESC;

Result:

+--------------+---------------+
| product_name | total_revenue |
+--------------+---------------+
| Widget B | 380.00 |
| Widget A | 375.00 |
| Widget C | 180.00 |
+--------------+---------------+

We can see that Widget B has the maximum total revenue when we add up all revenue for each product. So in this case, we might expect Widget B to be returned if we were looking for the product with the maximum total revenue.

Another aggregation similar to the above arg_max() example might be done with the max() function. We could use this function to return a list of products, along with their maximum single sale:

SELECT 
    product_name,
    max(revenue) as highest_single_sale
FROM sales GROUP BY product_name
ORDER BY highest_single_sale DESC;

Result:

+--------------+---------------------+
| product_name | highest_single_sale |
+--------------+---------------------+
| Widget A | 225.00 |
| Widget B | 200.00 |
| Widget C | 180.00 |
+--------------+---------------------+

And we could limit this to the top sale to achieve a similar outcome to the one we got with arg_max():

SELECT 
product_name,
max(revenue) as highest_single_sale
FROM sales GROUP BY product_name
ORDER BY highest_single_sale DESC
LIMIT 1;

Result:

+--------------+---------------------+
| product_name | highest_single_sale |
+--------------+---------------------+
| Widget A | 225.00 |
+--------------+---------------------+

Example 2: Using arg_max() with GROUP BY

Let’s do an arg_max() example that uses the GROUP BY clause. This time we’ll use employee data:

-- Create an employees table
CREATE TABLE employees (
    emp_id INT,
    emp_name VARCHAR,
    department VARCHAR,
    salary DECIMAL(10,2),
    hire_date DATE
);

-- Insert sample data
INSERT INTO employees VALUES
    (1, 'Beavis Blakefield', 'Sales', 75000.00, '2022-01-15'),
    (2, 'Mahavishnu Mars', 'Sales', 82000.00, '2021-06-20'),
    (3, 'Veralda Vett', 'Engineering', 95000.00, '2020-03-10'),
    (4, 'Hank Heckler', 'Engineering', 70000.00, '2021-07-11'),
    (5, 'Slash Slater', 'Engineering', 98000.00, '2019-11-25'),
    (6, 'Amy Amish', 'Engineering', 87000.00, '2019-12-14'),
    (7, 'Bon Butler', 'Marketing', 70000.00, '2023-02-01'),
    (8, 'Dave Danker', 'Marketing', 72000.00, '2022-09-15'),
    (9, 'Sally Schmit', 'Marketing', 82000.00, '2017-10-07');

-- Select all data from the table
SELECT * FROM employees;

Output:

+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | Dave Danker | Marketing | 72000.00 | 2022-09-15 |
| 9 | Sally Schmit | Marketing | 82000.00 | 2017-10-07 |
+--------+-------------------+-------------+----------+------------+

Now let’s find the name of the highest-paid employee in each department, along with their salary:

SELECT 
    department,
    arg_max(emp_name, salary) as highest_paid_employee,
    max(salary) as max_salary
FROM employees
GROUP BY department;

Result:

+-------------+-----------------------+------------+
| department | highest_paid_employee | max_salary |
+-------------+-----------------------+------------+
| Engineering | Slash Slater | 98000.00 |
| Sales | Mahavishnu Mars | 82000.00 |
| Marketing | Sally Schmit | 82000.00 |
+-------------+-----------------------+------------+

Example 3: Multiple Values per Group

The arg_max() syntax allows us to provide an optional third argument. When we provide this argument, it tells DuckDB to return that many rows. It returns the top n rows ordered by the second argument in descending order (with n being the third argument).

With that in mind, let’s now find the names of the two highest-paid employees in each department:

SELECT 
    department,
    arg_max(emp_name, salary, 2) as top_employees
FROM employees
GROUP BY department;

Result:

+-------------+--------------------------------------+
| department | top_employees |
+-------------+--------------------------------------+
| Engineering | [Slash Slater, Veralda Vett] |
| Marketing | [Sally Schmit, Dave Danker] |
| Sales | [Mahavishnu Mars, Beavis Blakefield] |
+-------------+--------------------------------------+

We can see that the results are returned as a LIST, and only the top two are returned for each department. It’s the top two because I specified 2 as the third argument. We could use 3 to get the top three, 4 for the top four, and so on.

Using arg_max() with Dates

We can use arg_max() with dates to get the value with the most recent date.

For example, we can use it to return the employee with the most recent hire date in each department:

SELECT 
    department,
    arg_max(emp_name, hire_date) as most_recent_hire
FROM employees
GROUP BY department;

Result:

+-------------+-------------------+
| department | most_recent_hire |
+-------------+-------------------+
| Engineering | Hank Heckler |
| Sales | Beavis Blakefield |
| Marketing | Bon Butler |
+-------------+-------------------+

Synonyms/Aliases

As mentioned, arg_max() has two synonyms: argMax() and max_by(). Therefore, we can replace any of the above examples with these aliases:

SELECT 
    department,
    arg_max(emp_name, hire_date) as arg_max,
    argMax(emp_name, hire_date) as argMax,
    max_by(emp_name, hire_date) as max_by
FROM employees
GROUP BY department;

Result:

+-------------+-------------------+-------------------+-------------------+
| department | arg_max | argMax | max_by |
+-------------+-------------------+-------------------+-------------------+
| Engineering | Hank Heckler | Hank Heckler | Hank Heckler |
| Marketing | Bon Butler | Bon Butler | Bon Butler |
| Sales | Beavis Blakefield | Beavis Blakefield | Beavis Blakefield |
+-------------+-------------------+-------------------+-------------------+