An Overview of ARG_MIN() in DuckDB

In DuckDB, arg_min() is an aggregate function that finds the row with the minimum value in one column and returns the corresponding value from another column at that row. Rows where either of the first two expressions is NULL are ignored.

In this article we explore DuckDB’s arg_min() function with some simple examples.

Syntax

Let’s start by looking at the function’s syntax. The arg_min() function has two forms:

  1. Two-argument form:
arg_min(arg, val)
  1. Three-argument form:
arg_min(arg, val, n)

Where:

  • arg: The value to return from the row with minimum val
  • val: The value to find the minimum of
  • n: (Optional) Number of values to return per group

Function Aliases/Synonyms

The function also has some aliases, which results in three equivalent names:

  • arg_min()
  • min_by()
  • argMin()

All forms are identical in functionality and can be used interchangeably. In other words, we can use any of these three with syntax listed above, and they will produce the same result.

Example 1: Basic Usage

We’ll use a simple example to start off.

We’ll use the following data for the example:

-- Create a sales table
CREATE TABLE sales (
    sale_date DATE,
    product_name VARCHAR,
    quantity INT,
    revenue DECIMAL(10,2)
);

-- Insert sample data
INSERT INTO sales VALUES
    ('2024-01-01', 'Widget A', 10, 150.00),
    ('2024-01-02', 'Widget B', 5, 200.00),
    ('2024-01-03', 'Widget A', 15, 225.00),
    ('2024-01-04', 'Widget B', 8, 180.00),
    ('2024-01-05', 'Widget C', 12, 180.00);

-- Select all data from the table
SELECT * FROM sales;

Output:

+------------+--------------+----------+---------+
| sale_date | product_name | quantity | revenue |
+------------+--------------+----------+---------+
| 2024-01-01 | Widget A | 10 | 150.00 |
| 2024-01-02 | Widget B | 5 | 200.00 |
| 2024-01-03 | Widget A | 15 | 225.00 |
| 2024-01-04 | Widget B | 8 | 180.00 |
| 2024-01-05 | Widget C | 12 | 180.00 |
+------------+--------------+----------+---------+

Now we’ll use arg_min() to find the product name from the row with the lowest single-sale revenue:

SELECT 
    arg_min(product_name, revenue) as lowest_revenue_product
FROM sales;

Result:

+------------------------+
| lowest_revenue_product |
+------------------------+
| Widget A |
+------------------------+

Compared with a Regular Aggregation

We can see that the above result didn’t aggregate all prices for the products. It simply found the product row with the lowest price, then returned the value of the first argument.

To illustrate the difference between arg_min() and regular aggregation, let’s compare the two.

Using arg_min(): Finds product from the row with lowest revenue:

SELECT arg_min(product_name, revenue) as product_with_lowest_single_sale,
       min(revenue) as lowest_single_sale
FROM sales;

Result:

+---------------------------------+--------------------+
| product_with_lowest_single_sale | lowest_single_sale |
+---------------------------------+--------------------+
| Widget A | 150.00 |
+---------------------------------+--------------------+

As mentioned, Widget A is returned because it has the lowest single sale. This is the row with the minimum value in the revenue column.

Using GROUP BY: Finds total revenue per product:

SELECT product_name,
       sum(revenue) as total_revenue
FROM sales
GROUP BY product_name
ORDER BY total_revenue ASC;

Result:

+--------------+---------------+
| product_name | total_revenue |
+--------------+---------------+
| Widget C | 180.00 |
| Widget A | 375.00 |
| Widget B | 380.00 |
+--------------+---------------+

We can see that Widget C has the minimum total revenue when we add up all revenue for each product. So in this case, we might expect Widget C to be returned if we were looking for the product with the lowest total revenue.

Another aggregation similar to the above arg_min() example could be done with the min() function. We could use this function to return a list of products, along with their minimum single sale:

SELECT 
    product_name,
    min(revenue) as lowest_single_sale
FROM sales GROUP BY product_name
ORDER BY lowest_single_sale ASC;

Result:

+--------------+--------------------+
| product_name | lowest_single_sale |
+--------------+--------------------+
| Widget A | 150.00 |
| Widget B | 180.00 |
| Widget C | 180.00 |
+--------------+--------------------+

And we could limit this to the lowest sale to achieve a similar outcome to the one we got with arg_min():

SELECT 
    product_name,
    min(revenue) as lowest_single_sale
FROM sales GROUP BY product_name
ORDER BY lowest_single_sale ASC
LIMIT 1;

Result:

+--------------+--------------------+
| product_name | lowest_single_sale |
+--------------+--------------------+
| Widget A | 150.00 |
+--------------+--------------------+

Example 2: Using arg_min() with GROUP BY

Let’s do an arg_min() example that uses the GROUP BY clause. This time we’ll use employee data:

-- Create an employees table
CREATE TABLE employees (
    emp_id INT,
    emp_name VARCHAR,
    department VARCHAR,
    salary DECIMAL(10,2),
    hire_date DATE
);

-- Insert sample data
INSERT INTO employees VALUES
    (1, 'Beavis Blakefield', 'Sales', 75000.00, '2022-01-15'),
    (2, 'Mahavishnu Mars', 'Sales', 82000.00, '2021-06-20'),
    (3, 'Veralda Vett', 'Engineering', 95000.00, '2020-03-10'),
    (4, 'Hank Heckler', 'Engineering', 70000.00, '2021-07-11'),
    (5, 'Slash Slater', 'Engineering', 98000.00, '2019-11-25'),
    (6, 'Amy Amish', 'Engineering', 87000.00, '2019-12-14'),
    (7, 'Bon Butler', 'Marketing', 70000.00, '2023-02-01'),
    (8, 'Dave Danker', 'Marketing', 72000.00, '2022-09-15'),
    (9, 'Sally Schmit', 'Marketing', 82000.00, '2017-10-07');

-- Select all data from the table
SELECT * FROM employees;

Output:

+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | Dave Danker | Marketing | 72000.00 | 2022-09-15 |
| 9 | Sally Schmit | Marketing | 82000.00 | 2017-10-07 |
+--------+-------------------+-------------+----------+------------+

Now let’s find the name of the lowest-paid employee in each department, along with their respective salaries:

SELECT 
    department,
    arg_min(emp_name, salary) as lowest_paid_employee,
    min(salary) as min_salary
FROM employees
GROUP BY department;

Result:

+-------------+----------------------+------------+
| department | lowest_paid_employee | min_salary |
+-------------+----------------------+------------+
| Engineering | Hank Heckler | 70000.00 |
| Marketing | Bon Butler | 70000.00 |
| Sales | Beavis Blakefield | 75000.00 |
+-------------+----------------------+------------+

Example 3: Multiple Values per Group

The arg_min() syntax allows us to provide an optional third argument. When we provide this argument, it tells DuckDB to return that many rows. It returns the top n rows ordered by the second argument in descending order (with n being the third argument).

So let’s now find the names of the two lowest-paid employees in each department:

SELECT 
    department,
    arg_min(emp_name, salary, 2) as lowest_paid_employees
FROM employees
GROUP BY department;

Result:

+-------------+--------------------------------------+
| department | lowest_paid_employees |
+-------------+--------------------------------------+
| Engineering | [Hank Heckler, Amy Amish] |
| Sales | [Beavis Blakefield, Mahavishnu Mars] |
| Marketing | [Bon Butler, Dave Danker] |
+-------------+--------------------------------------+

We can see that the results are returned as a LIST, and only the bottom two are returned for each department. It’s the bottom two because I specified 2 as the third argument. We could use 3 to get the bottom three, 4 for the bottom four, and so on.

Using arg_min() with Dates

We can use arg_min() with dates to get the value with the oldest date.

For example, we can use it to return the employee with the first hire date in each department:

SELECT 
    department,
    arg_min(emp_name, hire_date) as first_hire
FROM employees
GROUP BY department;

Result:

+-------------+-----------------+
| department | first_hire |
+-------------+-----------------+
| Engineering | Slash Slater |
| Marketing | Sally Schmit |
| Sales | Mahavishnu Mars |
+-------------+-----------------+

Synonyms/Aliases

As mentioned, arg_min() has two synonyms: argMin() and min_by(). Therefore, we can replace any of the above examples with these aliases:

SELECT 
    department,
    arg_min(emp_name, hire_date) as arg_min,
    argMin(emp_name, hire_date) as argMin,
    min_by(emp_name, hire_date) as min_by
FROM employees
GROUP BY department;

Result:

+-------------+-----------------+-----------------+-----------------+
| department | arg_min | argMin | min_by |
+-------------+-----------------+-----------------+-----------------+
| Engineering | Slash Slater | Slash Slater | Slash Slater |
| Marketing | Sally Schmit | Sally Schmit | Sally Schmit |
| Sales | Mahavishnu Mars | Mahavishnu Mars | Mahavishnu Mars |
+-------------+-----------------+-----------------+-----------------+