DuckDB has a max_by()
function that finds the row with the maximum value in one column and returns the corresponding value from another column at that row.
Below are some examples that demonstrate how it works.
Syntax
The max_by()
function has two forms:
- Two-argument form:
max_by(arg, val)
- Three-argument form:
max_by(arg, val, n)
Where:
arg
: The value to return from the row with maximumval
val
: The value to find the maximum ofn
: (Optional) Number of values to return per group
Example 1: Basic Usage
Let’s create and populate a table for our example:
-- Create a sales table
CREATE TABLE sales (
sale_date DATE,
product_name VARCHAR,
quantity INT,
revenue DECIMAL(10,2)
);
-- Insert sample data
INSERT INTO sales VALUES
('2024-01-01', 'Widget A', 10, 150.00),
('2024-01-02', 'Widget B', 5, 200.00),
('2024-01-03', 'Widget A', 15, 225.00),
('2024-01-04', 'Widget B', 8, 180.00),
('2024-01-05', 'Widget C', 12, 180.00);
-- Select all data from the table
SELECT * FROM sales;
Output:
+------------+--------------+----------+---------+
| sale_date | product_name | quantity | revenue |
+------------+--------------+----------+---------+
| 2024-01-01 | Widget A | 10 | 150.00 |
| 2024-01-02 | Widget B | 5 | 200.00 |
| 2024-01-03 | Widget A | 15 | 225.00 |
| 2024-01-04 | Widget B | 8 | 180.00 |
| 2024-01-05 | Widget C | 12 | 180.00 |
+------------+--------------+----------+---------+
Now let’s use the max_by()
function to find the product name from the row with the highest single-sale revenue:
SELECT
max_by(product_name, revenue) as highest_revenue_product
FROM sales;
Result:
+-------------------------+
| highest_revenue_product |
+-------------------------+
| Widget A |
+-------------------------+
Example 2: Using max_by()
with GROUP BY
This example uses max_by()
with the GROUP BY
clause.
First, let’s create and populate another table:
-- Create an employees table
CREATE TABLE employees (
emp_id INT,
emp_name VARCHAR,
department VARCHAR,
salary DECIMAL(10,2),
hire_date DATE
);
-- Insert sample data
INSERT INTO employees VALUES
(1, 'Beavis Blakefield', 'Sales', 75000.00, '2022-01-15'),
(2, 'Mahavishnu Mars', 'Sales', 82000.00, '2021-06-20'),
(3, 'Veralda Vett', 'Engineering', 95000.00, '2020-03-10'),
(4, 'Hank Heckler', 'Engineering', 70000.00, '2021-07-11'),
(5, 'Slash Slater', 'Engineering', 98000.00, '2019-11-25'),
(6, 'Amy Amish', 'Engineering', 87000.00, '2019-12-14'),
(7, 'Bon Butler', 'Marketing', 70000.00, '2023-02-01'),
(8, 'Dave Danker', 'Marketing', 72000.00, '2022-09-15'),
(9, 'Sally Schmit', 'Marketing', 82000.00, '2017-10-07');
-- Select all data from the table
SELECT * FROM employees;
Output:
+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | Dave Danker | Marketing | 72000.00 | 2022-09-15 |
| 9 | Sally Schmit | Marketing | 82000.00 | 2017-10-07 |
+--------+-------------------+-------------+----------+------------+
Now we’ll find the name of the highest-paid employee in each department, along with their salary:
SELECT
department,
max_by(emp_name, salary) as highest_paid_employee,
max(salary) as max_salary
FROM employees
GROUP BY department
ORDER BY department;
Result:
+-------------+-----------------------+------------+
| department | highest_paid_employee | max_salary |
+-------------+-----------------------+------------+
| Engineering | Slash Slater | 98000.00 |
| Marketing | Sally Schmit | 82000.00 |
| Sales | Mahavishnu Mars | 82000.00 |
+-------------+-----------------------+------------+
Example 3: Multiple Values per Group
As mentioned, we can provide a third argument in order to return the top n rows. Here’s an example:
SELECT
department,
max_by(emp_name, salary, 2) as top_employees
FROM employees
GROUP BY department;
Result:
+-------------+--------------------------------------+
| department | top_employees |
+-------------+--------------------------------------+
| Marketing | [Sally Schmit, Dave Danker] |
| Engineering | [Slash Slater, Veralda Vett] |
| Sales | [Mahavishnu Mars, Beavis Blakefield] |
+-------------+--------------------------------------+
This query returned the names of the two highest-paid employees in each department. The results are returned as a LIST
when we do this.
Using max_by()
with Dates
We can use max_by()
with dates to get the value with the most recent date:
SELECT
department,
max_by(emp_name, hire_date) as most_recent_hire
FROM employees
GROUP BY department
ORDER BY department;
Result:
+-------------+-------------------+
| department | most_recent_hire |
+-------------+-------------------+
| Engineering | Hank Heckler |
| Marketing | Bon Butler |
| Sales | Beavis Blakefield |
+-------------+-------------------+
This query returned the employee with the most recent hire date in each department.
Synonyms/Aliases
The function also has some aliases, which results in three equivalent names:
max_by()
arg_max()
argMax()
All forms are identical in functionality and can be used interchangeably. Therefore, we can replace any of the above examples with these aliases:
SELECT
department,
max_by(emp_name, hire_date) as max_by,
arg_max(emp_name, hire_date) as arg_max,
argMax(emp_name, hire_date) as argMax
FROM employees
GROUP BY department;
Result:
+-------------+-------------------+-------------------+-------------------+
| department | max_by | arg_max | argMax |
+-------------+-------------------+-------------------+-------------------+
| Marketing | Bon Butler | Bon Butler | Bon Butler |
| Sales | Beavis Blakefield | Beavis Blakefield | Beavis Blakefield |
| Engineering | Hank Heckler | Hank Heckler | Hank Heckler |
+-------------+-------------------+-------------------+-------------------+