3 Easy Ways to Calculate the Median Value from a List in DuckDB

While we might typically use DuckDB’s median() function to get the median value from a column, when it comes to getting the median value from a list, we need to modify this slightly. A list can contain many values, and so we need a way to calculate the median of all of those values, rather than a single value like we’d normally be doing when calculating the median value from a column.

Fortunately, DuckDB provides us with several easy options for achieving this. Below are three options for getting the median value from a list in DuckDB.

1. The list_median() Function

While we can certainly use the median() function to get the median value from a list, it does require us to do a bit of extra work (more on this later). To save us from the extra work, DuckDB has provided us with a list_median() function that we can use instead. The list_median() function was built specifically for calculating the median value from a list.

Here’s an example:

SELECT list_median([ 10, 7, 25, 3 ]);

Result:

8.5

The function returned the median value from the list as expected.

So it’s basically the same as using median() except that we prefix that with list_. This is the case for many aggregate functions in DuckDB. For example there are functions like list_avg(), list_min(), list_max(), list_sum(), etc that are basically “list” versions of their non-list_ counterparts.

While we might often use list_median() with lists that contain numeric data, the list can contain other data types, such as strings, dates, and even other lists:

SELECT 
    list_median([ 'Duck', 'Zebra', 'Elephant' ]) AS strings,
    list_median([ date '2025-01-01', date '2000-01-01', date '1999-01-01' ]) AS dates,
    list_median([ [1,2,3], [4,5,6], [1,2,3,4] ]) AS nested_lists;

Result:

+----------+---------------------+--------------+
| strings | dates | nested_lists |
+----------+---------------------+--------------+
| Elephant | 2000-01-01 00:00:00 | [1, 2, 3, 4] |
+----------+---------------------+--------------+

2. The list_aggregate() Function

DuckDB’s list_aggregate() function enables us to run an aggregate function without calling that aggregate function directly. We pass the name of the aggregate function to list_aggregate(), which then behaves as though it’s that aggregate function.

So in order to get the median value from a list, we pass the list as the first argument, and 'median' as the second argument:

SELECT list_aggregate([ 10, 7, 25, 3 ], 'median');

Result:

8.5

The list_aggregate() function has the following aliases:

  • list_aggr()
  • aggregate()
  • array_aggregate()
  • array_aggr()

All of these work in exactly the same way, so you can choose which one you want to use.

3. The median() Function

As alluded to earlier, we can still use the median() function if we need to. If we unnest the list, then we can use it on the unnested list values:

SELECT median(l)
FROM unnest([ 10, 7, 25, 3 ]) AS t(l);

Result:

8.5

While this is a slightly more convoluted way of getting the median, it might suit your needs, depending on the situation. If you’re already unnesting the list for other purposes, then maybe it becomes an option for you.

Here’s the same query without the median() function:

SELECT l
FROM unnest([ 10, 7, 25, 3 ]) AS t(l);

Result:

+----+
| l |
+----+
| 10 |
| 7 |
| 25 |
| 3 |
+----+

So that’s the column that median() was operating on. When we unnest a list that doesn’t contain other lists, the list’s values are returned in tabular format. That’s why we’re able to use a regular aggregate function like median() against it.