Some relational database management systems (RDBMSs) have a MEDIAN()
function that calculates the median value within a range of values.
It returns the middle value or an interpolated value that would be the middle value once the values are sorted.
The SQL MEDIAN()
function typically operates on numeric expressions, but depending on the RDBMS, may also operate on other data types, such as datetime expressions.
Depending on the RDBMS, the MEDIAN()
function can be used as a window function, as an aggregate function, or as both.
The median is the value in the middle of a data set, meaning that half of the values are smaller or equal to the median and half are higher or equal to the median. This is not to be confused with the average value, which can be vastly different to the median. To calculate the average, use the AVG()
function instead.
If your RDBMS doesn’t include a MEDIAN()
function, see the last example in this article for a possible alternative.
Example
The following example demonstrates how the SQL MEDIAN()
function works when used as a window function:
SELECT
VendorId,
ProductName,
ProductPrice,
MEDIAN(ProductPrice) OVER(PARTITION BY VendorId ORDER BY ProductPrice) AS "Median"
FROM Products;
Result:
VendorId ProductName ProductPrice Median -------------- ------------------------------ -------------- ------------- 1001 Left handed screwdriver 25.99 20.3700000000 1001 Right handed screwdriver 25.99 20.3700000000 1001 Long Weight (blue) 14.75 20.3700000000 1001 Long Weight (green) 11.99 20.3700000000 1002 Sledge Hammer 33.49 33.4900000000 1003 Chainsaw 245.00 55.9900000000 1003 Straw Dog Box 55.99 55.9900000000 1003 Hammock 10.00 55.9900000000 1004 Tea Pot 112.45 76.2200000000 1004 Bottomless Coffee Mugs (4 Pack) 39.99 76.2200000000
I ran this example in MariaDB, which includes MEDIAN()
as a window function.
In this example, I partitioned the results by the VendorId
column. This means that the function calculates the median within each partition (i.e. it calculates the median product price for each vendor).
If necessary, MEDIAN()
interpolates between values to produce a value that’s not actually in the data set. We can see an example of this with the first vendor. In that case, the Median
column contains 20.3700000000
, but that value isn’t actually in the underlying data.
On the other hand, if we look at vendor 1003, we can see that the median value ends up being the same as one of the values from that vendor.
Vendor 1002 only has one product, and so the median value matches that product’s price.
Removing the PARTITION BY
Clause
If we remove the PARTITION BY
clause, we’ll get the median value across all vendors:
SELECT
VendorId,
ProductName,
ProductPrice,
MEDIAN(ProductPrice) OVER() AS "Median"
FROM Products;
Result:
VendorId ProductName ProductPrice Median
1003 Chainsaw 245.00 20.3700000000
1003 Straw Dog Box 55.99 20.3700000000
1002 Sledge Hammer 33.49 20.3700000000
1001 Left handed screwdriver 25.99 20.3700000000
1001 Right handed screwdriver 25.99 20.3700000000
1001 Long Weight (blue) 14.75 20.3700000000
1004 Tea Pot 12.45 20.3700000000
1001 Long Weight (green) 11.99 20.3700000000
1003 Hammock 10.00 20.3700000000
1004 Bottomless Coffee Mugs (4 Pack) 9.99 20.3700000000
Running MEDIAN()
an Aggregate Function
Depending on your RDBMS, you may be able to run the MEDIAN()
function as an aggregate function. Some RDBMSs only implement MEDIAN()
as an aggregate function (not accepting the OVER
clause).
If your RDBMS allows you to run MEDIAN()
as an aggregate function, you’ll be able to remove the OVER
clause.
Here’s an example of running MEDIAN()
as an aggregate function in Oracle:
SELECT
VendorId,
MEDIAN(ProductPrice)
FROM Products
GROUP BY VendorId
ORDER BY VendorId;
Result:
VENDORID MEDIAN(PRODUCTPRICE) 1001 20.37 1002 33.49 1003 55.99 1004 11.22
In this case, we used the GROUP BY
clause to group by the same column that we had previously partitioned by using the PARTITION BY
clause.
As mentioned, some RDBMSs don’t allow us to use MEDIAN()
as an aggregate function. In such cases, the OVER
clause is required. Removing it results in an error. Here’s what happens if we remove the OVER
clause in MariaDB:
SELECT
VendorId,
ProductName,
ProductPrice,
MEDIAN(ProductPrice) AS "Median"
FROM Products;
Result:
ERROR 1064 (42000) at line 30: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'AS "Median" FROM Products' at line 5
More specifically, that’s the error that I got in MariaDB 10.7.8. Later versions may allow MEDIAN()
to be run as an aggregate function, and so it’s always best to check your RDBMS’s documentation for up to date information.
A MEDIAN()
Alternative
If your RDBMS doesn’t provide a MEDIAN()
function, there may still be hope. Many RDBMSs include a PERCENTILE_CONT()
function that can be used to do the same thing.
We can get the same result by using the PERCENTILE_CONT()
function with an argument of 0.5
and the ORDER BY
column being the same argument provided to MEDIAN()
.
Therefore, we can replace the first example in this article with something like this:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_CONT( 0.5 )
WITHIN GROUP (
ORDER BY ProductPrice DESC
)
OVER (
PARTITION BY VendorId
) AS "Median"
FROM Products;
Result:
VendorId ProductName ProductPrice Median -------------- ------------------------------ -------------- ------------- 1001 Left handed screwdriver 25.99 20.3700000000 1001 Right handed screwdriver 25.99 20.3700000000 1001 Long Weight (blue) 14.75 20.3700000000 1001 Long Weight (green) 11.99 20.3700000000 1002 Sledge Hammer 33.49 33.4900000000 1003 Chainsaw 245.00 55.9900000000 1003 Straw Dog Box 55.99 55.9900000000 1003 Hammock 10.00 55.9900000000 1004 Tea Pot 12.45 11.2200000000 1004 Bottomless Coffee Mugs (4 Pack) 9.99 11.2200000000
This obviously assumes that your RDBMS includes the PERCENTILE_CONT()
function, but it does seem to be more widely implemented in SQL databases than the MEDIAN()
function.