In MariaDB, the MEDIAN()
function returns the median value of a range of values.
We can use the MEDIAN()
function in our queries to get a column’s median value across its partition or the whole result set.
The median is the value in the middle of a data set, meaning that half of the values are smaller or equal to the median and half are higher or equal to the median. This is not to be confused with the average value, which can be vastly different to the median. To calculate the average, use the AVG()
function instead.
The MEDIAN()
function is equivalent to using the PERCENTILE_CONT()
function with an argument of 0.5
and the ORDER BY
column being the argument provided to MEDIAN()
.
Syntax
The syntax for the MEDIAN()
function goes like this:
MEDIAN(median expression) OVER (
[ PARTITION BY partition_expression ]
)
We’d typically use a column name for the median expression
argument, and the function will then return the median value in each partition.
We can see that the MEDIAN()
function requires an OVER
clause, with an optional PARTITION BY
clause. If we omit the PARTITION BY
clause, we’ll get the median value for the whole result set.
The MEDIAN()
function is equivalent to the following PERCENTILE_CONT()
syntax:
PERCENTILE_CONT(0.5) WITHIN
GROUP (ORDER BY <median-arg>) OVER ( [ PARTITION BY partition_expression ])
So we can get the same result using either function.
Example
Here’s an example to demonstrate how MEDIAN()
works:
SELECT
VendorId,
ProductName,
ProductPrice,
MEDIAN(ProductPrice) OVER(PARTITION BY VendorId) AS "Median"
FROM Products;
Result:
VendorId ProductName ProductPrice Median -------------- ------------------------------ -------------- ------------- 1001 Left handed screwdriver 25.99 20.3700000000 1001 Right handed screwdriver 25.99 20.3700000000 1001 Long Weight (blue) 14.75 20.3700000000 1001 Long Weight (green) 11.99 20.3700000000 1002 Sledge Hammer 33.49 33.4900000000 1003 Chainsaw 245.00 55.9900000000 1003 Straw Dog Box 55.99 55.9900000000 1003 Hammock 10.00 55.9900000000 1004 Tea Pot 112.45 76.2200000000 1004 Bottomless Coffee Mugs (4 Pack) 39.99 76.2200000000
If necessary, MEDIAN()
interpolates between values to produce a value that’s not actually in the data set. We can see an example of this with the first vendor. In that case, the Median
column contains 20.3700000000
, but that value isn’t actually in the underlying data.
On the other hand, if we look at vendor 1003, we can see that the median value ends up being the same as one of the values from that vendor.
Vendor 1002 only has one product, and so the median value matches that product’s price.
Removing the PARTITION BY
Clause
Here’s what happens if we remove the PARTITION BY
clause:
SELECT
VendorId,
ProductName,
ProductPrice,
MEDIAN(ProductPrice) OVER() AS "Median"
FROM Products;
Result:
VendorId ProductName ProductPrice Median
1003 Chainsaw 245.00 20.3700000000
1003 Straw Dog Box 55.99 20.3700000000
1002 Sledge Hammer 33.49 20.3700000000
1001 Left handed screwdriver 25.99 20.3700000000
1001 Right handed screwdriver 25.99 20.3700000000
1001 Long Weight (blue) 14.75 20.3700000000
1004 Tea Pot 12.45 20.3700000000
1001 Long Weight (green) 11.99 20.3700000000
1003 Hammock 10.00 20.3700000000
1004 Bottomless Coffee Mugs (4 Pack) 9.99 20.3700000000
We get the median value for the whole result set.
Removing the OVER
Clause
The OVER
clause is required. Here’s what happens if we remove it:
SELECT
VendorId,
ProductName,
ProductPrice,
MEDIAN(ProductPrice) AS "Median"
FROM Products;
Result:
ERROR 1064 (42000) at line 30: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'AS "Median" FROM Products' at line 5