Overview of the MEDIAN() Function in MariaDB

In MariaDB, the MEDIAN() function returns the median value of a range of values.

We can use the MEDIAN() function in our queries to get a column’s median value across its partition or the whole result set.

The median is the value in the middle of a data set, meaning that half of the values are smaller or equal to the median and half are higher or equal to the median. This is not to be confused with the average value, which can be vastly different to the median. To calculate the average, use the AVG() function instead.

The MEDIAN() function is equivalent to using the PERCENTILE_CONT() function with an argument of 0.5 and the ORDER BY column being the argument provided to MEDIAN().

Syntax

The syntax for the MEDIAN() function goes like this:

MEDIAN(median expression) OVER (
  [ PARTITION BY partition_expression ] 
)

We’d typically use a column name for the median expression argument, and the function will then return the median value in each partition.

We can see that the MEDIAN() function requires an OVER clause, with an optional PARTITION BY clause. If we omit the PARTITION BY clause, we’ll get the median value for the whole result set.

The MEDIAN() function is equivalent to the following PERCENTILE_CONT() syntax:

PERCENTILE_CONT(0.5) WITHIN 
  GROUP (ORDER BY <median-arg>) OVER ( [ PARTITION BY partition_expression ])

So we can get the same result using either function.

Example

Here’s an example to demonstrate how MEDIAN() works:

SELECT
  VendorId,
  ProductName,
  ProductPrice,
  MEDIAN(ProductPrice) OVER(PARTITION BY VendorId) AS "Median"
FROM Products;

Result:

VendorId	ProductName	                ProductPrice	Median
--------------  ------------------------------  --------------  -------------
1001	        Left handed screwdriver	        25.99	        20.3700000000
1001	        Right handed screwdriver	25.99	        20.3700000000
1001	        Long Weight (blue)	        14.75	        20.3700000000
1001	        Long Weight (green)	        11.99	        20.3700000000
1002	        Sledge Hammer	                33.49	        33.4900000000
1003	        Chainsaw	                245.00	        55.9900000000
1003	        Straw Dog Box	                55.99	        55.9900000000
1003	        Hammock	                        10.00	        55.9900000000
1004	        Tea Pot	                        112.45	        76.2200000000
1004	        Bottomless Coffee Mugs (4 Pack)	39.99	        76.2200000000

If necessary, MEDIAN() interpolates between values to produce a value that’s not actually in the data set. We can see an example of this with the first vendor. In that case, the Median column contains 20.3700000000, but that value isn’t actually in the underlying data.

On the other hand, if we look at vendor 1003, we can see that the median value ends up being the same as one of the values from that vendor.

Vendor 1002 only has one product, and so the median value matches that product’s price.

Removing the PARTITION BY Clause

Here’s what happens if we remove the PARTITION BY clause:

SELECT
  VendorId,
  ProductName,
  ProductPrice,
  MEDIAN(ProductPrice) OVER() AS "Median"
FROM Products;

Result:

VendorId	ProductName	                 ProductPrice	Median
1003 Chainsaw 245.00 20.3700000000
1003 Straw Dog Box 55.99 20.3700000000
1002 Sledge Hammer 33.49 20.3700000000
1001 Left handed screwdriver 25.99 20.3700000000
1001 Right handed screwdriver 25.99 20.3700000000
1001 Long Weight (blue) 14.75 20.3700000000
1004 Tea Pot 12.45 20.3700000000
1001 Long Weight (green) 11.99 20.3700000000
1003 Hammock 10.00 20.3700000000
1004 Bottomless Coffee Mugs (4 Pack) 9.99 20.3700000000

We get the median value for the whole result set.

Removing the OVER Clause

The OVER clause is required. Here’s what happens if we remove it:

SELECT
  VendorId,
  ProductName,
  ProductPrice,
  MEDIAN(ProductPrice) AS "Median"
FROM Products;

Result:

ERROR 1064 (42000) at line 30: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'AS "Median"
FROM Products' at line 5