An Introduction to the MEDIAN() Function in SQL

Some relational database management systems (RDBMSs) have a MEDIAN() function that calculates the median value within a range of values.

It returns the middle value or an interpolated value that would be the middle value once the values are sorted.

The SQL MEDIAN() function typically operates on numeric expressions, but depending on the RDBMS, may also operate on other data types, such as datetime expressions.

Depending on the RDBMS, the MEDIAN() function can be used as a window function, as an aggregate function, or as both.

The median is the value in the middle of a data set, meaning that half of the values are smaller or equal to the median and half are higher or equal to the median. This is not to be confused with the average value, which can be vastly different to the median. To calculate the average, use the AVG() function instead.

If your RDBMS doesn’t include a MEDIAN() function, see the last example in this article for a possible alternative.

Example

The following example demonstrates how the SQL MEDIAN() function works when used as a window function:

SELECT
  VendorId,
  ProductName,
  ProductPrice,
  MEDIAN(ProductPrice) OVER(PARTITION BY VendorId ORDER BY ProductPrice) AS "Median"
FROM Products;

Result:

VendorId	ProductName	                ProductPrice	Median
--------------  ------------------------------  --------------  -------------
1001	        Left handed screwdriver	        25.99	        20.3700000000
1001	        Right handed screwdriver	25.99	        20.3700000000
1001	        Long Weight (blue)	        14.75	        20.3700000000
1001	        Long Weight (green)	        11.99	        20.3700000000
1002	        Sledge Hammer	                33.49	        33.4900000000
1003	        Chainsaw	                245.00	        55.9900000000
1003	        Straw Dog Box	                55.99	        55.9900000000
1003	        Hammock	                        10.00	        55.9900000000
1004	        Tea Pot	                        112.45	        76.2200000000
1004	        Bottomless Coffee Mugs (4 Pack)	39.99	        76.2200000000

I ran this example in MariaDB, which includes MEDIAN() as a window function.

In this example, I partitioned the results by the VendorId column. This means that the function calculates the median within each partition (i.e. it calculates the median product price for each vendor).

If necessary, MEDIAN() interpolates between values to produce a value that’s not actually in the data set. We can see an example of this with the first vendor. In that case, the Median column contains 20.3700000000, but that value isn’t actually in the underlying data.

On the other hand, if we look at vendor 1003, we can see that the median value ends up being the same as one of the values from that vendor.

Vendor 1002 only has one product, and so the median value matches that product’s price.

Removing the PARTITION BY Clause

If we remove the PARTITION BY clause, we’ll get the median value across all vendors:

SELECT
  VendorId,
  ProductName,
  ProductPrice,
  MEDIAN(ProductPrice) OVER() AS "Median"
FROM Products;

Result:

VendorId	ProductName	                 ProductPrice	Median
1003	        Chainsaw	                 245.00	        29.7400000000
1003	        Straw Dog Box	                 55.99	        29.7400000000
1002	        Sledge Hammer	                 33.49	        29.7400000000
1001	        Left handed screwdriver	         25.99	        29.7400000000
1001	        Right handed screwdriver	 25.99	        29.7400000000
1001	        Long Weight (blue)	         14.75	        29.7400000000
1004	        Tea Pot	                         12.45	        29.7400000000
1001	        Long Weight (green)	         11.99	        29.7400000000
1003	        Hammock	                         10.00	        29.7400000000
1004	        Bottomless Coffee Mugs (4 Pack)	  9.99	        29.7400000000

Running MEDIAN() an Aggregate Function

Depending on your RDBMS, you may be able to run the MEDIAN() function as an aggregate function. Some RDBMSs only implement MEDIAN() as an aggregate function (not accepting the OVER clause).

If your RDBMS allows you to run MEDIAN() as an aggregate function, you’ll be able to remove the OVER clause.

Here’s an example of running MEDIAN() as an aggregate function in Oracle:

SELECT
  VendorId,
  MEDIAN(ProductPrice)
FROM Products
GROUP BY VendorId
ORDER BY VendorId;

Result:

VENDORID	MEDIAN(PRODUCTPRICE)
1001	        20.37
1002	        33.49
1003	        55.99
1004	        11.22

In this case, we used the GROUP BY clause to group by the same column that we had previously partitioned by using the PARTITION BY clause.

As mentioned, some RDBMSs don’t allow us to use MEDIAN() as an aggregate function. In such cases, the OVER clause is required. Removing it results in an error. Here’s what happens if we remove the OVER clause in MariaDB:

SELECT
  VendorId,
  ProductName,
  ProductPrice,
  MEDIAN(ProductPrice) AS "Median"
FROM Products;

Result:

ERROR 1064 (42000) at line 30: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'AS "Median"
FROM Products' at line 5

More specifically, that’s the error that I got in MariaDB 10.7.8. Later versions may allow MEDIAN() to be run as an aggregate function, and so it’s always best to check your RDBMS’s documentation for up to date information.

A MEDIAN() Alternative

If your RDBMS doesn’t provide a MEDIAN() function, there may still be hope. Many RDBMSs include a PERCENTILE_CONT() function that can be used to do the same thing.

We can get the same result by using the PERCENTILE_CONT() function with an argument of 0.5 and the ORDER BY column being the same argument provided to MEDIAN().

Therefore, we can replace the first example in this article with something like this:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    PERCENTILE_CONT( 0.5 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice DESC
        )
        OVER ( 
            PARTITION BY VendorId
        ) AS "Median"
FROM Products;

Result:

VendorId	ProductName	                ProductPrice	Median
--------------  ------------------------------  --------------  -------------
1001	        Left handed screwdriver	        25.99	        20.3700000000
1001	        Right handed screwdriver	25.99	        20.3700000000
1001	        Long Weight (blue)	        14.75	        20.3700000000
1001	        Long Weight (green)	        11.99	        20.3700000000
1002	        Sledge Hammer	                33.49	        33.4900000000
1003	        Chainsaw	                245.00	        55.9900000000
1003	        Straw Dog Box	                55.99	        55.9900000000
1003	        Hammock	                        10.00	        55.9900000000
1004	        Tea Pot	                        12.45	        11.2200000000
1004	        Bottomless Coffee Mugs (4 Pack)	9.99	        11.2200000000

This obviously assumes that your RDBMS includes the PERCENTILE_CONT() function, but it does seem to be more widely implemented in SQL databases than the MEDIAN() function.