3 Functions to Calculate the Median in SQL

Most RDBMSs have functions that make it easy to calculate the median value from a column in our queries.

Below are three SQL functions we can use to calculate the median value in a column.

The MEDIAN() Function

Some DBMSs (such as MariaDB, Oracle Database, DB2, Snowflake, Amazon Redshift) provide a MEDIAN() function specifically for calculating the median value. If your DBMS has a MEDIAN() function, then your work is done. All you need to do is pass the name of the column for which you want the median value calculated.

Here’s an example I ran in MariaDB:

SELECT
  VendorId,
  ProductName,
  ProductPrice,
  MEDIAN(ProductPrice) OVER(PARTITION BY VendorId) AS "Median"
FROM Products;

Result:

VendorId	ProductName	                ProductPrice	Median
--------------  ------------------------------  --------------  -------------
1001	        Left handed screwdriver	        25.99	        20.3700000000
1001	        Right handed screwdriver	25.99	        20.3700000000
1001	        Long Weight (blue)	        14.75	        20.3700000000
1001	        Long Weight (green)	        11.99	        20.3700000000
1002	        Sledge Hammer	                33.49	        33.4900000000
1003	        Chainsaw	                245.00	        55.9900000000
1003	        Straw Dog Box	                55.99	        55.9900000000
1003	        Hammock	                        10.00	        55.9900000000
1004	        Tea Pot	                        12.45	        11.2200000000
1004	        Bottomless Coffee Mugs (4 Pack)	9.99	        11.2200000000

You may have noticed that the median value returned is not necessarily in the actual underlying data. That’s because the MEDIAN() function returns an interpolated value if required. In other words, it calculates the median and returns that value, regardless of whether that value is actually in the underlying data.

We can see an example of this with the first vendor. Here, the median column contains 20.3700000000, but that value isn’t actually in the data set. Same with the last vendor. Its median column contains 11.2200000000, but that value isn’t in the underlying data.

If you don’t want an interpolated value, then use the PERCENTILE_DISC() function if your DBMS has it (example of this function later).

The PERCENTILE_CONT() Function

If your DBMS doesn’t have a MEDIAN() function, perhaps it has a PERCENTILE_CONT() function. If it does, you can use it to calculate the median value.

The PERCENTILE_CONT() function accepts a numeric argument between 0.0 and 1.0. Providing an argument of 0.5 is what we need to do to get the median.

Here’s an example I ran in SQL Server:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    PERCENTILE_CONT( 0.5 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice
        )
        OVER ( 
            PARTITION BY VendorId
        ) AS PERCENTILE_CONT
FROM Products;

Result:

VendorId  ProductName                      ProductPrice  PERCENTILE_CONT   
--------  -------------------------------  ------------  ------------------
1001      Long Weight (green)              11.99         20.369999999999997
1001      Long Weight (blue)               14.75         20.369999999999997
1001      Left handed screwdriver          25.99         20.369999999999997
1001      Right handed screwdriver         25.99         20.369999999999997
1002      Sledge Hammer                    33.49         33.49             
1003      Hammock                          10            55.99             
1003      Straw Dog Box                    55.99         55.99             
1003      Chainsaw                         245           55.99             
1004      Bottomless Coffee Mugs (4 Pack)  9.99          11.219999999999999
1004      Tea Pot                          12.45         11.219999999999999

An argument of 0.5 was used to calculate the median price of all products from each vendor. This is important – any other value and it wouldn’t be the median.

I used the WITHIN GROUP clause to sort the set by product price, and I used the OVER clause with a PARTITION BY clause to partition the results by vendor. Removing the PARTITION BY clause will calculate the median value across the whole data set.

Like the MEDIAN() function above, the PERCENTILE_CONT() function doesn’t necessarily return a value from the actual data set. It takes the percentile value and a sort specification, and returns an interpolated value that would fall into that percentile value with respect to the sort specification.

As an Aggregate Function

Depending on your DBMS, you may or may not be able to run PERCENTILE_CONT() as a window function (like in the above example). In PostgreSQL for example, the PERCENTILE_CONT() function is implemented as an aggregate function, but not as a window function, and it doesn’t support the OVER clause (at least, not at the time of writing).

Here’s an example of using the PERCENTILE_CONT() function in PostgreSQL:

SELECT
    PERCENTILE_CONT( 0.5 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice
        ) AS Median
FROM Products;

Result:

       median       
--------------------
 20.369999999999997

So in this case it’s used as an aggregate function to return the median value of the whole data set.

Of course, we can also narrow it down to just a single vendor:

SELECT
    PERCENTILE_CONT( 0.5 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice
        ) AS Median
FROM Products
WHERE VendorId = 1004;

Result:

       median       
--------------------
 11.219999999999999

And with a bit of work, you may be able to modify the query to return results similar to the window function.

Some DBMSs (such as DB2, Oracle Database, SingleStoreDB) support the OVER clause as optional. In this case, we have the choice of using PERCENTILE_CONT() as a window function or aggregate function.

DBMSs that implement a PERCENTILE_CONT() function include SQL Server, Oracle Database, PostgreSQL, MariaDB, DB2, Snowflake, Amazon Redshift, SingleStoreDB, and more.

The PERCENTILE_DISC() Function

If your RDBMS supports the PERCENTILE_CONT() function, it most likely also includes a PERCENTILE_DISC() function that works in a similar way to PERCENTILE_CONT(), except that it always returns an actual value from the data set.

Example:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    PERCENTILE_DISC( 0.5 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice
        )
        OVER ( 
            PARTITION BY VendorId
        ) AS PERCENTILE_DISC
FROM Products;

Result:

VendorId  ProductName                      ProductPrice  PERCENTILE_DISC
--------  -------------------------------  ------------  ---------------
1001      Long Weight (green)              11.99         14.75          
1001      Long Weight (blue)               14.75         14.75          
1001      Left handed screwdriver          25.99         14.75          
1001      Right handed screwdriver         25.99         14.75          
1002      Sledge Hammer                    33.49         33.49          
1003      Hammock                          10            55.99          
1003      Straw Dog Box                    55.99         55.99          
1003      Chainsaw                         245           55.99          
1004      Bottomless Coffee Mugs (4 Pack)  9.99          9.99           
1004      Tea Pot                          12.45         9.99           

Here, I used the same code from the previous example, except that I switched PERCENTILE_CONT() to PERCENTILE_DISC(). We can see that PERCENTILE_DISC() doesn’t interpolate any values. It simply uses whatever values are already in the data set.

What is the Median Value?

The median is the value in the middle of a data set, meaning that half of the values are smaller or equal to the median and half are higher or equal to the median. This is not to be confused with the average value, which can be vastly different to the median. To calculate the average, use the AVG() function instead.