Overview of the PERCENTILE_CONT() Function in SQL

Some relational database management systems (RDBMSs) have a PERCENTILE_CONT() function that calculates a percentile based on a continuous distribution across a range of column values.

We specify the percentile to use when we call the function.

Depending on the RDBMS, PERCENTILE_CONT() can be used as a window function or an aggregate function, or as both.

Example

In this example, PERCENTILE_CONT() is being used as a window function:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    PERCENTILE_CONT( 0.5 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice
        )
        OVER ( 
            PARTITION BY VendorId
        ) AS "PERCENTILE_CONT"
FROM Products;

Result:

VendorId  ProductName                      ProductPrice  PERCENTILE_CONT   
--------  -------------------------------  ------------  ------------------
1001      Long Weight (green)              11.99         20.369999999999997
1001      Long Weight (blue)               14.75         20.369999999999997
1001      Left handed screwdriver          25.99         20.369999999999997
1001      Right handed screwdriver         25.99         20.369999999999997
1002      Sledge Hammer                    33.49         33.49             
1003      Hammock                          10            55.99             
1003      Straw Dog Box                    55.99         55.99             
1003      Chainsaw                         245           55.99             
1004      Bottomless Coffee Mugs (4 Pack)  9.99          11.219999999999999
1004      Tea Pot                          12.45         11.219999999999999

I ran that example in Azure SQL Edge, which uses the SQL Server database engine.

In this example, I used an argument of 0.5 in order to calculate the median price of all products from each vendor. I used the WITHIN GROUP clause to sort the set by product price, and I used the OVER clause with a PARTITION BY clause to partition the results by vendor.

By including the OVER clause, we’re using PERCENTILE_CONT() as a window function. We could omit the OVER clause to turn it into an aggregate function, but as mentioned, this will depend on your RDBMS. Some DBMSs might not allow you to omit the OVER clause, while others won’t even accept it and will only allow you to use PERCENTILE_CONT() as an aggregate function.

The PERCENTILE_CONT() function doesn’t necessarily return a value from the actual data set. We can see an example of this with the first vendor. Here, the PERCENTILE_CONT column contains 20.369999999999997, but that value isn’t actually in the data set. It’s the same with the last vendor. We get 11.219999999999999, but that value isn’t in the underlying data.

Most (if not all) SQL databases that provide a PERCENTILE_CONT() function also provide a PERCENTILE_DISC() function that works in a similar way to PERCENTILE_CONT(), except that it always returns an actual value from the data set.

Changing the Percentile

Here’s an example that uses a different percentile:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    PERCENTILE_CONT( 0.8 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice
        )
        OVER ( 
            PARTITION BY VendorId
        ) AS "PERCENTILE_CONT"
FROM Products;

Result:

VendorId  ProductName                      ProductPrice  PERCENTILE_CONT   
--------  -------------------------------  ------------  ------------------
1001      Long Weight (green)              11.99         25.99             
1001      Long Weight (blue)               14.75         25.99             
1001      Left handed screwdriver          25.99         25.99             
1001      Right handed screwdriver         25.99         25.99             
1002      Sledge Hammer                    33.49         33.49             
1003      Hammock                          10            169.39600000000002
1003      Straw Dog Box                    55.99         169.39600000000002
1003      Chainsaw                         245           169.39600000000002
1004      Bottomless Coffee Mugs (4 Pack)  9.99          11.958            
1004      Tea Pot                          12.45         11.958            

Here, I used a percentile of 0.8 instead of 0.5. This changed the output for vendor 1003 quite significantly, but not so much for the others.

Omitting the OVER Clause

Omitting the OVER clause when using PERCENTILE_CONT() may result in an error. It all depends on your RDBMS. Some RDBMSs allow us to omit the OVER clause, while others don’t. Other RDBMSs don’t even allow us to use an OVER clause.

Here’s what happens if we remove the OVER clause in SQL Server:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    PERCENTILE_CONT( 0.8 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice
        ) AS "PERCENTILE_CONT"
FROM Products;

Result:

Msg 10753, Level 15, State 3, Line 5
The function 'PERCENTILE_CONT' must have an OVER clause.

And here’s the error message that MariaDB returns:

ERROR 1064 (42000) at line 30: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'AS "PERCENTILE_CONT"
FROM Products' at line 8

However, other RDBMSs allow us to remove the OVER clause, so that we can run the function as an aggregate function.

About the ORDER BY Clause

Typically, when using PERCENTILE_CONT() as a window function, we put the ORDER BY sub-clause in the WITHIN GROUP clause instead of the OVER clause. You may get an error if you try to put the ORDER BY sub-clause in the OVER clause.

Here’s what happens if we put an ORDER BY clause in the OVER clause in SQL Server:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    PERCENTILE_CONT( 0.8 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice
        )
        OVER ( 
            PARTITION BY VendorId
            ORDER BY ProductPrice
        ) AS "PERCENTILE_CONT"
FROM Products;

Result:

Msg 10758, Level 15, State 1, Line 5
The function 'PERCENTILE_CONT' may not have ORDER BY in OVER clause.

You may also find that only one ORDER BY expression is allowed.

Here’s what happens when we provide more than one ORDER BY expression in SQL Server:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    PERCENTILE_CONT( 0.5 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice, ProductName
        )
        OVER ( 
            PARTITION BY VendorId
        ) AS "PERCENTILE_CONT"
FROM Products;

Result:

Msg 10751, Level 15, State 1, Line 5
The ORDER BY in WITHIN GROUP clause of 'PERCENTILE_CONT' function must have exactly one expression.

The error tells us that the function must only have one expression.

No Window Frame Allowed

You may also find that window frames are not allowed when using the PERCENTILE_CONT() function.

Here’s what happens when we try to specify a window frame in SQL Server:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    PERCENTILE_CONT( 0.8 ) 
        WITHIN GROUP (
            ORDER BY ProductPrice
        )
        OVER ( 
            PARTITION BY VendorId
            RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS "PERCENTILE_CONT"
FROM Products;

Result:

Msg 10752, Level 15, State 1, Line 5
The function 'PERCENTILE_CONT' may not have a window frame.

And here’s what MariaDB returns:

ERROR 1064 (42000) at line 30: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS "PERCE...' at line 11

In SQL, window frames are typically specified with a line that starts with ROWS or RANGE. They specify the window for which the window function should operate on.