Some SQL databases have a PERCENTILE_DISC()
function that calculates a percentile based on a discrete distribution of a range of column values.
We specify the percentile to use when we call the function.
Depending on the DBMS, PERCENTILE_DISC()
can be used as a window function, as an aggregate function, or as both.
The PERCENTILE_DISC()
function always returns a value from the underlying data. This is in contrast to the PERCENTILE_CONT()
function, which can interpolate between adjacent values to return a value that’s not in the underlying data.
Example
In this example, PERCENTILE_DISC()
is being used as a window function:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.5 )
WITHIN GROUP (
ORDER BY ProductPrice
)
OVER (
PARTITION BY VendorId
) AS "PERCENTILE_DISC"
FROM Products;
Result:
VendorId ProductName ProductPrice PERCENTILE_DISC -------- ------------------------------- ------------ --------------- 1001 Long Weight (green) 11.99 14.75 1001 Long Weight (blue) 14.75 14.75 1001 Left handed screwdriver 25.99 14.75 1001 Right handed screwdriver 25.99 14.75 1002 Sledge Hammer 33.49 33.49 1003 Hammock 10 55.99 1003 Straw Dog Box 55.99 55.99 1003 Chainsaw 245 55.99 1004 Bottomless Coffee Mugs (4 Pack) 9.99 9.99 1004 Tea Pot 12.45 9.99
In this example, I used an argument of 0.5
in order to calculate the median price of all products from each vendor. I used the WITHIN GROUP
clause to sort the set by product price, and I used the OVER
clause with a PARTITION BY
clause to partition the results by vendor.
By including the OVER
clause, we’re using PERCENTILE_DISC()
as a window function. We could omit the OVER
clause to turn it into an aggregate function, but as mentioned, this will depend on your DBMS. Some DBMSs might not allow you to omit the OVER
clause, while others won’t accept it and will only allow you to use PERCENTILE_DISC()
as an aggregate function.
Most (if not all) SQL databases that provide a PERCENTILE_DISC()
function also provide a PERCENTILE_CONT()
function that works in a similar way to PERCENTILE_DISC()
, except that PERCENTILE_CONT()
can interpolate between adjacent values to return a value that’s not in the underlying data.
Changing the Percentile
Here’s an example that uses a different percentile:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.8 )
WITHIN GROUP (
ORDER BY ProductPrice
)
OVER (
PARTITION BY VendorId
) AS "PERCENTILE_DISC"
FROM Products;
Result:
VendorId ProductName ProductPrice PERCENTILE_DISC -------- ------------------------------- ------------ --------------- 1001 Long Weight (green) 11.99 25.99 1001 Long Weight (blue) 14.75 25.99 1001 Left handed screwdriver 25.99 25.99 1001 Right handed screwdriver 25.99 25.99 1002 Sledge Hammer 33.49 33.49 1003 Hammock 10 245 1003 Straw Dog Box 55.99 245 1003 Chainsaw 245 245 1004 Bottomless Coffee Mugs (4 Pack) 9.99 12.45 1004 Tea Pot 12.45 12.45
In this example I used a percentile of 0.8
instead of 0.5
, which changed the result.
Omitting the OVER
Clause
Omitting the OVER
clause when using PERCENTILE_DISC()
may result in an error. It all depends on your DBMS. Some DBMSs allow us to omit the OVER
clause, while others don’t. Other DBMSs don’t even allow us to use an OVER
clause.
Here’s what happens if we remove the OVER
clause in SQL Server:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.8 )
WITHIN GROUP (
ORDER BY ProductPrice
) AS "PERCENTILE_DISC"
FROM Products;
Result:
Msg 10753, Level 15, State 3, Line 5 The function 'PERCENTILE_DISC' must have an OVER clause.
And here’s the error message that MariaDB returns:
ERROR 1064 (42000) at line 30: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'AS "PERCENTILE_DISC" FROM Products' at line 8
However, other DBMSs allow us to remove the OVER
clause, so that we can run the function as an aggregate function.
About the ORDER BY
Clause
Typically, when using PERCENTILE_DISC()
as a window function, we put the ORDER BY
sub-clause in the WITHIN GROUP
clause instead of the OVER
clause. You may get an error if you try to put the ORDER BY
sub-clause in the OVER
clause.
Here’s what happens if we put an ORDER BY
clause in the OVER
clause in SQL Server:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.8 )
WITHIN GROUP (
ORDER BY ProductPrice
)
OVER (
PARTITION BY VendorId
ORDER BY ProductPrice
) AS "PERCENTILE_DISC"
FROM Products;
Result:
Msg 10758, Level 15, State 1, Line 5 The function 'PERCENTILE_DISC' may not have ORDER BY in OVER clause.
You may also find that, even when you place the ORDER BY
clause in the right place, it only accepts one expression.
Here’s what happens when we provide more than one expression to the ORDER BY
clause in SQL Server:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.5 )
WITHIN GROUP (
ORDER BY ProductPrice, ProductName
)
OVER (
PARTITION BY VendorId
) AS "PERCENTILE_DISC"
FROM Products;
Result:
Msg 10751, Level 15, State 1, Line 5 The ORDER BY in WITHIN GROUP clause of 'PERCENTILE_DISC' function must have exactly one expression.
The error tells us that the function can only have one expression in the ORDER BY
clause.
No Window Frame Allowed
You may also find that window frames are not allowed when using the PERCENTILE_DISC()
function.
Here’s what happens when we try to specify a window frame in SQL Server:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.8 )
WITHIN GROUP (
ORDER BY ProductPrice
)
OVER (
PARTITION BY VendorId
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS "PERCENTILE_DISC"
FROM Products;
Result:
Msg 10752, Level 15, State 1, Line 5 The function 'PERCENTILE_DISC' may not have a window frame.
And here’s what MariaDB returns:
ERROR 1064 (42000) at line 30: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS "PERCE...' at line 11
In SQL, window frames are typically specified with a line that starts with ROWS
or RANGE
. They specify the window for which the window function should operate on.