In SQL Server, PERCENTILE_DISC()
is a window function that returns a percentile value based on a discrete distribution of the input column. Basically, it returns the first value in the set whose ordered position is the same or more than the specified fraction.
The output of PERCENTILE_DISC()
is equal to a specific column value (unlike the PERCENTILE_CONT()
function, which could calculate a value that isn’t in the column).
When we call PERCENTILE_DISC()
we specify the percentile to use. It then performs its calculation based on that percentile.
Syntax
The syntax goes like this:
PERCENTILE_DISC ( numeric_literal )
WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] )
OVER ( [ <partition_by_clause> ] )
So numeric_literal
is in the range between 0.0
and 1.0
. This (required) value is the percentile to compute.
We must provide a WITHIN GROUP
clause with an ORDER BY
clause.
We must also provide an OVER
clause.
Example
Here’s a basic example to demonstrate:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.5 )
WITHIN GROUP (
ORDER BY ProductPrice
)
OVER (
PARTITION BY VendorId
) AS PERCENTILE_DISC
FROM Products;
Result:
VendorId ProductName ProductPrice PERCENTILE_DISC -------- ------------------------------- ------------ --------------- 1001 Long Weight (green) 11.99 14.75 1001 Long Weight (blue) 14.75 14.75 1001 Left handed screwdriver 25.99 14.75 1001 Right handed screwdriver 25.99 14.75 1002 Sledge Hammer 33.49 33.49 1003 Hammock 10 55.99 1003 Straw Dog Box 55.99 55.99 1003 Chainsaw 245 55.99 1004 Bottomless Coffee Mugs (4 Pack) 9.99 9.99 1004 Tea Pot 12.45 9.99
In this example, we used an argument of 0.5
in order to calculate the median price of all products from each vendor. We used the WITHIN GROUP
clause to sort the set by product price, and we used the OVER
clause with a PARTITION BY
clause to partition the results by vendor.
The PERCENTILE_DISC()
function always returns an actual value from the data set. Therefore, the results could deviate significantly from what we might initially expect, especially when there’s a small number of rows (like in our example), or if the set doesn’t contain any values in the middle of the range.
The PERCENTILE_CONT()
function on the other hand, doesn’t necessarily return a value from the actual data set. It will interpolate between adjacent input items if necessarily to return the median value regardless of whether that value is in the data set or not.
Another Example
Here’s an example that uses a different percentile:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.8 )
WITHIN GROUP (
ORDER BY ProductPrice
)
OVER (
PARTITION BY VendorId
) AS PERCENTILE_DISC
FROM Products;
Result:
VendorId ProductName ProductPrice PERCENTILE_DISC -------- ------------------------------- ------------ --------------- 1001 Long Weight (green) 11.99 25.99 1001 Long Weight (blue) 14.75 25.99 1001 Left handed screwdriver 25.99 25.99 1001 Right handed screwdriver 25.99 25.99 1002 Sledge Hammer 33.49 33.49 1003 Hammock 10 245 1003 Straw Dog Box 55.99 245 1003 Chainsaw 245 245 1004 Bottomless Coffee Mugs (4 Pack) 9.99 12.45 1004 Tea Pot 12.45 12.45
In this example I used a percentile of 0.8
instead of 0.5
, and we can see how this has affected the output.
The OVER
Clause is Required
As mentioned, we must provide the OVER
clause when using PERCENTILE_DISC()
. Here’s what happens if we remove it:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.8 )
WITHIN GROUP (
ORDER BY ProductPrice
) AS PERCENTILE_DISC
FROM Products;
Result:
Msg 10753, Level 15, State 3, Line 5 The function 'PERCENTILE_DISC' must have an OVER clause.
About the ORDER BY
Clause
The ORDER BY
clause cannot be in the OVER
clause when using the PERCENTILE_DISC()
function. This makes it (and PERCENTILE_CONT()
) a bit different to most other window functions, which require an ORDER BY
clause in the OVER
clause.
That said, we can see from the above example that the ORDER BY
clause goes into the WITHIN GROUP
clause when using PERCENTILE_DISC()
.
Here’s what happens if we put an ORDER BY
clause in the OVER
clause:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.8 )
WITHIN GROUP (
ORDER BY ProductPrice
)
OVER (
PARTITION BY VendorId
ORDER BY ProductPrice
) AS PERCENTILE_DISC
FROM Products;
Result:
Msg 10758, Level 15, State 1, Line 5 The function 'PERCENTILE_DISC' may not have ORDER BY in OVER clause.
Another thing about the ORDER BY
clause when using PERCENTILE_DISC()
is that only one ORDER BY
expression is allowed.
Here’s what happens when we provide more than one ORDER BY
expression:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.5 )
WITHIN GROUP (
ORDER BY ProductPrice, ProductName
)
OVER (
PARTITION BY VendorId
) AS PERCENTILE_DISC
FROM Products;
Result:
Msg 10751, Level 15, State 1, Line 5 The ORDER BY in WITHIN GROUP clause of 'PERCENTILE_DISC' function must have exactly one expression.
The error tells us that the function must only have one expression.
No Window Frame Allowed
Another thing we can’t do when using the PERCENTILE_DISC()
function is specify a window frame. By this I mean we can’t use the ROWS
or RANGE
clause like we can in most other window functions.
Here’s what happens when we try to specify a window frame:
SELECT
VendorId,
ProductName,
ProductPrice,
PERCENTILE_DISC( 0.8 )
WITHIN GROUP (
ORDER BY ProductPrice
)
OVER (
PARTITION BY VendorId
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS PERCENTILE_DISC
FROM Products;
Result:
Msg 10752, Level 15, State 1, Line 5 The function 'PERCENTILE_DISC' may not have a window frame.
The same applies to the PERCENTILE_CONT()
function.
More Information
For more information and examples, see Microsoft’s documentation for the PERCENTILE_DISC()
function.