Overview of the CUME_DIST() Function in SQL Server

In SQL Server, the CUME_DIST() function calculates and returns the cumulative distribution of a value within a group of values. This is the relative position of a specified value in a group of values.

Syntax

The syntax goes like this:

CUME_DIST( )  
    OVER ( [ partition_by_clause ] order_by_clause )

So it doesn’t require or accept any arguments, but it requires an OVER clause with at least an ORDER BY clause.

Example

Here’s an example of using the CUME_DIST() function:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    CUME_DIST( ) OVER ( 
        ORDER BY ProductPrice 
        ) AS CUME_DIST
FROM Products;

Result:

VendorId  ProductName                      ProductPrice  CUME_DIST
--------  -------------------------------  ------------  ---------
1004      Bottomless Coffee Mugs (4 Pack)  9.99          0.1      
1003      Hammock                          10            0.2      
1001      Long Weight (green)              11.99         0.3      
1004      Tea Pot                          12.45         0.4      
1001      Long Weight (blue)               14.75         0.5      
1001      Left handed screwdriver          25.99         0.7      
1001      Right handed screwdriver         25.99         0.7      
1002      Sledge Hammer                    33.49         0.8      
1003      Straw Dog Box                    55.99         0.9      
1003      Chainsaw                         245           1        

Here, the CUME_DIST column displays the cumulative distribution of the ProductPrice column when sorted in ascending order.

We can see that the values range from 0.1 to 1. We can also see that two products have the same price (the left and right handed screwdrivers), and so they share the same cumulative distribution.

Partitioning the Result Set

We can use a PARTITION BY clause in order to partition the result set by a given column. For example, we can partition by vendor:

SELECT
    v.VendorName,
    p.ProductName,
    p.ProductPrice,
    CUME_DIST( ) OVER ( 
        PARTITION BY p.VendorId
        ORDER BY p.ProductPrice 
        ) AS CUME_DIST
FROM Products p 
INNER JOIN Vendors v
ON v.VendorId = p.VendorId;

Result:

VendorName     ProductName                      ProductPrice  CUME_DIST         
-------------  -------------------------------  ------------  ------------------
Mars Supplies  Long Weight (green)              11.99         0.25              
Mars Supplies  Long Weight (blue)               14.75         0.5               
Mars Supplies  Left handed screwdriver          25.99         1                 
Mars Supplies  Right handed screwdriver         25.99         1                 
Randy Roofers  Sledge Hammer                    33.49         1                 
Pedal Medals   Hammock                          10            0.3333333333333333
Pedal Medals   Straw Dog Box                    55.99         0.6666666666666666
Pedal Medals   Chainsaw                         245           1                 
Katty Kittens  Bottomless Coffee Mugs (4 Pack)  9.99          0.5               
Katty Kittens  Tea Pot                          12.45         1                 

Here we get the cumulative distribution within each partition. Therefore the CUME_DIST column resets with each new partition.

The OVER Clause is Required

We must provide the OVER clause when using the CUME_DIST() function. Here’s what happens if we remove it:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    CUME_DIST( ) AS CUME_DIST
FROM Products;

Result:

Error: The function 'CUME_DIST' must have an OVER clause.

The ORDER BY Clause is Required

Furthermore, the OVER clause must have at least an ORDER BY clause. Here’s what happens when we include an OVER clause without an ORDER BY clause:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    CUME_DIST( ) OVER ( ) AS CUME_DIST
FROM Products;

Result:

Error: The function 'CUME_DIST' must have an OVER clause with ORDER BY.