Understanding the CUME_DIST() Function in SQL

Many relational database management systems (RDBMSs) provide us with a range of window functions.

The CUME_DIST() function is a window function that’s commonly implemented in RDBMSs for the purpose of calculating the cumulative distribution across a data set. In other words, it calculates the relative position of a specified value in a group of values.

Example

Here’s a basic example that demonstrates how the SQL CUME_DIST() function works:

SELECT
    VendorId,
    ProductName,
    ProductPrice,
    CUME_DIST( ) OVER ( 
        ORDER BY ProductPrice 
        ) AS "CUME_DIST"
FROM Products;

Result:

+----------+---------------------------------+--------------+-----------+
| VendorId | ProductName                     | ProductPrice | CUME_DIST |
+----------+---------------------------------+--------------+-----------+
|     1004 | Bottomless Coffee Mugs (4 Pack) |         9.99 |       0.1 |
|     1003 | Hammock                         |        10.00 |       0.2 |
|     1001 | Long Weight (green)             |        11.99 |       0.3 |
|     1004 | Tea Pot                         |        12.45 |       0.4 |
|     1001 | Long Weight (blue)              |        14.75 |       0.5 |
|     1001 | Left handed screwdriver         |        25.99 |       0.7 |
|     1001 | Right handed screwdriver        |        25.99 |       0.7 |
|     1002 | Sledge Hammer                   |        33.49 |       0.8 |
|     1003 | Straw Dog Box                   |        55.99 |       0.9 |
|     1003 | Chainsaw                        |       245.00 |         1 |
+----------+---------------------------------+--------------+-----------+

Here, the CUME_DIST column displays the cumulative distribution of the ProductPrice column when sorted in ascending order.

We can see that the values range from 0.1 to 1. We can also see that two products have the same price (the left and right handed screwdrivers), and so they share the same cumulative distribution.

When using window functions in SQL, we use an OVER clause to specify the logical order in which the operation occurs, and to divide the FROM clause result set into partitions if required.

Partitioning the Result Set

As mentioned, we can divide the FROM clause result set into partitions if required. We do this by adding a PARTITION BY clause to the OVER clause.

For example, we can partition the above result set by vendor:

SELECT
    v.VendorName,
    p.ProductName,
    p.ProductPrice,
    CUME_DIST() OVER ( 
        PARTITION BY p.VendorId
        ORDER BY p.ProductPrice 
        ) AS "CUME_DIST"
FROM Products p 
INNER JOIN Vendors v
ON v.VendorId = p.VendorId;

Result:

+---------------+---------------------------------+--------------+--------------------+
| VendorName    | ProductName                     | ProductPrice | CUME_DIST          |
+---------------+---------------------------------+--------------+--------------------+
| Mars Supplies | Long Weight (green)             |        11.99 |               0.25 |
| Mars Supplies | Long Weight (blue)              |        14.75 |                0.5 |
| Mars Supplies | Left handed screwdriver         |        25.99 |                  1 |
| Mars Supplies | Right handed screwdriver        |        25.99 |                  1 |
| Randy Roofers | Sledge Hammer                   |        33.49 |                  1 |
| Pedal Medals  | Hammock                         |        10.00 | 0.3333333333333333 |
| Pedal Medals  | Straw Dog Box                   |        55.99 | 0.6666666666666666 |
| Pedal Medals  | Chainsaw                        |       245.00 |                  1 |
| Katty Kittens | Bottomless Coffee Mugs (4 Pack) |         9.99 |                0.5 |
| Katty Kittens | Tea Pot                         |        12.45 |                  1 |
+---------------+---------------------------------+--------------+--------------------+

Here we get the cumulative distribution within each partition. In this case, the CUME_DIST column resets with each new partition.

Using a Named Window

Many RDBMSs enable us to define the window in a named window. This allows us to move the window specification away from the SELECT list, and define it later in the query. It also allows us to refer to the same window definition multiple times, without having to rewrite the definition.

To use a named window, we add a WINDOW clause near the end of the query, and then refer to that named window from the SELECT list:

SELECT
    v.VendorName,
    p.ProductName,
    p.ProductPrice,
    CUME_DIST() OVER win AS "CUME_DIST"
FROM Products p 
INNER JOIN Vendors v
ON v.VendorId = p.VendorId
WINDOW win AS ( 
        PARTITION BY p.VendorId
        ORDER BY p.ProductPrice 
        );

Result:

+---------------+---------------------------------+--------------+--------------------+
| VendorName    | ProductName                     | ProductPrice | CUME_DIST          |
+---------------+---------------------------------+--------------+--------------------+
| Mars Supplies | Long Weight (green)             |        11.99 |               0.25 |
| Mars Supplies | Long Weight (blue)              |        14.75 |                0.5 |
| Mars Supplies | Left handed screwdriver         |        25.99 |                  1 |
| Mars Supplies | Right handed screwdriver        |        25.99 |                  1 |
| Randy Roofers | Sledge Hammer                   |        33.49 |                  1 |
| Pedal Medals  | Hammock                         |        10.00 | 0.3333333333333333 |
| Pedal Medals  | Straw Dog Box                   |        55.99 | 0.6666666666666666 |
| Pedal Medals  | Chainsaw                        |       245.00 |                  1 |
| Katty Kittens | Bottomless Coffee Mugs (4 Pack) |         9.99 |                0.5 |
| Katty Kittens | Tea Pot                         |        12.45 |                  1 |
+---------------+---------------------------------+--------------+--------------------+

The above examples were done in MySQL 8.0.33.

SQL Server introduced the WINDOW clause in SQL Server 2022, so you can only use the WINDOW clause if your version is SQL Server 2022 or later.