Understanding the PERCENT_RANK() Function in MySQL

In MySQL, PERCENT_RANK() is a window function that returns the percentage of partition values less than the value in the current row, excluding the highest value.

We can use PERCENT_RANK() to evaluate the relative standing of a value within a query result set or partition. Return values range from 0 to 1.

Syntax

The syntax goes like this:

PERCENT_RANK() over_clause

So it requires an OVER clause, which determines how the rowset is partitioned and ordered before the window function is applied.

Example

Suppose we have a table with the following data:

SELECT * FROM Idiots;

Result:

+---------+-------------------+------+
| IdiotId | IdiotName         | IQ   |
+---------+-------------------+------+
|       1 | Dumb              |   60 |
|       2 | Dumber            |   50 |
|       3 | Dumbest           |   30 |
|       4 | Homer             |   40 |
|       5 | Peter Griffin     |   50 |
|       6 | Patrick Star      |   40 |
|       7 | Ed                |   40 |
|       8 | Ralph Wiggum      |   65 |
|       9 | Cosmo             |   55 |
|      10 | Richard Watterson |   70 |
+---------+-------------------+------+

The following query uses PERCENT_RANK() against the IQ column:

SELECT 
    IdiotName,
    IQ,
    PERCENT_RANK() OVER ( ORDER BY IQ ) AS "Percentage Rank"
FROM Idiots;

Result:

+-------------------+------+--------------------+
| IdiotName         | IQ   | Percentage Rank    |
+-------------------+------+--------------------+
| Dumbest           |   30 |                  0 |
| Homer             |   40 | 0.1111111111111111 |
| Patrick Star      |   40 | 0.1111111111111111 |
| Ed                |   40 | 0.1111111111111111 |
| Dumber            |   50 | 0.4444444444444444 |
| Peter Griffin     |   50 | 0.4444444444444444 |
| Cosmo             |   55 | 0.6666666666666666 |
| Dumb              |   60 | 0.7777777777777778 |
| Ralph Wiggum      |   65 | 0.8888888888888888 |
| Richard Watterson |   70 |                  1 |
+-------------------+------+--------------------+

We can see that the percentage rank ranges between 0 and 1, with all other values in between.

The values represent the row relative rank, calculated as the result of the following formula:

(rank - 1) / (rows - 1)

where rank is the row rank and rows is the number of partition rows.

Omitting the ORDER BY Clause

We should use the PERCENT_RANK() function with an ORDER BY clause if we want meaningful results. Omitting the ORDER BY clause will cause all rows to be peers (and therefore they’ll all share the same percentage rank).

Here’s what happens when we omit the ORDER BY clause:

SELECT 
    IdiotName,
    IQ,
    PERCENT_RANK() OVER () AS "Percentage Rank"
FROM Idiots;

Result:

+-------------------+------+-----------------+
| IdiotName         | IQ   | Percentage Rank |
+-------------------+------+-----------------+
| Dumb              |   60 |               0 |
| Dumber            |   50 |               0 |
| Dumbest           |   30 |               0 |
| Homer             |   40 |               0 |
| Peter Griffin     |   50 |               0 |
| Patrick Star      |   40 |               0 |
| Ed                |   40 |               0 |
| Ralph Wiggum      |   65 |               0 |
| Cosmo             |   55 |               0 |
| Richard Watterson |   70 |               0 |
+-------------------+------+-----------------+

All rows contain a percentage rank of 0.

Partitions

We can use the PARTITION BY clause to partition the rowset by a given column. When we do this, PERCENT_RANK() calculates the percentage rank of the row within each partition. That is, the value is evaluated against the other rows in the same partition.

Suppose we have the following table:

SELECT * FROM Dogs;

Result:

+---------+-------------+-------+
| DogName | Activity    | Score |
+---------+-------------+-------+
| Bruno   | Fetch Stick |    43 |
| Cooper  | Fetch Stick |    67 |
| Max     | Fetch Stick |    91 |
| Bruno   | Wag Tail    |    51 |
| Cooper  | Wag Tail    |    51 |
| Max     | Wag Tail    |    87 |
| Bruno   | Keep Quiet  |     1 |
| Cooper  | Keep Quiet  |     8 |
| Max     | Keep Quiet  |    12 |
+---------+-------------+-------+

We can use a PARTITION BY clause to find out the percentage rank of each dog within each activity:

SELECT 
    DogName,
    Activity,
    Score,
    PERCENT_RANK() OVER ( PARTITION BY Activity ORDER BY Score ) AS "Percentage Rank"
FROM Dogs;

Result:

+---------+-------------+-------+-----------------+
| DogName | Activity    | Score | Percentage Rank |
+---------+-------------+-------+-----------------+
| Bruno   | Fetch Stick |    43 |               0 |
| Cooper  | Fetch Stick |    67 |             0.5 |
| Max     | Fetch Stick |    91 |               1 |
| Bruno   | Keep Quiet  |     1 |               0 |
| Cooper  | Keep Quiet  |     8 |             0.5 |
| Max     | Keep Quiet  |    12 |               1 |
| Bruno   | Wag Tail    |    51 |               0 |
| Cooper  | Wag Tail    |    51 |               0 |
| Max     | Wag Tail    |    87 |               1 |
+---------+-------------+-------+-----------------+

We can see that the values returned by PERCENT_RANK() are either 0, 0.5, or 1. That’s because there are only three rows in each partition.

Here’s an example that demonstrates the range of results we can get with other similar functions:

SELECT 
    DogName,
    Activity,
    Score,
    PERCENT_RANK() OVER win AS "Percentage Rank",
    CUME_DIST() OVER win AS "Cumulative Distribution",
    RANK() OVER win AS "Rank",
    DENSE_RANK() OVER win AS "Dense Rank"
FROM Dogs
WINDOW win AS (PARTITION BY Activity ORDER BY Score);

Result:

+---------+-------------+-------+-----------------+-------------------------+------+------------+
| DogName | Activity    | Score | Percentage Rank | Cumulative Distribution | Rank | Dense Rank |
+---------+-------------+-------+-----------------+-------------------------+------+------------+
| Bruno   | Fetch Stick |    43 |               0 |      0.3333333333333333 |    1 |          1 |
| Cooper  | Fetch Stick |    67 |             0.5 |      0.6666666666666666 |    2 |          2 |
| Max     | Fetch Stick |    91 |               1 |                       1 |    3 |          3 |
| Bruno   | Keep Quiet  |     1 |               0 |      0.3333333333333333 |    1 |          1 |
| Cooper  | Keep Quiet  |     8 |             0.5 |      0.6666666666666666 |    2 |          2 |
| Max     | Keep Quiet  |    12 |               1 |                       1 |    3 |          3 |
| Bruno   | Wag Tail    |    51 |               0 |      0.6666666666666666 |    1 |          1 |
| Cooper  | Wag Tail    |    51 |               0 |      0.6666666666666666 |    1 |          1 |
| Max     | Wag Tail    |    87 |               1 |                       1 |    3 |          2 |
+---------+-------------+-------+-----------------+-------------------------+------+------------+

In this example I moved the PARTITION BY and ORDER BY clauses to a named window. This saved me from having to replicate the same clauses multiple times in the SELECT list.