An Introduction to the PERCENT_RANK() Function in SQL

Many relational database management systems (RDBMSs) provide a window function called PERCENT_RANK() that returns the relative rank of a row within a group of rows.

The relative rank is expressed as a value between 0 and 1.

Example

Here’s an example that demonstrates how the PERCENT_RANK() function works in SQL:

SELECT 
    IdiotName,
    IQ,
    PERCENT_RANK() OVER ( ORDER BY IQ ) AS "Percentage Rank"
FROM Idiots;

Result:

+-------------------+------+--------------------+
| IdiotName         | IQ   | Percentage Rank    |
+-------------------+------+--------------------+
| Dumbest           |   30 |                  0 |
| Homer             |   40 | 0.1111111111111111 |
| Patrick Star      |   40 | 0.1111111111111111 |
| Ed                |   40 | 0.1111111111111111 |
| Dumber            |   50 | 0.4444444444444444 |
| Peter Griffin     |   50 | 0.4444444444444444 |
| Cosmo             |   55 | 0.6666666666666666 |
| Dumb              |   60 | 0.7777777777777778 |
| Ralph Wiggum      |   65 | 0.8888888888888888 |
| Richard Watterson |   70 |                  1 |
+-------------------+------+--------------------+

In this example, the Percentage Rank column contains the output of the PERCENT_RANK() function. We can see that its values range between 0 and 1 inclusive. Specifically, the first row is 0 and the last row is 1, with all other values falling between that range.

As with any window function, the PERCENT_RANK() function requires an OVER clause, which determines how the rowset is partitioned and ordered before the window function is applied.

Omitting the ORDER BY Clause

Our OVER clause should have an ORDER BY clause so that the PERCENT_RANK() knows how to rank the values. We’ll usually encounter problems if we forget to include an ORDER BY clause in the OVER clause.

Some DBMSs (such as SQL Server) will throw an error if the OVER clause doesn’t include an ORDER BY clause.

If your DBMS doesn’t throw an error (i.e. it allows you to omit the ORDER BY clause), the function will probably treat all rows as peers, and they’ll all share the same percentage rank.

Here’s what happens when we omit the ORDER BY clause in MySQL:

SELECT 
    IdiotName,
    IQ,
    PERCENT_RANK() OVER () AS "Percentage Rank"
FROM Idiots;

Result:

+-------------------+------+-----------------+
| IdiotName         | IQ   | Percentage Rank |
+-------------------+------+-----------------+
| Dumb              |   60 |               0 |
| Dumber            |   50 |               0 |
| Dumbest           |   30 |               0 |
| Homer             |   40 |               0 |
| Peter Griffin     |   50 |               0 |
| Patrick Star      |   40 |               0 |
| Ed                |   40 |               0 |
| Ralph Wiggum      |   65 |               0 |
| Cosmo             |   55 |               0 |
| Richard Watterson |   70 |               0 |
+-------------------+------+-----------------+

All rows contain a percentage rank of 0.

Here’s what happens when we run the same code in SQL Server:

Msg 4112, Level 15, State 1, Line 4
The function 'PERCENT_RANK' must have an OVER clause with ORDER BY.

It tells us that we must have an ORDER BY clause in the OVER clause.

Partitions

We can use the PARTITION BY clause to partition the rowset by a given column. When we do this, PERCENT_RANK() calculates the relative rank of the row within each partition. In other words, each value is evaluated against the other rows in the same partition, but not the other partitions.

Here’s an example to demonstrate:

SELECT 
    DogName,
    Activity,
    Score,
    PERCENT_RANK() OVER ( PARTITION BY Activity ORDER BY Score ) AS "Percentage Rank"
FROM Dogs;

Result:

+---------+-------------+-------+-----------------+
| DogName | Activity    | Score | Percentage Rank |
+---------+-------------+-------+-----------------+
| Bruno   | Fetch Stick |    43 |               0 |
| Cooper  | Fetch Stick |    67 |             0.5 |
| Max     | Fetch Stick |    91 |               1 |
| Max     | Keep Quiet  |     1 |               0 |
| Bruno   | Keep Quiet  |     2 |             0.5 |
| Cooper  | Keep Quiet  |     8 |               1 |
| Cooper  | Wag Tail    |    51 |               0 |
| Bruno   | Wag Tail    |    65 |             0.5 |
| Max     | Wag Tail    |    87 |               1 |
+---------+-------------+-------+-----------------+

This table contains dogs and their scores at certain activities. There are three activities (listed in the Activity column).

In this example, we use a PARTITION BY clause to find out the relative rank of each dog within each activity.

In this case, the values returned by PERCENT_RANK() are either 0, 0.5, or 1. That’s because there are only three rows in each partition. Larger data sets will usually return a larger range of results within each partition.