Many relational database management systems (RDBMSs) provide a window function called PERCENT_RANK()
that returns the relative rank of a row within a group of rows.
The relative rank is expressed as a value between 0 and 1.
Example
Here’s an example that demonstrates how the PERCENT_RANK()
function works in SQL:
SELECT
IdiotName,
IQ,
PERCENT_RANK() OVER ( ORDER BY IQ ) AS "Percentage Rank"
FROM Idiots;
Result:
+-------------------+------+--------------------+ | IdiotName | IQ | Percentage Rank | +-------------------+------+--------------------+ | Dumbest | 30 | 0 | | Homer | 40 | 0.1111111111111111 | | Patrick Star | 40 | 0.1111111111111111 | | Ed | 40 | 0.1111111111111111 | | Dumber | 50 | 0.4444444444444444 | | Peter Griffin | 50 | 0.4444444444444444 | | Cosmo | 55 | 0.6666666666666666 | | Dumb | 60 | 0.7777777777777778 | | Ralph Wiggum | 65 | 0.8888888888888888 | | Richard Watterson | 70 | 1 | +-------------------+------+--------------------+
In this example, the Percentage Rank
column contains the output of the PERCENT_RANK()
function. We can see that its values range between 0 and 1 inclusive. Specifically, the first row is 0
and the last row is 1
, with all other values falling between that range.
As with any window function, the PERCENT_RANK()
function requires an OVER
clause, which determines how the rowset is partitioned and ordered before the window function is applied.
Omitting the ORDER BY
Clause
Our OVER
clause should have an ORDER BY
clause so that the PERCENT_RANK()
knows how to rank the values. We’ll usually encounter problems if we forget to include an ORDER BY
clause in the OVER
clause.
Some DBMSs (such as SQL Server) will throw an error if the OVER
clause doesn’t include an ORDER BY
clause.
If your DBMS doesn’t throw an error (i.e. it allows you to omit the ORDER BY
clause), the function will probably treat all rows as peers, and they’ll all share the same percentage rank.
Here’s what happens when we omit the ORDER BY
clause in MySQL:
SELECT
IdiotName,
IQ,
PERCENT_RANK() OVER () AS "Percentage Rank"
FROM Idiots;
Result:
+-------------------+------+-----------------+ | IdiotName | IQ | Percentage Rank | +-------------------+------+-----------------+ | Dumb | 60 | 0 | | Dumber | 50 | 0 | | Dumbest | 30 | 0 | | Homer | 40 | 0 | | Peter Griffin | 50 | 0 | | Patrick Star | 40 | 0 | | Ed | 40 | 0 | | Ralph Wiggum | 65 | 0 | | Cosmo | 55 | 0 | | Richard Watterson | 70 | 0 | +-------------------+------+-----------------+
All rows contain a percentage rank of 0
.
Here’s what happens when we run the same code in SQL Server:
Msg 4112, Level 15, State 1, Line 4 The function 'PERCENT_RANK' must have an OVER clause with ORDER BY.
It tells us that we must have an ORDER BY
clause in the OVER
clause.
Partitions
We can use the PARTITION BY
clause to partition the rowset by a given column. When we do this, PERCENT_RANK()
calculates the relative rank of the row within each partition. In other words, each value is evaluated against the other rows in the same partition, but not the other partitions.
Here’s an example to demonstrate:
SELECT
DogName,
Activity,
Score,
PERCENT_RANK() OVER ( PARTITION BY Activity ORDER BY Score ) AS "Percentage Rank"
FROM Dogs;
Result:
+---------+-------------+-------+-----------------+ | DogName | Activity | Score | Percentage Rank | +---------+-------------+-------+-----------------+ | Bruno | Fetch Stick | 43 | 0 | | Cooper | Fetch Stick | 67 | 0.5 | | Max | Fetch Stick | 91 | 1 | | Max | Keep Quiet | 1 | 0 | | Bruno | Keep Quiet | 2 | 0.5 | | Cooper | Keep Quiet | 8 | 1 | | Cooper | Wag Tail | 51 | 0 | | Bruno | Wag Tail | 65 | 0.5 | | Max | Wag Tail | 87 | 1 | +---------+-------------+-------+-----------------+
This table contains dogs and their scores at certain activities. There are three activities (listed in the Activity
column).
In this example, we use a PARTITION BY
clause to find out the relative rank of each dog within each activity.
In this case, the values returned by PERCENT_RANK()
are either 0
, 0.5
, or 1
. That’s because there are only three rows in each partition. Larger data sets will usually return a larger range of results within each partition.