In MySQL, PERCENT_RANK()
is a window function that returns the percentage of partition values less than the value in the current row, excluding the highest value.
We can use PERCENT_RANK()
to evaluate the relative standing of a value within a query result set or partition. Return values range from 0 to 1.
Syntax
The syntax goes like this:
PERCENT_RANK() over_clause
So it requires an OVER
clause, which determines how the rowset is partitioned and ordered before the window function is applied.
Example
Suppose we have a table with the following data:
SELECT * FROM Idiots;
Result:
+---------+-------------------+------+ | IdiotId | IdiotName | IQ | +---------+-------------------+------+ | 1 | Dumb | 60 | | 2 | Dumber | 50 | | 3 | Dumbest | 30 | | 4 | Homer | 40 | | 5 | Peter Griffin | 50 | | 6 | Patrick Star | 40 | | 7 | Ed | 40 | | 8 | Ralph Wiggum | 65 | | 9 | Cosmo | 55 | | 10 | Richard Watterson | 70 | +---------+-------------------+------+
The following query uses PERCENT_RANK()
against the IQ column:
SELECT
IdiotName,
IQ,
PERCENT_RANK() OVER ( ORDER BY IQ ) AS "Percentage Rank"
FROM Idiots;
Result:
+-------------------+------+--------------------+ | IdiotName | IQ | Percentage Rank | +-------------------+------+--------------------+ | Dumbest | 30 | 0 | | Homer | 40 | 0.1111111111111111 | | Patrick Star | 40 | 0.1111111111111111 | | Ed | 40 | 0.1111111111111111 | | Dumber | 50 | 0.4444444444444444 | | Peter Griffin | 50 | 0.4444444444444444 | | Cosmo | 55 | 0.6666666666666666 | | Dumb | 60 | 0.7777777777777778 | | Ralph Wiggum | 65 | 0.8888888888888888 | | Richard Watterson | 70 | 1 | +-------------------+------+--------------------+
We can see that the percentage rank ranges between 0 and 1, with all other values in between.
The values represent the row relative rank, calculated as the result of the following formula:
(rank - 1) / (rows - 1)
where rank
is the row rank and rows
is the number of partition rows.
Omitting the ORDER BY
Clause
We should use the PERCENT_RANK()
function with an ORDER BY
clause if we want meaningful results. Omitting the ORDER BY
clause will cause all rows to be peers (and therefore they’ll all share the same percentage rank).
Here’s what happens when we omit the ORDER BY
clause:
SELECT
IdiotName,
IQ,
PERCENT_RANK() OVER () AS "Percentage Rank"
FROM Idiots;
Result:
+-------------------+------+-----------------+ | IdiotName | IQ | Percentage Rank | +-------------------+------+-----------------+ | Dumb | 60 | 0 | | Dumber | 50 | 0 | | Dumbest | 30 | 0 | | Homer | 40 | 0 | | Peter Griffin | 50 | 0 | | Patrick Star | 40 | 0 | | Ed | 40 | 0 | | Ralph Wiggum | 65 | 0 | | Cosmo | 55 | 0 | | Richard Watterson | 70 | 0 | +-------------------+------+-----------------+
All rows contain a percentage rank of 0
.
Partitions
We can use the PARTITION BY
clause to partition the rowset by a given column. When we do this, PERCENT_RANK()
calculates the percentage rank of the row within each partition. That is, the value is evaluated against the other rows in the same partition.
Suppose we have the following table:
SELECT * FROM Dogs;
Result:
+---------+-------------+-------+ | DogName | Activity | Score | +---------+-------------+-------+ | Bruno | Fetch Stick | 43 | | Cooper | Fetch Stick | 67 | | Max | Fetch Stick | 91 | | Bruno | Wag Tail | 51 | | Cooper | Wag Tail | 51 | | Max | Wag Tail | 87 | | Bruno | Keep Quiet | 1 | | Cooper | Keep Quiet | 8 | | Max | Keep Quiet | 12 | +---------+-------------+-------+
We can use a PARTITION BY
clause to find out the percentage rank of each dog within each activity:
SELECT
DogName,
Activity,
Score,
PERCENT_RANK() OVER ( PARTITION BY Activity ORDER BY Score ) AS "Percentage Rank"
FROM Dogs;
Result:
+---------+-------------+-------+-----------------+ | DogName | Activity | Score | Percentage Rank | +---------+-------------+-------+-----------------+ | Bruno | Fetch Stick | 43 | 0 | | Cooper | Fetch Stick | 67 | 0.5 | | Max | Fetch Stick | 91 | 1 | | Bruno | Keep Quiet | 1 | 0 | | Cooper | Keep Quiet | 8 | 0.5 | | Max | Keep Quiet | 12 | 1 | | Bruno | Wag Tail | 51 | 0 | | Cooper | Wag Tail | 51 | 0 | | Max | Wag Tail | 87 | 1 | +---------+-------------+-------+-----------------+
We can see that the values returned by PERCENT_RANK()
are either 0
, 0.5
, or 1
. That’s because there are only three rows in each partition.
Here’s an example that demonstrates the range of results we can get with other similar functions:
SELECT
DogName,
Activity,
Score,
PERCENT_RANK() OVER win AS "Percentage Rank",
CUME_DIST() OVER win AS "Cumulative Distribution",
RANK() OVER win AS "Rank",
DENSE_RANK() OVER win AS "Dense Rank"
FROM Dogs
WINDOW win AS (PARTITION BY Activity ORDER BY Score);
Result:
+---------+-------------+-------+-----------------+-------------------------+------+------------+ | DogName | Activity | Score | Percentage Rank | Cumulative Distribution | Rank | Dense Rank | +---------+-------------+-------+-----------------+-------------------------+------+------------+ | Bruno | Fetch Stick | 43 | 0 | 0.3333333333333333 | 1 | 1 | | Cooper | Fetch Stick | 67 | 0.5 | 0.6666666666666666 | 2 | 2 | | Max | Fetch Stick | 91 | 1 | 1 | 3 | 3 | | Bruno | Keep Quiet | 1 | 0 | 0.3333333333333333 | 1 | 1 | | Cooper | Keep Quiet | 8 | 0.5 | 0.6666666666666666 | 2 | 2 | | Max | Keep Quiet | 12 | 1 | 1 | 3 | 3 | | Bruno | Wag Tail | 51 | 0 | 0.6666666666666666 | 1 | 1 | | Cooper | Wag Tail | 51 | 0 | 0.6666666666666666 | 1 | 1 | | Max | Wag Tail | 87 | 1 | 1 | 3 | 2 | +---------+-------------+-------+-----------------+-------------------------+------+------------+
In this example I moved the PARTITION BY
and ORDER BY
clauses to a named window. This saved me from having to replicate the same clauses multiple times in the SELECT
list.