In SQL databases, the RANK()
function is a window function that returns the rank of the current row within its partition, with gaps.
By “with gaps” I mean that if there are any ties for a given rank, there will be a gap between that rank value and the next rank value.
If you don’t want such gaps, use the DENSE_RANK()
function instead, as it returns the rank without gaps.
Example
Here’s an example of a SQL query that includes the RANK()
function:
SELECT
IdiotName,
IQ,
RANK() OVER ( ORDER BY IQ ) AS "Rank"
FROM Idiots;
Result:
+-------------------+------+------+ | IdiotName | IQ | Rank | +-------------------+------+------+ | Dumbest | 30 | 1 | | Homer | 40 | 2 | | Patrick Star | 40 | 2 | | Ed | 40 | 2 | | Dumber | 50 | 5 | | Peter Griffin | 50 | 5 | | Cosmo | 55 | 7 | | Dumb | 60 | 8 | | Ralph Wiggum | 65 | 9 | | Richard Watterson | 70 | 10 | +-------------------+------+------+ 10 rows in set (0.00 sec)
Here we use the SQL RANK()
function to rank the values in the IQ
column.
We use an OVER
clause to determine how the rows are sorted before ranking the values. In this case, we sorted them by the IQ
column in ascending order. We could also partition the rows with the OVER
clause if we wanted (example later).
We have two separate instances of a tie in the above example. Three idiots have a rank of 2
, and two idiots have a rank of 5
. These idiots share the same rank because they share the same IQ.
Note that there are gaps after the tied rows. The rankings are non-contiguous. For example, the rank values jump from 2
to 5
(there’s no 3
or 4
). This is by design. It’s exactly how the SQL RANK()
function works.
If you don’t want such gaps, use the DENSE_RANK()
function instead (it returns the rank without gaps).
The following example demonstrates the difference between these two functions:
SELECT
IdiotName,
IQ,
RANK() OVER win AS "Rank",
DENSE_RANK() OVER win AS "Dense Rank"
FROM Idiots
WINDOW win AS ( ORDER BY IQ );
Result:
+-------------------+------+------+------------+ | IdiotName | IQ | Rank | Dense Rank | +-------------------+------+------+------------+ | Dumbest | 30 | 1 | 1 | | Homer | 40 | 2 | 2 | | Patrick Star | 40 | 2 | 2 | | Ed | 40 | 2 | 2 | | Dumber | 50 | 5 | 3 | | Peter Griffin | 50 | 5 | 3 | | Cosmo | 55 | 7 | 4 | | Dumb | 60 | 8 | 5 | | Ralph Wiggum | 65 | 9 | 6 | | Richard Watterson | 70 | 10 | 7 | +-------------------+------+------+------------+ 10 rows in set (0.00 sec)
We can see that these two functions can return quite different results, depending on whether there are any ties, and how many there are. On the other hand, if there were no ties, they would’ve returned the same result.
In this example I moved the ORDER BY
clause to a named window. This allowed me to use the same ORDER BY
clause twice (saving me from having to type it out twice). Instead, all we need to do is provide the name of the named window (in this case, win
).
Your ability to use a named window may depend on your DBMS. For example, at the time of writing, SQL Server has only recently introduced this capability (in SQL Server 2022). Trying to use the WINDOW
clause in earlier versions will result in an error.
Reorder the Rankings
In the above examples, the idiot with the lowest IQ gets the highest rank (assuming a rank of 1 is higher than 2, etc).
We can change it so that the highest IQ gets the highest rank. To do this, all we need to do is use the DESC
keyword, to indicate descending order:
SELECT
IdiotName,
IQ,
RANK() OVER win AS "Rank"
FROM Idiots
WINDOW win AS ( ORDER BY IQ DESC );
Result:
+-------------------+------+------+ | IdiotName | IQ | Rank | +-------------------+------+------+ | Richard Watterson | 70 | 1 | | Ralph Wiggum | 65 | 2 | | Dumb | 60 | 3 | | Cosmo | 55 | 4 | | Dumber | 50 | 5 | | Peter Griffin | 50 | 5 | | Homer | 40 | 7 | | Patrick Star | 40 | 7 | | Ed | 40 | 7 | | Dumbest | 30 | 10 | +-------------------+------+------+ 10 rows in set (0.00 sec)
Omitting the ORDER BY
Clause
As shown in the above examples, the ORDER BY
clause allows us to get meaningful results. Omitting the ORDER BY
clause causes all rows to be peers, which will cause them all to share the same rank. This kind of defeats the purpose of using the RANK()
function in the first place.
Here’s an example that demonstrates what I mean:
SELECT
IdiotName,
IQ,
RANK() OVER () AS "Rank"
FROM Idiots;
Result:
+-------------------+------+------+ | IdiotName | IQ | Rank | +-------------------+------+------+ | Dumb | 60 | 1 | | Dumber | 50 | 1 | | Dumbest | 30 | 1 | | Homer | 40 | 1 | | Peter Griffin | 50 | 1 | | Patrick Star | 40 | 1 | | Ed | 40 | 1 | | Ralph Wiggum | 65 | 1 | | Cosmo | 55 | 1 | | Richard Watterson | 70 | 1 | +-------------------+------+------+ 10 rows in set (0.12 sec)
Now all idiots are ranked number 1
, which isn’t very helpful. We all know that some idiots are more idiotic than others!
Some DBMSs (such as SQL Server) return an error if we omit the ORDER BY
clause.
Here’s the error we get when we omit the ORDER BY
clause in SQL Server:
Msg 4112, Level 15, State 1, Line 5 The function 'RANK' must have an OVER clause with ORDER BY.
Partitions
We can apply the RANK()
function against each partition in a result set if we want. We can do this with the PARTITION BY
clause.
When we do this, each value is only evaluated against the other rows in the same partition.
Example:
SELECT
DogName,
Activity,
Score,
RANK() OVER ( PARTITION BY Activity ORDER BY Score ) AS "Rank"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Rank | +---------+-------------+-------+------+ | Bruno | Fetch Stick | 43 | 1 | | Cooper | Fetch Stick | 67 | 2 | | Max | Fetch Stick | 91 | 3 | | Max | Keep Quiet | 1 | 1 | | Bruno | Keep Quiet | 2 | 2 | | Cooper | Keep Quiet | 8 | 3 | | Cooper | Wag Tail | 51 | 1 | | Bruno | Wag Tail | 65 | 2 | | Max | Wag Tail | 87 | 3 | +---------+-------------+-------+------+ 9 rows in set (0.00 sec)
In this table, we have dog names and their respective scores at performing various activities.
In this case, I’ve partitioned the results by the Activity
column. There are three activities, and so we get three partitions. This means that the dogs are ranked against each activity. The ranking values reset with each partition, and so the rank starts at 1
again at the start of each partition.
Similar Window Functions in SQL
Here’s an example that compares the RANK()
function to other similar SQL functions:
SELECT
DogName,
Activity,
Score,
RANK() OVER win AS "Rank",
DENSE_RANK() OVER win AS "Dense Rank",
PERCENT_RANK() OVER win AS "Percentage Rank",
CUME_DIST() OVER win AS "Cumulative Distribution"
FROM Dogs
WINDOW win AS (PARTITION BY Activity ORDER BY Score);
Result:
+---------+-------------+-------+------+------------+-----------------+-------------------------+ | DogName | Activity | Score | Rank | Dense Rank | Percentage Rank | Cumulative Distribution | +---------+-------------+-------+------+------------+-----------------+-------------------------+ | Bruno | Fetch Stick | 43 | 1 | 1 | 0 | 0.3333333333333333 | | Cooper | Fetch Stick | 67 | 2 | 2 | 0.5 | 0.6666666666666666 | | Max | Fetch Stick | 91 | 3 | 3 | 1 | 1 | | Max | Keep Quiet | 1 | 1 | 1 | 0 | 0.3333333333333333 | | Bruno | Keep Quiet | 2 | 2 | 2 | 0.5 | 0.6666666666666666 | | Cooper | Keep Quiet | 8 | 3 | 3 | 1 | 1 | | Cooper | Wag Tail | 51 | 1 | 1 | 0 | 0.3333333333333333 | | Bruno | Wag Tail | 65 | 2 | 2 | 0.5 | 0.6666666666666666 | | Max | Wag Tail | 87 | 3 | 3 | 1 | 1 | +---------+-------------+-------+------+------------+-----------------+-------------------------+ 9 rows in set (0.00 sec)
As with any SQL functionality, the availability and exact syntax of each function will depend on your DBMS. Best to check with your DBMS’s documentation for a complete overview of the syntax.