SQL rank functions can be handy things to have when analysing data. Most major RDBMSs implement a similar bunch of ranking functions, usually with the same names. These rank functions allow us to assign rankings to rows based on specific criteria.
In this article, we’ll look at six commonly used SQL ranking functions, and observe how they differ. We’ll throw them all together into a single query and see their results side by side.
The rank functions in question are: ROW_NUMBER()
, RANK()
, DENSE_RANK()
, NTILE()
, PERCENT_RANK()
, and CUME_DIST()
.
Example Query
Here’s a query that uses all six SQL rank functions:
SELECT
id,
sales,
customer_satisfaction AS cs,
ROW_NUMBER() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS row_num,
RANK() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS rank,
DENSE_RANK() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS dense_rank,
NTILE(4) OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS ntile,
PERCENT_RANK() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS percent_rank,
CUME_DIST() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS cume_dist
FROM employee_performance
ORDER BY sales DESC, customer_satisfaction DESC;
Result:
id sales cs row_num rank dense_rank ntile percent_rank cume_dist
-- ------ --- ------- ---- ---------- ----- ------------------- ---------
2 150000 4.5 1 1 1 1 0 0.0625
16 142000 4.7 2 2 2 1 0.06666666666666667 0.125
9 140000 4.6 3 3 3 1 0.13333333333333333 0.1875
14 140000 4.5 4 4 4 1 0.2 0.25
5 140000 4.4 5 5 5 2 0.26666666666666666 0.3125
15 137000 4.4 6 6 6 2 0.3333333333333333 0.375
8 135000 4.2 7 7 7 2 0.4 0.4375
13 130000 4.3 8 8 8 2 0.4666666666666667 0.5625
4 130000 4.3 9 8 8 3 0.4666666666666667 0.5625
12 130000 4.1 10 10 9 3 0.6 0.625
6 125000 4.1 11 11 10 3 0.6666666666666666 0.6875
10 122000 4 12 12 11 3 0.7333333333333333 0.75
1 120000 4.2 13 13 12 4 0.8 0.8125
11 118000 3.8 14 14 13 4 0.8666666666666667 0.875
7 115000 3.9 15 15 14 4 0.9333333333333333 0.9375
3 110000 4 16 16 15 4 1 1
Explanation of these SQL Rank Functions
Here’s a brief overview of each ranking function in the above query:
ROW_NUMBER()
assigns a unique sequential integer to each row within a partition of a result set. It starts with 1 for the first row in each partition and increments by 1 for each subsequent row.RANK()
assigns a rank to each row within a partition of a result set. It leaves gaps in the ranking when there are ties. For example, if two rows tie for first place, the next rank would be 3.DENSE_RANK()
is similar toRANK()
, but it doesn’t leave gaps in the ranking when there are ties. In the case of a tie, the next rank would be the next consecutive integer. For example, if two rows tie for first place, the next rank would be 2.NTILE(n)
divides the rows in an ordered partition into a specified number of groups, n. It assigns an integer value to each group, starting from 1. This function is useful for creating percentiles or dividing data into equal-sized buckets.PERCENT_RANK()
calculates the relative rank of a row within a group of values. It returns a value between 0 and 1, representing the percentage of rows with a value less than the current row.CUME_DIST()
calculates the cumulative distribution of a value within a group of values. It returns a value between 0 and 1, representing the percentage of rows with a value less than or equal to the current row.
By comparing the results of these functions side by side, we can see how they differ in their approach to ranking the data.
Understanding each of these SQL rank functions and their unique characteristics allows us to choose the most appropriate function for our needs.