A Comparison of 6 SQL Rank Functions

SQL rank functions can be handy things to have when analysing data. Most major RDBMSs implement a similar bunch of ranking functions, usually with the same names. These rank functions allow us to assign rankings to rows based on specific criteria.

In this article, we’ll look at six commonly used SQL ranking functions, and observe how they differ. We’ll throw them all together into a single query and see their results side by side.

The rank functions in question are: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), PERCENT_RANK(), and CUME_DIST().

Example Query

Here’s a query that uses all six SQL rank functions:

SELECT 
    id,
    sales,
    customer_satisfaction AS cs,
    ROW_NUMBER() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS row_num,
    RANK() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS dense_rank,
    NTILE(4) OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS ntile,
    PERCENT_RANK() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS percent_rank,
    CUME_DIST() OVER (ORDER BY sales DESC, customer_satisfaction DESC) AS cume_dist
FROM employee_performance
ORDER BY sales DESC, customer_satisfaction DESC;

Result:

id  sales   cs   row_num  rank  dense_rank  ntile  percent_rank         cume_dist
--  ------  ---  -------  ----  ----------  -----  -------------------  ---------
2   150000  4.5  1        1     1           1      0                    0.0625   
16  142000  4.7  2        2     2           1      0.06666666666666667  0.125    
9   140000  4.6  3        3     3           1      0.13333333333333333  0.1875   
14  140000  4.5  4        4     4           1      0.2                  0.25     
5   140000  4.4  5        5     5           2      0.26666666666666666  0.3125   
15  137000  4.4  6        6     6           2      0.3333333333333333   0.375    
8   135000  4.2  7        7     7           2      0.4                  0.4375   
13  130000  4.3  8        8     8           2      0.4666666666666667   0.5625   
4   130000  4.3  9        8     8           3      0.4666666666666667   0.5625   
12  130000  4.1  10       10    9           3      0.6                  0.625    
6   125000  4.1  11       11    10          3      0.6666666666666666   0.6875   
10  122000  4    12       12    11          3      0.7333333333333333   0.75     
1   120000  4.2  13       13    12          4      0.8                  0.8125   
11  118000  3.8  14       14    13          4      0.8666666666666667   0.875    
7   115000  3.9  15       15    14          4      0.9333333333333333   0.9375   
3   110000  4    16       16    15          4      1                    1

Explanation of these SQL Rank Functions

Here’s a brief overview of each ranking function in the above query:

ROW_NUMBER() assigns a unique sequential integer to each row within a partition of a result set. It starts with 1 for the first row in each partition and increments by 1 for each subsequent row.
RANK() assigns a rank to each row within a partition of a result set. It leaves gaps in the ranking when there are ties. For example, if two rows tie for first place, the next rank would be 3.
DENSE_RANK() is similar to RANK(), but it doesn’t leave gaps in the ranking when there are ties. In the case of a tie, the next rank would be the next consecutive integer. For example, if two rows tie for first place, the next rank would be 2.
NTILE(n) divides the rows in an ordered partition into a specified number of groups, n. It assigns an integer value to each group, starting from 1. This function is useful for creating percentiles or dividing data into equal-sized buckets.
PERCENT_RANK() calculates the relative rank of a row within a group of values. It returns a value between 0 and 1, representing the percentage of rows with a value less than the current row.
CUME_DIST() calculates the cumulative distribution of a value within a group of values. It returns a value between 0 and 1, representing the percentage of rows with a value less than or equal to the current row.

By comparing the results of these functions side by side, we can see how they differ in their approach to ranking the data.

Understanding each of these SQL rank functions and their unique characteristics allows us to choose the most appropriate function for our needs.