Boost Your Data Analysis with These SQL Rank Functions

Most SQL databases have a handful of “ranking” functions that allow us to rank data. By “handful”, I mean there’s a common set of around six SQL rank functions that most of the major RDBMSs appear to support.

SQL rank functions allow us to assign ranks or row numbers to result sets.

And while that may sound like a simple job that could be done with just one or two functions, hopefully the examples below will demonstrate why we can definitely benefit from multiple rank functions in SQL.

SQL rank functions are particularly useful for analytical queries, data analysis, and reporting. In this article, we’ll explore six rank functions that we can use in our SQL queries.

The examples in this article use SQL Server. Some RDBMSs may use a slightly different syntax for certain rank functions, but in general, most of them use the same function names.

The `ROW_NUMBER()` Function

The ROW_NUMBER() function assigns a unique sequential integer to rows within a partition of a result set. It starts with 1 for the first row in each partition and increments with each subsequent row.

Example:

SELECT 
    ProductName,
    Category,
    Price,
    ROW_NUMBER() OVER (ORDER BY Price DESC) AS PriceRank
FROM Products;

Example result:

ProductName   Category     Price   PriceRank
------------  -----------  ------  ---------
Laptop        Electronics  999.99  1        
Smartphone    Electronics  699.99  2        
Desk Chair    Furniture    199.99  3        
Bookshelf     Furniture    149.99  4        
Coffee Maker  Appliances   79.99   5

The `RANK()` Function

The RANK() function assigns a rank to each row within a partition of a result set:

SELECT 
    EmployeeName,
    Department,
    Salary,
    RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS SalaryRank
FROM Employees;

Example result:

EmployeeName   Department  Salary  SalaryRank
-------------  ----------  ------  ----------
Alice Brown    IT          65000   1         
Frank Thomas   IT          65000   1         
Eva Martinez   Marketing   60000   1         
Jane Smith     Marketing   60000   1         
Charlie Davis  Marketing   58000   3         
Henry Ford     Marketing   58000   3         
David Wilson   Sales       55000   1         
John Doe       Sales       55000   1         
Grace Kim      Sales       52000   3         
Bob Johnson    Sales       52000   3

The RANK() function leaves gaps in the ranking when there are ties as can be seen in this example (when there’s a tie, it bypasses 2 and jumps straight to 3). If you don’t want such gaps, use DENSE_RANK() instead.

The `DENSE_RANK()` Function

Similar to RANK(), DENSE_RANK() assigns a rank to each row within a partition, but as mentioned, DENSE_RANK() doesn’t leave gaps in the ranking when there are ties:

SELECT 
    EmployeeName,
    Department,
    Salary,
    DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS SalaryDenseRank
FROM Employees;

Example result:

EmployeeName   Department  Salary  SalaryDenseRank
-------------  ----------  ------  ---------------
Alice Brown    IT          65000   1              
Frank Thomas   IT          65000   1              
Eva Martinez   Marketing   60000   1              
Jane Smith     Marketing   60000   1              
Charlie Davis  Marketing   58000   2              
Henry Ford     Marketing   58000   2              
David Wilson   Sales       55000   1              
John Doe       Sales       55000   1              
Grace Kim      Sales       52000   2              
Bob Johnson    Sales       52000   2

We can see that the ties didn’t create any gaps in the rank. It went from 1 to 2 (it didn’t skip 2 and go to 3 like it did in the RANK() example).

See RANK() vs DENSE_RANK() in SQL: What’s the Difference? for a more detailed explanation of the difference between these two functions.

The `NTILE()` Function

NTILE() divides the rows in an ordered partition into a specified number of groups and assigns a group number to each row:

SELECT 
    CustomerName,
    TotalPurchases,
    NTILE(4) OVER (ORDER BY TotalPurchases DESC) AS CustomerQuartile,
    NTILE(5) OVER (ORDER BY TotalPurchases DESC) AS CustomerQuintile
FROM Customers;

Example result:

CustomerName    TotalPurchases  CustomerQuartile  CustomerQuintile
--------------  --------------  ----------------  ----------------
MegaStore       100000          1                 1               
BigRetailer     95000           1                 1               
Enterprise Inc  85000           1                 1               
TechGiants      80000           1                 2               
GlobalTech      75000           2                 2               
B2B Solutions   70000           2                 2               
Acme Corp       50000           2                 3               
MidMarket Co    45000           2                 3               
LocalChain      40000           3                 3               
OnlineShop      30000           3                 4               
StartUp Inc     25000           3                 4               
StartUp Pro     20000           3                 4               
Local Shop      15000           4                 5               
SmallBiz        10000           4                 5               
Corner Store    5000            4                 5

This example demonstrates how passing a different number to NTILE() affects the output. The CustomerQuartile column puts them into four groups, whereas the CustomerQuintile column puts them into five.

The `PERCENT_RANK()` Function

PERCENT_RANK() calculates the relative rank of a row within a group of values. It returns a value between 0 and 1:

SELECT 
    ProductName,
    SalesAmount,
    PERCENT_RANK() OVER (ORDER BY SalesAmount) AS SalesPercentRank
FROM ProductSales;

Example result:

ProductName   SalesAmount  SalesPercentRank
------------  -----------  ----------------
Coffee Maker  5000         0               
Bookshelf     7500         0.25            
Desk Chair    10000        0.5             
Laptop        25000        0.75            
Smartphone    35000        1

The `CUME_DIST()` Function

CUME_DIST() calculates the cumulative distribution of a value within a group of values. It returns a value between 0 and 1:

SELECT 
    EmployeeName,
    YearsOfService,
    CUME_DIST() OVER (ORDER BY YearsOfService) AS ServiceDistribution
FROM Employees;

Example result:

EmployeeName   YearsOfService  ServiceDistribution
-------------  --------------  -------------------
Grace Kim      2               0.1                
Bob Johnson    3               0.3                
Frank Thomas   3               0.3                
Alice Brown    4               0.6                
Henry Ford     4               0.6                
David Wilson   4               0.6                
Eva Martinez   5               0.8                
John Doe       5               0.8                
Charlie Davis  6               0.9                
Jane Smith     7               1

Conclusion

These ranking functions in SQL can be quite powerful tools for data analysis, allowing us to easily rank, segment, and analyse our data in various ways. Each SQL rank function has its own specific purpose, and we could quite easily use multiple rank functions side by side in the same query if need be.

The PERCENT_RANK() and CUME_DIST() functions are a bit different to the others, in that they provide relative positions, whereas the others assign discrete integer values to rows.

In any case, by mastering the six rank functions listed above, we’ll have the ability to rank our data in more ways than one.

The ROW_NUMBER() Function

The RANK() Function

The DENSE_RANK() Function

The NTILE() Function

The PERCENT_RANK() Function

The CUME_DIST() Function

Conclusion

The `ROW_NUMBER()` Function

The `RANK()` Function

The `DENSE_RANK()` Function

The `NTILE()` Function

The `PERCENT_RANK()` Function

The `CUME_DIST()` Function