Most SQL databases have a handful of “ranking” functions that allow us to rank data. By “handful”, I mean there’s a common set of around six SQL rank functions that most of the major RDBMSs appear to support.
SQL rank functions allow us to assign ranks or row numbers to result sets.
And while that may sound like a simple job that could be done with just one or two functions, hopefully the examples below will demonstrate why we can definitely benefit from multiple rank functions in SQL.
SQL rank functions are particularly useful for analytical queries, data analysis, and reporting. In this article, we’ll explore six rank functions that we can use in our SQL queries.
The examples in this article use SQL Server. Some RDBMSs may use a slightly different syntax for certain rank functions, but in general, most of them use the same function names.
The ROW_NUMBER()
Function
The ROW_NUMBER()
function assigns a unique sequential integer to rows within a partition of a result set. It starts with 1
for the first row in each partition and increments with each subsequent row.
Example:
SELECT
ProductName,
Category,
Price,
ROW_NUMBER() OVER (ORDER BY Price DESC) AS PriceRank
FROM Products;
Example result:
ProductName Category Price PriceRank
------------ ----------- ------ ---------
Laptop Electronics 999.99 1
Smartphone Electronics 699.99 2
Desk Chair Furniture 199.99 3
Bookshelf Furniture 149.99 4
Coffee Maker Appliances 79.99 5
The RANK()
Function
The RANK()
function assigns a rank to each row within a partition of a result set:
SELECT
EmployeeName,
Department,
Salary,
RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS SalaryRank
FROM Employees;
Example result:
EmployeeName Department Salary SalaryRank
------------- ---------- ------ ----------
Alice Brown IT 65000 1
Frank Thomas IT 65000 1
Eva Martinez Marketing 60000 1
Jane Smith Marketing 60000 1
Charlie Davis Marketing 58000 3
Henry Ford Marketing 58000 3
David Wilson Sales 55000 1
John Doe Sales 55000 1
Grace Kim Sales 52000 3
Bob Johnson Sales 52000 3
The RANK()
function leaves gaps in the ranking when there are ties as can be seen in this example (when there’s a tie, it bypasses 2
and jumps straight to 3
). If you don’t want such gaps, use DENSE_RANK()
instead.
The DENSE_RANK()
Function
Similar to RANK()
, DENSE_RANK()
assigns a rank to each row within a partition, but as mentioned, DENSE_RANK()
doesn’t leave gaps in the ranking when there are ties:
SELECT
EmployeeName,
Department,
Salary,
DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS SalaryDenseRank
FROM Employees;
Example result:
EmployeeName Department Salary SalaryDenseRank
------------- ---------- ------ ---------------
Alice Brown IT 65000 1
Frank Thomas IT 65000 1
Eva Martinez Marketing 60000 1
Jane Smith Marketing 60000 1
Charlie Davis Marketing 58000 2
Henry Ford Marketing 58000 2
David Wilson Sales 55000 1
John Doe Sales 55000 1
Grace Kim Sales 52000 2
Bob Johnson Sales 52000 2
We can see that the ties didn’t create any gaps in the rank. It went from 1
to 2
(it didn’t skip 2
and go to 3
like it did in the RANK()
example).
See RANK()
vs DENSE_RANK()
in SQL: What’s the Difference? for a more detailed explanation of the difference between these two functions.
The NTILE()
Function
NTILE()
divides the rows in an ordered partition into a specified number of groups and assigns a group number to each row:
SELECT
CustomerName,
TotalPurchases,
NTILE(4) OVER (ORDER BY TotalPurchases DESC) AS CustomerQuartile,
NTILE(5) OVER (ORDER BY TotalPurchases DESC) AS CustomerQuintile
FROM Customers;
Example result:
CustomerName TotalPurchases CustomerQuartile CustomerQuintile
-------------- -------------- ---------------- ----------------
MegaStore 100000 1 1
BigRetailer 95000 1 1
Enterprise Inc 85000 1 1
TechGiants 80000 1 2
GlobalTech 75000 2 2
B2B Solutions 70000 2 2
Acme Corp 50000 2 3
MidMarket Co 45000 2 3
LocalChain 40000 3 3
OnlineShop 30000 3 4
StartUp Inc 25000 3 4
StartUp Pro 20000 3 4
Local Shop 15000 4 5
SmallBiz 10000 4 5
Corner Store 5000 4 5
This example demonstrates how passing a different number to NTILE()
affects the output. The CustomerQuartile
column puts them into four groups, whereas the CustomerQuintile
column puts them into five.
The PERCENT_RANK()
Function
PERCENT_RANK()
calculates the relative rank of a row within a group of values. It returns a value between 0
and 1
:
SELECT
ProductName,
SalesAmount,
PERCENT_RANK() OVER (ORDER BY SalesAmount) AS SalesPercentRank
FROM ProductSales;
Example result:
ProductName SalesAmount SalesPercentRank
------------ ----------- ----------------
Coffee Maker 5000 0
Bookshelf 7500 0.25
Desk Chair 10000 0.5
Laptop 25000 0.75
Smartphone 35000 1
The CUME_DIST()
Function
CUME_DIST()
calculates the cumulative distribution of a value within a group of values. It returns a value between 0
and 1
:
SELECT
EmployeeName,
YearsOfService,
CUME_DIST() OVER (ORDER BY YearsOfService) AS ServiceDistribution
FROM Employees;
Example result:
EmployeeName YearsOfService ServiceDistribution
------------- -------------- -------------------
Grace Kim 2 0.1
Bob Johnson 3 0.3
Frank Thomas 3 0.3
Alice Brown 4 0.6
Henry Ford 4 0.6
David Wilson 4 0.6
Eva Martinez 5 0.8
John Doe 5 0.8
Charlie Davis 6 0.9
Jane Smith 7 1
Conclusion
These ranking functions in SQL can be quite powerful tools for data analysis, allowing us to easily rank, segment, and analyse our data in various ways. Each SQL rank function has its own specific purpose, and we could quite easily use multiple rank functions side by side in the same query if need be.
The PERCENT_RANK()
and CUME_DIST()
functions are a bit different to the others, in that they provide relative positions, whereas the others assign discrete integer values to rows.
In any case, by mastering the six rank functions listed above, we’ll have the ability to rank our data in more ways than one.