Many SQL databases have a window function called NTILE()
function that divides a rowset or partition into a given number of groups (buckets). The function typically returns the bucket number of the current row within its partition.
Example
Here’s a basic example to demonstrate how the NTILE()
function works in SQL:
SELECT
DogName,
Activity,
Score,
NTILE( 4 )
OVER ( ORDER BY Score )
AS "NTILE"
FROM Dogs;
Result:
+---------+-------------+-------+-------+ | DogName | Activity | Score | NTILE | +---------+-------------+-------+-------+ | Max | Keep Quiet | 1 | 1 | | Bruno | Keep Quiet | 2 | 1 | | Cooper | Keep Quiet | 8 | 1 | | Bruno | Fetch Stick | 43 | 2 | | Cooper | Wag Tail | 51 | 2 | | Bruno | Wag Tail | 65 | 3 | | Cooper | Fetch Stick | 67 | 3 | | Max | Wag Tail | 87 | 4 | | Max | Fetch Stick | 91 | 4 | +---------+-------------+-------+-------+
I specified 4
as the number of buckets, and I used an OVER
clause with an ORDER BY
clause to specify the ordering.
I didn’t apply any partitioning in this example, and so the number of buckets is applied to the whole rowset.
The OVER
clause is pretty much a requirement with SQL window functions. The ORDER BY
clause may be required or it may be optional, depending on your DBMS.
Partitioning
As mentioned, the above example doesn’t use partitions – the NTILE()
function is applied to the whole rowset.
In SQL we use the PARTITION BY
clause to create partitions.
This example uses partitions:
SELECT
DogName,
Activity,
Score,
NTILE( 2 )
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "NTILE"
FROM Dogs;
Result:
+---------+-------------+-------+-------+ | DogName | Activity | Score | NTILE | +---------+-------------+-------+-------+ | Bruno | Fetch Stick | 43 | 1 | | Cooper | Fetch Stick | 67 | 1 | | Max | Fetch Stick | 91 | 2 | | Max | Keep Quiet | 1 | 1 | | Bruno | Keep Quiet | 2 | 1 | | Cooper | Keep Quiet | 8 | 2 | | Cooper | Wag Tail | 51 | 1 | | Bruno | Wag Tail | 65 | 1 | | Max | Wag Tail | 87 | 2 | +---------+-------------+-------+-------+
In this case I partitioned the results by activity. There are three activities, and so the results are partitioned into three partitions – one partition per activity.
Changing the Order
Here’s what happens when we change the order to descending order:
SELECT
DogName,
Activity,
Score,
NTILE( 2 )
OVER (
PARTITION BY Activity
ORDER BY Score DESC
)
AS "NTILE"
FROM Dogs;
Result:
+---------+-------------+-------+-------+ | DogName | Activity | Score | NTILE | +---------+-------------+-------+-------+ | Max | Fetch Stick | 91 | 1 | | Cooper | Fetch Stick | 67 | 1 | | Bruno | Fetch Stick | 43 | 2 | | Cooper | Keep Quiet | 8 | 1 | | Bruno | Keep Quiet | 2 | 1 | | Max | Keep Quiet | 1 | 2 | | Max | Wag Tail | 87 | 1 | | Bruno | Wag Tail | 65 | 1 | | Cooper | Wag Tail | 51 | 2 | +---------+-------------+-------+-------+
We still get two buckets, but some rows are assigned different bucket numbers due to the different ordering.
Omitting the ORDER BY
Clause
Some DBMSs allow us to omit the ORDER BY
clause and some don’t.
For example, MySQL allows us to omit it:
SELECT
DogName,
Activity,
Score,
NTILE( 4 )
OVER ( )
AS "NTILE"
FROM Dogs;
Result:
+---------+-------------+-------+-------+ | DogName | Activity | Score | NTILE | +---------+-------------+-------+-------+ | Bruno | Fetch Stick | 43 | 1 | | Cooper | Fetch Stick | 67 | 1 | | Max | Fetch Stick | 91 | 1 | | Bruno | Wag Tail | 65 | 2 | | Cooper | Wag Tail | 51 | 2 | | Max | Wag Tail | 87 | 3 | | Bruno | Keep Quiet | 2 | 3 | | Cooper | Keep Quiet | 8 | 4 | | Max | Keep Quiet | 1 | 4 | +---------+-------------+-------+-------+
MySQL returns its results without error.
But SQL Server won’t let us omit it. Here’s the error returned when I run the above code in SQL Server:
Msg 4112, Level 15, State 1, Line 5 The function 'NTILE' must have an OVER clause with ORDER BY.
SQL Server returns the above error in most window functions when we omit the ORDER BY
clause. The easiest way to overcome this is to include an ORDER BY
clause.