Some SQL databases have a window function called NTH_VALUE()
that allows us to get a value from a given row in the window frame, based on the row number.
More specifically, the function returns the value of a given expression from the from the N-th row of the window frame, where N is a number that we specify when calling the function.
Example
Here’s a basic example to demonstrate how NTH_VALUE()
works in SQL:
SELECT
DogName,
Activity,
Score,
NTH_VALUE( Score, 4 )
OVER ( ORDER BY Score )
AS "NTH Value"
FROM Dogs;
Result:
+---------+-------------+-------+-----------+ | DogName | Activity | Score | NTH Value | +---------+-------------+-------+-----------+ | Max | Keep Quiet | 1 | NULL | | Bruno | Keep Quiet | 2 | NULL | | Cooper | Keep Quiet | 8 | NULL | | Bruno | Fetch Stick | 43 | 43 | | Cooper | Wag Tail | 51 | 43 | | Bruno | Wag Tail | 65 | 43 | | Cooper | Fetch Stick | 67 | 43 | | Max | Wag Tail | 87 | 43 | | Max | Fetch Stick | 91 | 43 | +---------+-------------+-------+-----------+
In this example, I got the score from the fourth row. That’s because the first argument is Score
, and the second is 4
.
We can verify this by looking at the fourth row in the Score
column. Sure enough, its value is 43
, and that’s what’s returned by NTH_VALUE()
.
I didn’t specify any partitioning and so we got the same value on every row (apart from the NULL
values).
Changing the Window Frame
The previous example contains NULL
in the first three rows. This is due to the default window frame.
I ran this example in MySQL, and its default window frame when using an ORDER BY
clause is equivalent to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. This window frame includes rows from the partition start through the current row, including all peers of the current row (rows equal to the current row according to the ORDER BY
clause). But it doesn’t include rows after the current row.
Therefore, when using the default window frame, the NTH_VALUE()
will only be able to return data that is either in the current row or a previous row. As the query progresses through the rows, the “current row” will be progressively further along the result set, and it will eventually be able to return the value from the specified row.
Because of this, NTH_VALUE()
will return NULL
in rows earlier than the row specified at the second argument if we’re using the default window frame.
But we aren’t forced to use the default window frame. We can expand the window frame if we want.
Here’s an example of expanding the window frame:
SELECT
DogName,
Activity,
Score,
NTH_VALUE( Score, 4 )
OVER (
ORDER BY Score
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
AS "NTH Value"
FROM Dogs;
Result:
+---------+-------------+-------+-----------+ | DogName | Activity | Score | NTH Value | +---------+-------------+-------+-----------+ | Max | Keep Quiet | 1 | 43 | | Bruno | Keep Quiet | 2 | 43 | | Cooper | Keep Quiet | 8 | 43 | | Bruno | Fetch Stick | 43 | 43 | | Cooper | Wag Tail | 51 | 43 | | Bruno | Wag Tail | 65 | 43 | | Cooper | Fetch Stick | 67 | 43 | | Max | Wag Tail | 87 | 43 | | Max | Fetch Stick | 91 | 43 | +---------+-------------+-------+-----------+
This time I specified a window frame of ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
.
We can see that there are no NULL
values, and the first three rows now contain the same value that the other rows contain.
Partitioning
We can use a PARTITION BY
clause to group the row set into partitions:
SELECT
DogName,
Activity,
Score,
NTH_VALUE( Score, 2 )
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "NTH Value"
FROM Dogs;
Result:
+---------+-------------+-------+-----------+ | DogName | Activity | Score | NTH Value | +---------+-------------+-------+-----------+ | Bruno | Fetch Stick | 43 | NULL | | Cooper | Fetch Stick | 67 | 67 | | Max | Fetch Stick | 91 | 67 | | Max | Keep Quiet | 1 | NULL | | Bruno | Keep Quiet | 2 | 2 | | Cooper | Keep Quiet | 8 | 2 | | Cooper | Wag Tail | 51 | NULL | | Bruno | Wag Tail | 65 | 65 | | Max | Wag Tail | 87 | 65 | +---------+-------------+-------+-----------+
This causes the NTH_VALUE()
function to operate within the bounds of each partition. So in our case, its value comes from the second row in each partition.
The FROM FIRST
/FROM LAST
Arguments
The SQL standard allows for an optional FROM FIRST
or FROM LAST
argument to be used to specify from which end of the partition NTH_VALUE()
should start its calculations.
Not all RDBMSs support this feature at the time of writing, and amongst those that do, they might only have limited support. For example, MySQL 8 supports FROM FIRST
but not FROM LAST
.
As it turns out, FROM FIRST
is the default option, and so omitting the argument altogether will yield the same results.
But here’s an example anyway:
SELECT
DogName,
Activity,
Score,
NTH_VALUE( Score, 2 ) FROM FIRST
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "NTH Value"
FROM Dogs;
Result:
+---------+-------------+-------+-----------+ | DogName | Activity | Score | NTH Value | +---------+-------------+-------+-----------+ | Bruno | Fetch Stick | 43 | NULL | | Cooper | Fetch Stick | 67 | 67 | | Max | Fetch Stick | 91 | 67 | | Max | Keep Quiet | 1 | NULL | | Bruno | Keep Quiet | 2 | 2 | | Cooper | Keep Quiet | 8 | 2 | | Cooper | Wag Tail | 51 | NULL | | Bruno | Wag Tail | 65 | 65 | | Max | Wag Tail | 87 | 65 | +---------+-------------+-------+-----------+
As mentioned, just because an RDBMS supports the FROM FIRST
argument, it doesn’t necessarily mean that it will also support the FROM LAST
argument. Here’s what happens when I use the FROM LAST
argument in the same MySQL implementation that I ran the previous example in:
SELECT
DogName,
Activity,
Score,
NTH_VALUE( Score, 3 ) FROM LAST
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "NTH Value"
FROM Dogs;
Result:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'FROM LAST'
However, all is not lost. We can get the same effect by using ORDER BY
to sort in reverse order.
Dealing with NULL Values
The SQL standard also has an option for dealing with NULL values. By this I mean, it allows us to specify whether or not to respect NULL values or ignore them. This is specified with the RESPECT NULLS
and IGNORE NULLS
arguments.
As with the FROM FIRST
/FROM LAST
arguments in the previous example, not all RDBMSs support this option, and with those that do, support can be limited. MySQL 8 for example, supports RESPECT NULLS
but not IGNORE NULLS
. RESPECT NULLS
is the default, and so we get the same result with or without this argument.
Let’s explicitly specify RESPECT NULLS
:
SELECT
DogName,
Activity,
Score,
NTH_VALUE( Score, 2 ) RESPECT NULLS
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "NTH Value"
FROM Dogs;
Result:
+---------+-------------+-------+-----------+ | DogName | Activity | Score | NTH Value | +---------+-------------+-------+-----------+ | Bruno | Fetch Stick | 43 | NULL | | Cooper | Fetch Stick | 67 | 67 | | Max | Fetch Stick | 91 | 67 | | Max | Keep Quiet | 1 | NULL | | Bruno | Keep Quiet | 2 | 2 | | Cooper | Keep Quiet | 8 | 2 | | Cooper | Wag Tail | 51 | NULL | | Bruno | Wag Tail | 65 | 65 | | Max | Wag Tail | 87 | 65 | +---------+-------------+-------+-----------+
Specifying IGNORE NULLS
in the same RDBMS produces an error (at least as of MySQL 8.0.33):
SELECT
DogName,
Activity,
Score,
NTH_VALUE( Score, 2 ) IGNORE NULLS
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "NTH Value"
FROM Dogs;
Result:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'IGNORE NULLS'
Changing the Order
Here’s what happens when we change the order to descending order:
SELECT
DogName,
Activity,
Score,
NTH_VALUE( Score, 2 )
OVER (
PARTITION BY Activity
ORDER BY Score DESC
)
AS "NTH Value"
FROM Dogs;
Result:
+---------+-------------+-------+-----------+ | DogName | Activity | Score | NTH Value | +---------+-------------+-------+-----------+ | Max | Fetch Stick | 91 | NULL | | Cooper | Fetch Stick | 67 | 67 | | Bruno | Fetch Stick | 43 | 67 | | Cooper | Keep Quiet | 8 | NULL | | Bruno | Keep Quiet | 2 | 2 | | Max | Keep Quiet | 1 | 2 | | Max | Wag Tail | 87 | NULL | | Bruno | Wag Tail | 65 | 65 | | Cooper | Wag Tail | 51 | 65 | +---------+-------------+-------+-----------+
This is basically an alternative to using the FROM LAST
argument (which currently has limited support among RDBMSs).
Returning a Different Column
In the above examples, the column that we pass to the NTH_VALUE()
function is the same one we pass to the ORDER BY
clause. But that’s not a requirement. We can pass different a different column if needed.
Example:
SELECT
DogName,
Activity,
Score,
NTH_VALUE( DogName, 2 )
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "NTH Value"
FROM Dogs;
Result:
+---------+-------------+-------+-----------+ | DogName | Activity | Score | NTH Value | +---------+-------------+-------+-----------+ | Bruno | Fetch Stick | 43 | NULL | | Cooper | Fetch Stick | 67 | Cooper | | Max | Fetch Stick | 91 | Cooper | | Max | Keep Quiet | 1 | NULL | | Bruno | Keep Quiet | 2 | Bruno | | Cooper | Keep Quiet | 8 | Bruno | | Cooper | Wag Tail | 51 | NULL | | Bruno | Wag Tail | 65 | Bruno | | Max | Wag Tail | 87 | Bruno | +---------+-------------+-------+-----------+
In this example I passed the DogName
column to the NTH_VALUE()
function, but I sorted by Score
. The result is that our NTH Value
column contains the dog names instead of their scores. But the ordering of the window function was done using the scores.