Understanding the NTH_VALUE() Function in SQL

Some SQL databases have a window function called NTH_VALUE() that allows us to get a value from a given row in the window frame, based on the row number.

More specifically, the function returns the value of a given expression from the from the N-th row of the window frame, where N is a number that we specify when calling the function.

Example

Here’s a basic example to demonstrate how NTH_VALUE() works in SQL:

SELECT 
    DogName,
    Activity,
    Score,
    NTH_VALUE( Score, 4 ) 
        OVER ( ORDER BY Score ) 
    AS "NTH Value"
FROM Dogs;

Result:

+---------+-------------+-------+-----------+
| DogName | Activity    | Score | NTH Value |
+---------+-------------+-------+-----------+
| Max     | Keep Quiet  |     1 |      NULL |
| Bruno   | Keep Quiet  |     2 |      NULL |
| Cooper  | Keep Quiet  |     8 |      NULL |
| Bruno   | Fetch Stick |    43 |        43 |
| Cooper  | Wag Tail    |    51 |        43 |
| Bruno   | Wag Tail    |    65 |        43 |
| Cooper  | Fetch Stick |    67 |        43 |
| Max     | Wag Tail    |    87 |        43 |
| Max     | Fetch Stick |    91 |        43 |
+---------+-------------+-------+-----------+

In this example, I got the score from the fourth row. That’s because the first argument is Score, and the second is 4.

We can verify this by looking at the fourth row in the Score column. Sure enough, its value is 43, and that’s what’s returned by NTH_VALUE().

I didn’t specify any partitioning and so we got the same value on every row (apart from the NULL values).

Changing the Window Frame

The previous example contains NULL in the first three rows. This is due to the default window frame.

I ran this example in MySQL, and its default window frame when using an ORDER BY clause is equivalent to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This window frame includes rows from the partition start through the current row, including all peers of the current row (rows equal to the current row according to the ORDER BY clause). But it doesn’t include rows after the current row.

Therefore, when using the default window frame, the NTH_VALUE() will only be able to return data that is either in the current row or a previous row. As the query progresses through the rows, the “current row” will be progressively further along the result set, and it will eventually be able to return the value from the specified row.

Because of this, NTH_VALUE() will return NULL in rows earlier than the row specified at the second argument if we’re using the default window frame.

But we aren’t forced to use the default window frame. We can expand the window frame if we want.

Here’s an example of expanding the window frame:

SELECT 
    DogName,
    Activity,
    Score,
    NTH_VALUE( Score, 4 ) 
        OVER ( 
            ORDER BY Score 
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
            )
    AS "NTH Value"
FROM Dogs;

Result:

+---------+-------------+-------+-----------+
| DogName | Activity    | Score | NTH Value |
+---------+-------------+-------+-----------+
| Max     | Keep Quiet  |     1 |        43 |
| Bruno   | Keep Quiet  |     2 |        43 |
| Cooper  | Keep Quiet  |     8 |        43 |
| Bruno   | Fetch Stick |    43 |        43 |
| Cooper  | Wag Tail    |    51 |        43 |
| Bruno   | Wag Tail    |    65 |        43 |
| Cooper  | Fetch Stick |    67 |        43 |
| Max     | Wag Tail    |    87 |        43 |
| Max     | Fetch Stick |    91 |        43 |
+---------+-------------+-------+-----------+

This time I specified a window frame of ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

We can see that there are no NULL values, and the first three rows now contain the same value that the other rows contain.

Partitioning

We can use a PARTITION BY clause to group the row set into partitions:

SELECT 
    DogName,
    Activity,
    Score,
    NTH_VALUE( Score, 2 ) 
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score 
            ) 
    AS "NTH Value"
FROM Dogs;

Result:

+---------+-------------+-------+-----------+
| DogName | Activity    | Score | NTH Value |
+---------+-------------+-------+-----------+
| Bruno   | Fetch Stick |    43 |      NULL |
| Cooper  | Fetch Stick |    67 |        67 |
| Max     | Fetch Stick |    91 |        67 |
| Max     | Keep Quiet  |     1 |      NULL |
| Bruno   | Keep Quiet  |     2 |         2 |
| Cooper  | Keep Quiet  |     8 |         2 |
| Cooper  | Wag Tail    |    51 |      NULL |
| Bruno   | Wag Tail    |    65 |        65 |
| Max     | Wag Tail    |    87 |        65 |
+---------+-------------+-------+-----------+

This causes the NTH_VALUE() function to operate within the bounds of each partition. So in our case, its value comes from the second row in each partition.

The FROM FIRST/FROM LAST Arguments

The SQL standard allows for an optional FROM FIRST or FROM LAST argument to be used to specify from which end of the partition NTH_VALUE() should start its calculations.

Not all RDBMSs support this feature at the time of writing, and amongst those that do, they might only have limited support. For example, MySQL 8 supports FROM FIRST but not FROM LAST.

As it turns out, FROM FIRST is the default option, and so omitting the argument altogether will yield the same results.

But here’s an example anyway:

SELECT 
    DogName,
    Activity,
    Score,
    NTH_VALUE( Score, 2 ) FROM FIRST
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score 
            ) 
    AS "NTH Value"
FROM Dogs;

Result:

+---------+-------------+-------+-----------+
| DogName | Activity    | Score | NTH Value |
+---------+-------------+-------+-----------+
| Bruno   | Fetch Stick |    43 |      NULL |
| Cooper  | Fetch Stick |    67 |        67 |
| Max     | Fetch Stick |    91 |        67 |
| Max     | Keep Quiet  |     1 |      NULL |
| Bruno   | Keep Quiet  |     2 |         2 |
| Cooper  | Keep Quiet  |     8 |         2 |
| Cooper  | Wag Tail    |    51 |      NULL |
| Bruno   | Wag Tail    |    65 |        65 |
| Max     | Wag Tail    |    87 |        65 |
+---------+-------------+-------+-----------+

As mentioned, just because an RDBMS supports the FROM FIRST argument, it doesn’t necessarily mean that it will also support the FROM LAST argument. Here’s what happens when I use the FROM LAST argument in the same MySQL implementation that I ran the previous example in:

SELECT 
    DogName,
    Activity,
    Score,
    NTH_VALUE( Score, 3 ) FROM LAST
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score 
            ) 
    AS "NTH Value"
FROM Dogs;

Result:

ERROR 1235 (42000): This version of MySQL doesn't yet support 'FROM LAST'

However, all is not lost. We can get the same effect by using ORDER BY to sort in reverse order.

Dealing with NULL Values

The SQL standard also has an option for dealing with NULL values. By this I mean, it allows us to specify whether or not to respect NULL values or ignore them. This is specified with the RESPECT NULLS and IGNORE NULLS arguments.

As with the FROM FIRST/FROM LAST arguments in the previous example, not all RDBMSs support this option, and with those that do, support can be limited. MySQL 8 for example, supports RESPECT NULLS but not IGNORE NULLS. RESPECT NULLS is the default, and so we get the same result with or without this argument.

Let’s explicitly specify RESPECT NULLS:

SELECT 
    DogName,
    Activity,
    Score,
    NTH_VALUE( Score, 2 ) RESPECT NULLS
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score 
            ) 
    AS "NTH Value"
FROM Dogs;

Result:

+---------+-------------+-------+-----------+
| DogName | Activity    | Score | NTH Value |
+---------+-------------+-------+-----------+
| Bruno   | Fetch Stick |    43 |      NULL |
| Cooper  | Fetch Stick |    67 |        67 |
| Max     | Fetch Stick |    91 |        67 |
| Max     | Keep Quiet  |     1 |      NULL |
| Bruno   | Keep Quiet  |     2 |         2 |
| Cooper  | Keep Quiet  |     8 |         2 |
| Cooper  | Wag Tail    |    51 |      NULL |
| Bruno   | Wag Tail    |    65 |        65 |
| Max     | Wag Tail    |    87 |        65 |
+---------+-------------+-------+-----------+

Specifying IGNORE NULLS in the same RDBMS produces an error (at least as of MySQL 8.0.33):

SELECT 
    DogName,
    Activity,
    Score,
    NTH_VALUE( Score, 2 ) IGNORE NULLS
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score 
            ) 
    AS "NTH Value"
FROM Dogs;

Result:

ERROR 1235 (42000): This version of MySQL doesn't yet support 'IGNORE NULLS'

Changing the Order

Here’s what happens when we change the order to descending order:

SELECT 
    DogName,
    Activity,
    Score,
    NTH_VALUE( Score, 2 )
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score DESC
            ) 
    AS "NTH Value"
FROM Dogs;

Result:

+---------+-------------+-------+-----------+
| DogName | Activity    | Score | NTH Value |
+---------+-------------+-------+-----------+
| Max     | Fetch Stick |    91 |      NULL |
| Cooper  | Fetch Stick |    67 |        67 |
| Bruno   | Fetch Stick |    43 |        67 |
| Cooper  | Keep Quiet  |     8 |      NULL |
| Bruno   | Keep Quiet  |     2 |         2 |
| Max     | Keep Quiet  |     1 |         2 |
| Max     | Wag Tail    |    87 |      NULL |
| Bruno   | Wag Tail    |    65 |        65 |
| Cooper  | Wag Tail    |    51 |        65 |
+---------+-------------+-------+-----------+

This is basically an alternative to using the FROM LAST argument (which currently has limited support among RDBMSs).

Returning a Different Column

In the above examples, the column that we pass to the NTH_VALUE() function is the same one we pass to the ORDER BY clause. But that’s not a requirement. We can pass different a different column if needed.

Example:

SELECT 
    DogName,
    Activity,
    Score,
    NTH_VALUE( DogName, 2 )
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score
            ) 
    AS "NTH Value"
FROM Dogs;

Result:

+---------+-------------+-------+-----------+
| DogName | Activity    | Score | NTH Value |
+---------+-------------+-------+-----------+
| Bruno   | Fetch Stick |    43 | NULL      |
| Cooper  | Fetch Stick |    67 | Cooper    |
| Max     | Fetch Stick |    91 | Cooper    |
| Max     | Keep Quiet  |     1 | NULL      |
| Bruno   | Keep Quiet  |     2 | Bruno     |
| Cooper  | Keep Quiet  |     8 | Bruno     |
| Cooper  | Wag Tail    |    51 | NULL      |
| Bruno   | Wag Tail    |    65 | Bruno     |
| Max     | Wag Tail    |    87 | Bruno     |
+---------+-------------+-------+-----------+

In this example I passed the DogName column to the NTH_VALUE() function, but I sorted by Score. The result is that our NTH Value column contains the dog names instead of their scores. But the ordering of the window function was done using the scores.