Many relational database management systems (RDBMSs) have a LEAD()
function that allows us to retrieve a value from a following row.
The SQL LEAD()
function returns the value of a given expression from the row that leads (follows) the current row by a given number of rows within its partition.
Example
Here’s an example to demonstrate:
SELECT
DogName,
Activity,
Score,
LEAD( Score, 1 )
OVER ( ORDER BY Score )
AS "Lead"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lead | +---------+-------------+-------+------+ | Max | Keep Quiet | 1 | 2 | | Bruno | Keep Quiet | 2 | 8 | | Cooper | Keep Quiet | 8 | 43 | | Bruno | Fetch Stick | 43 | 51 | | Cooper | Wag Tail | 51 | 65 | | Bruno | Wag Tail | 65 | 67 | | Cooper | Fetch Stick | 67 | 87 | | Max | Wag Tail | 87 | 91 | | Max | Fetch Stick | 91 | NULL | +---------+-------------+-------+------+
Here, I’m using the SQL LEAD()
function to return the score from the next row.
The first argument specifies the column and the second argument specifies the lead amount. My first argument to the function is Score
, and the second argument is 1
. Therefore in this case, the value comes from the next row’s Score
column. It’s the next row because 1
means it goes forward one row.
Many/most RDBMSs provide a default lead value of 1
, meaning that if we don’t provide the second argument, it will use 1
. Therefore, it uses the next row by default.
We can see that the last row contains NULL
in the Lead
column. That’s because there are no more rows for which to retrieve a value. If there are no more rows, then there are no more values to retrieve, and so we get NULL
. Some DBMSs allow us to specify a default value to use instead of NULL
(example later).
Change the Lead Value
We can change the lead value to go forward more than one row:
SELECT
DogName,
Activity,
Score,
LEAD( Score, 3 )
OVER ( ORDER BY Score )
AS "Lead"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lead | +---------+-------------+-------+------+ | Max | Keep Quiet | 1 | 43 | | Bruno | Keep Quiet | 2 | 51 | | Cooper | Keep Quiet | 8 | 65 | | Bruno | Fetch Stick | 43 | 67 | | Cooper | Wag Tail | 51 | 87 | | Bruno | Wag Tail | 65 | 91 | | Cooper | Fetch Stick | 67 | NULL | | Max | Wag Tail | 87 | NULL | | Max | Fetch Stick | 91 | NULL | +---------+-------------+-------+------+
This time I specified 3
as the lead value. This resulted in the Lead
column returning the score from three rows forward. It also resulted in the last three rows being NULL
, as expected.
Specifying a Default Value
Some RDBMSs allow us to change the default value to be something other than NULL
. For example, in MySQL, we can provide this as a third argument:
SELECT
DogName,
Activity,
Score,
LEAD( Score, 3, 'N/A' )
OVER ( ORDER BY Score )
AS "Lead"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lead | +---------+-------------+-------+------+ | Max | Keep Quiet | 1 | 43 | | Bruno | Keep Quiet | 2 | 51 | | Cooper | Keep Quiet | 8 | 65 | | Bruno | Fetch Stick | 43 | 67 | | Cooper | Wag Tail | 51 | 87 | | Bruno | Wag Tail | 65 | 91 | | Cooper | Fetch Stick | 67 | N/A | | Max | Wag Tail | 87 | N/A | | Max | Fetch Stick | 91 | N/A | +---------+-------------+-------+------+
I passed the string N/A
as the third argument, which means that it is now the default value. In other words, N/A
will be returned whenever there’s no value.
Be aware that some RDBMSs (such as SQL Server) only allow us to use a default value that is type-compatible with the column specified in the first argument.
Running the above example in SQL Server produces the following error:
Msg 245, Level 16, State 1, Line 1 Conversion failed when converting the varchar value 'N/A' to data type int.
The exact error may depend on the data types involved.
Anyway, in SQL Server, we would need to change our default value to be type-compatible with the column specified by the first argument. For example:
SELECT
DogName,
Activity,
Score,
LEAD( Score, 3, 0 )
OVER ( ORDER BY Score )
AS "Lead"
FROM Dogs;
Result:
DogName Activity Score Lead ------- ----------- ----- ---- Max Keep Quiet 1 43 Bruno Keep Quiet 2 51 Cooper Keep Quiet 8 65 Bruno Fetch Stick 43 67 Cooper Wag Tail 51 87 Bruno Wag Tail 65 91 Cooper Fetch Stick 67 0 Max Wag Tail 87 0 Max Fetch Stick 91 0
Partitioning
We can partition the row set by a given column. This is common when using window functions in SQL. It allows us to apply the function to each partition, without other partitions affecting the result.
To do this, we use the PARTITION BY
clause:
SELECT
DogName,
Activity,
Score,
LEAD( Score, 1 )
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "Lead"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lead | +---------+-------------+-------+------+ | Bruno | Fetch Stick | 43 | 67 | | Cooper | Fetch Stick | 67 | 91 | | Max | Fetch Stick | 91 | NULL | | Max | Keep Quiet | 1 | 2 | | Bruno | Keep Quiet | 2 | 8 | | Cooper | Keep Quiet | 8 | NULL | | Cooper | Wag Tail | 51 | 65 | | Bruno | Wag Tail | 65 | 87 | | Max | Wag Tail | 87 | NULL | +---------+-------------+-------+------+
This time our LEAD()
function is applied to each partition. Only the scores from the current partition are considered, and so we get NULL
at the end of each partition.
Changing the Order
Here’s what happens when we change the order to descending order:
SELECT
DogName,
Activity,
Score,
LEAD( Score, 1 )
OVER (
PARTITION BY Activity
ORDER BY Score DESC
)
AS "Lead"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lead | +---------+-------------+-------+------+ | Max | Fetch Stick | 91 | 67 | | Cooper | Fetch Stick | 67 | 43 | | Bruno | Fetch Stick | 43 | NULL | | Cooper | Keep Quiet | 8 | 2 | | Bruno | Keep Quiet | 2 | 1 | | Max | Keep Quiet | 1 | NULL | | Max | Wag Tail | 87 | 65 | | Bruno | Wag Tail | 65 | 51 | | Cooper | Wag Tail | 51 | NULL | +---------+-------------+-------+------+
Omitting the ORDER BY
Clause
Some RDBMSs require that the OVER
clause includes an ORDER BY
clause. Removing the ORDER BY
clause in such RDBMSs results in an error.
Here’s what happens when I remove the ORDER BY
clause in SQL Server:
SELECT
DogName,
Activity,
Score,
LEAD( Score, 1 )
OVER (
PARTITION BY Activity
)
AS "Lead"
FROM Dogs;
Result:
Msg 4112, Level 15, State 1, Line 5 The function 'LEAD' must have an OVER clause with ORDER BY.
But other RDBMSs don’t have such a restriction. For example, running the above code in MySQL produces the following:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lead | +---------+-------------+-------+------+ | Bruno | Fetch Stick | 43 | 67 | | Cooper | Fetch Stick | 67 | 91 | | Max | Fetch Stick | 91 | NULL | | Bruno | Keep Quiet | 2 | 8 | | Cooper | Keep Quiet | 8 | 1 | | Max | Keep Quiet | 1 | NULL | | Bruno | Wag Tail | 65 | 51 | | Cooper | Wag Tail | 51 | 87 | | Max | Wag Tail | 87 | NULL | +---------+-------------+-------+------+
No error.
If you’re getting an error when omitting the ORDER BY
clause, check the documentation for your RDBMS. It may be that it requires the ORDER BY
clause.
Sorting by a Different Column
In the previous examples, the column that we pass to the LEAD()
function is the same one we pass to the ORDER BY
clause, but we don’t necessarily have to do this. We can sort by a different column to the one we’re returning.
Example:
SELECT
DogName,
Activity,
Score,
LEAD( DogName, 1 )
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "Lead"
FROM Dogs;
Result:
+---------+-------------+-------+--------+ | DogName | Activity | Score | Lead | +---------+-------------+-------+--------+ | Bruno | Fetch Stick | 43 | Cooper | | Cooper | Fetch Stick | 67 | Max | | Max | Fetch Stick | 91 | NULL | | Max | Keep Quiet | 1 | Bruno | | Bruno | Keep Quiet | 2 | Cooper | | Cooper | Keep Quiet | 8 | NULL | | Cooper | Wag Tail | 51 | Bruno | | Bruno | Wag Tail | 65 | Max | | Max | Wag Tail | 87 | NULL | +---------+-------------+-------+--------+
This time we returned the dog names instead of the scores, but we still sorted by the scores. So we still get a value from the following row, just from a different column.
DBMSs that include the LEAD()
function also include a LAG()
function that returns the value from the other direction (it returns the value from a previous row).