In MySQL, the LAG()
function is a window function that returns the value of a given expression from the row that lags (precedes) the current row by a given number of rows within its partition.
Basically, it returns the value from a previous row.
Syntax
The syntax goes like this:
LAG(expr [, N[, default]]) [null_treatment] over_clause
So it requires an OVER
clause, which determines how the rowset is partitioned and ordered before the window function is applied.
The optional N
argument specifies the lag value. For example, 3
means three rows back.
The optional default
argument specifies what to return if there’s no value (this will happen when the current row number is less than the lag).
The optional null_treatment
clause specifies how NULL
values are treated. This clause is part of the SQL standard, but MySQL only allows RESPECT NULLS
, which is also the default. RESPECT NULLS
means that NULL
values are considered when calculating results. On the other hand, IGNORE NULLS
is parsed, but produces an error (at least as of MySQL 8.0.33).
Example
Here’s a basic example to demonstrate how it works:
SELECT
DogName,
Activity,
Score,
LAG( Score )
OVER ( ORDER BY Score )
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Bruno | Keep Quiet | 1 | NULL | | Cooper | Keep Quiet | 8 | 1 | | Max | Keep Quiet | 12 | 8 | | Bruno | Fetch Stick | 43 | 12 | | Bruno | Wag Tail | 51 | 43 | | Cooper | Wag Tail | 51 | 51 | | Cooper | Fetch Stick | 67 | 51 | | Max | Wag Tail | 87 | 67 | | Max | Fetch Stick | 91 | 87 | +---------+-------------+-------+------+
We can see that the Lag
column lags the Score
column by one row, and the first row is NULL
. That’s because I didn’t provide any arguments other than the expression/column to use. It therefore used the default lag value of 1
and a default value of NULL
.
Change the Lag
Let’s change the lag value:
SELECT
DogName,
Activity,
Score,
LAG( Score, 3 )
OVER ( ORDER BY Score )
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Bruno | Keep Quiet | 1 | NULL | | Cooper | Keep Quiet | 8 | NULL | | Max | Keep Quiet | 12 | NULL | | Bruno | Fetch Stick | 43 | 1 | | Bruno | Wag Tail | 51 | 8 | | Cooper | Wag Tail | 51 | 12 | | Cooper | Fetch Stick | 67 | 43 | | Max | Wag Tail | 87 | 51 | | Max | Fetch Stick | 91 | 51 | +---------+-------------+-------+------+
This time I specified 3
as the lag value. This resulted in the first three rows being NULL
, and all subsequent rows returning the score from three rows back.
Specify a Default Value
We can change the default value to be something other than NULL
. Here’s an example:
SELECT
DogName,
Activity,
Score,
LAG( Score, 3, 'N/A' )
OVER ( ORDER BY Score )
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Bruno | Keep Quiet | 1 | N/A | | Cooper | Keep Quiet | 8 | N/A | | Max | Keep Quiet | 12 | N/A | | Bruno | Fetch Stick | 43 | 1 | | Bruno | Wag Tail | 51 | 8 | | Cooper | Wag Tail | 51 | 12 | | Cooper | Fetch Stick | 67 | 43 | | Max | Wag Tail | 87 | 51 | | Max | Fetch Stick | 91 | 51 | +---------+-------------+-------+------+
This time I specified that the string N/A
be returned whenever there’s no value.
Partitioning
This example uses a PARTITION BY
clause to partition the results by activity:
SELECT
DogName,
Activity,
Score,
LAG( Score, 1, 'N/A' )
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Bruno | Fetch Stick | 43 | N/A | | Cooper | Fetch Stick | 67 | 43 | | Max | Fetch Stick | 91 | 67 | | Bruno | Keep Quiet | 1 | N/A | | Cooper | Keep Quiet | 8 | 1 | | Max | Keep Quiet | 12 | 8 | | Bruno | Wag Tail | 51 | N/A | | Cooper | Wag Tail | 51 | 51 | | Max | Wag Tail | 87 | 51 | +---------+-------------+-------+------+
This has caused the lag to reset with each partition.
The null_treatment
Clause
As mentioned, the optional null_treatment
clause specifies how to treat NULL
values. The default value for this clause is RESPECT NULLS
. Therefore, the above example implicitly uses RESPECT NULLS
.
Let’s explicitly specify RESPECT NULLS
:
SELECT
DogName,
Activity,
Score,
LAG( Score, 1, 'N/A' ) RESPECT NULLS
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Bruno | Fetch Stick | 43 | N/A | | Cooper | Fetch Stick | 67 | 43 | | Max | Fetch Stick | 91 | 67 | | Bruno | Keep Quiet | 1 | N/A | | Cooper | Keep Quiet | 8 | 1 | | Max | Keep Quiet | 12 | 8 | | Bruno | Wag Tail | 51 | N/A | | Cooper | Wag Tail | 51 | 51 | | Max | Wag Tail | 87 | 51 | +---------+-------------+-------+------+
Specifying IGNORE NULLS
produces an error (at least as of MySQL 8.0.33):
SELECT
DogName,
Activity,
Score,
LAG( Score, 1, 'N/A' ) IGNORE NULLS
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "Lag"
FROM Dogs;
Result:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'IGNORE NULLS'
Changing the Order
Here’s what happens when we change the order to descending order:
SELECT
DogName,
Activity,
Score,
LAG( Score, 1, 'N/A' )
OVER (
PARTITION BY Activity
ORDER BY Score DESC
)
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Max | Fetch Stick | 91 | N/A | | Cooper | Fetch Stick | 67 | 91 | | Bruno | Fetch Stick | 43 | 67 | | Max | Keep Quiet | 12 | N/A | | Cooper | Keep Quiet | 8 | 12 | | Bruno | Keep Quiet | 1 | 8 | | Max | Wag Tail | 87 | N/A | | Bruno | Wag Tail | 51 | 87 | | Cooper | Wag Tail | 51 | 51 | +---------+-------------+-------+------+
Passing a Different Column to LAG()
In the above examples, the column that we pass to the LAG()
function is the same one we pass to the ORDER BY
clause. But that’s not a requirement. We can pass a different column if required.
Here’s an example that uses two different columns for LAG()
and the ORDER BY
clause:
SELECT
DogName,
Activity,
Score,
LAG( DogName, 1, 'N/A' )
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+--------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+--------+ | Bruno | Fetch Stick | 43 | N/A | | Cooper | Fetch Stick | 67 | Bruno | | Max | Fetch Stick | 91 | Cooper | | Bruno | Keep Quiet | 1 | N/A | | Cooper | Keep Quiet | 8 | Bruno | | Max | Keep Quiet | 12 | Cooper | | Bruno | Wag Tail | 51 | N/A | | Cooper | Wag Tail | 51 | Bruno | | Max | Wag Tail | 87 | Cooper | +---------+-------------+-------+--------+
Here, we passed the DogName
column to the LAG()
function, but we sorted by the Score
column. This resulted in the dog names being displayed in the Lag
column.
There’s also a LEAD()
function that returns the value from the other direction (i.e. it returns the value from the row that follows the current row by the given amount).