The LAG()
function is a window function that’s available in many SQL databases. It returns the value of a given expression from the row that lags (precedes) the current row by a given number of rows within its partition.
In other words, the LAG()
function returns a value from a previous row.
Example
This example demonstrates how LAG()
works:
SELECT
DogName,
Activity,
Score,
LAG( Score, 1 )
OVER ( ORDER BY Score )
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Max | Keep Quiet | 1 | NULL | | Bruno | Keep Quiet | 2 | 1 | | Cooper | Keep Quiet | 8 | 2 | | Bruno | Fetch Stick | 43 | 8 | | Cooper | Wag Tail | 51 | 43 | | Bruno | Wag Tail | 65 | 51 | | Cooper | Fetch Stick | 67 | 65 | | Max | Wag Tail | 87 | 67 | | Max | Fetch Stick | 91 | 87 | +---------+-------------+-------+------+
In this example, I’m using the SQL LAG()
function to return the score from the previous row.
The first argument specifies the column and the second argument specifies the lag. My first argument to the function is Score
, and the second argument is 1
. Therefore in this case, the value comes from the previous row’s Score
column. It’s the previous row because 1
means it goes back one row (to the previous row).
Many/most DBMSs provide a default lag value of 1
, meaning that if we don’t specify the lag value, it will use 1
. Therefore, the default lag is the previous row.
One thing you may have noticed is that the first row contains a NULL value in the Lag
column. That’s because there’s no previous row for which to get a value from. This is to be expected. Some DBMSs allow us to specify a default value to use instead of NULL
(example later).
Change the Lag
We can change the lag value to go back further than the previous row:
SELECT
DogName,
Activity,
Score,
LAG( Score, 3 )
OVER ( ORDER BY Score )
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Max | Keep Quiet | 1 | NULL | | Bruno | Keep Quiet | 2 | NULL | | Cooper | Keep Quiet | 8 | NULL | | Bruno | Fetch Stick | 43 | 1 | | Cooper | Wag Tail | 51 | 2 | | Bruno | Wag Tail | 65 | 8 | | Cooper | Fetch Stick | 67 | 43 | | Max | Wag Tail | 87 | 51 | | Max | Fetch Stick | 91 | 65 | +---------+-------------+-------+------+
This time I specified 3
as the lag value. This resulted in the Lag
column returning the score from three rows prior. It also resulted in the first three rows being NULL
, as expected.
Specify a Default Value
Some DBMSs allow us to change the default value to be something other than NULL
. For example, in MySQL, we can provide this as a third argument:
SELECT
DogName,
Activity,
Score,
LAG( Score, 3, 'N/A' )
OVER ( ORDER BY Score )
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Max | Keep Quiet | 1 | N/A | | Bruno | Keep Quiet | 2 | N/A | | Cooper | Keep Quiet | 8 | N/A | | Bruno | Fetch Stick | 43 | 1 | | Cooper | Wag Tail | 51 | 2 | | Bruno | Wag Tail | 65 | 8 | | Cooper | Fetch Stick | 67 | 43 | | Max | Wag Tail | 87 | 51 | | Max | Fetch Stick | 91 | 65 | +---------+-------------+-------+------+
This time I specified that the string N/A
be returned whenever there’s no value.
Be aware that some DBMSs (such as SQL Server) only allow us to use a default value that is type-compatible with the column specified in the first argument.
Running the above example in SQL Server produces the following error:
Msg 245, Level 16, State 1, Line 1 Conversion failed when converting the varchar value 'N/A' to data type int.
So in SQL Server, we would need to change our default value to be type-compatible with the column specified by the first argument. For example:
SELECT
DogName,
Activity,
Score,
LAG( Score, 3, 0 )
OVER ( ORDER BY Score )
AS "Lag"
FROM Dogs;
Result:
DogName Activity Score Lag ------- ----------- ----- --- Max Keep Quiet 1 0 Bruno Keep Quiet 2 0 Cooper Keep Quiet 8 0 Bruno Fetch Stick 43 1 Cooper Wag Tail 51 2 Bruno Wag Tail 65 8 Cooper Fetch Stick 67 43 Max Wag Tail 87 51 Max Fetch Stick 91 65
Partitioning
Often when using window functions, we want to partition the row set by a given column. This allows us to apply the function to each partition, without other partitions affecting the result. To do this, we use the PARTITION BY
clause.
Example:
SELECT
DogName,
Activity,
Score,
LAG( Score, 1 )
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Bruno | Fetch Stick | 43 | NULL | | Cooper | Fetch Stick | 67 | 43 | | Max | Fetch Stick | 91 | 67 | | Max | Keep Quiet | 1 | NULL | | Bruno | Keep Quiet | 2 | 1 | | Cooper | Keep Quiet | 8 | 2 | | Cooper | Wag Tail | 51 | NULL | | Bruno | Wag Tail | 65 | 51 | | Max | Wag Tail | 87 | 65 | +---------+-------------+-------+------+
This time our LAG()
function is applied to each partition. Only the scores from the current partition are considered, and so we get a NULL value at the start of each partition.
Changing the Order
Here’s what happens when we change the order to descending order:
SELECT
DogName,
Activity,
Score,
LAG( Score, 1 )
OVER (
PARTITION BY Activity
ORDER BY Score DESC
)
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Max | Fetch Stick | 91 | NULL | | Cooper | Fetch Stick | 67 | 91 | | Bruno | Fetch Stick | 43 | 67 | | Cooper | Keep Quiet | 8 | NULL | | Bruno | Keep Quiet | 2 | 8 | | Max | Keep Quiet | 1 | 2 | | Max | Wag Tail | 87 | NULL | | Bruno | Wag Tail | 65 | 87 | | Cooper | Wag Tail | 51 | 65 | +---------+-------------+-------+------+
About the ORDER BY
Clause
Some DBMSs require that the OVER
clause includes an ORDER BY
clause. Removing the ORDER BY
clause in such DBMSs results in an error.
Here’s what happens when I remove the ORDER BY
clause in SQL Server:
SELECT
DogName,
Activity,
Score,
LAG( Score, 1 )
OVER (
PARTITION BY Activity
)
AS "Lag"
FROM Dogs;
Result:
Msg 4112, Level 15, State 1, Line 5 The function 'LAG' must have an OVER clause with ORDER BY.
But here’s what happens when I run the same code in MySQL:
+---------+-------------+-------+------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+------+ | Bruno | Fetch Stick | 43 | NULL | | Cooper | Fetch Stick | 67 | 43 | | Max | Fetch Stick | 91 | 67 | | Bruno | Keep Quiet | 2 | NULL | | Cooper | Keep Quiet | 8 | 2 | | Max | Keep Quiet | 1 | 8 | | Bruno | Wag Tail | 65 | NULL | | Cooper | Wag Tail | 51 | 65 | | Max | Wag Tail | 87 | 51 | +---------+-------------+-------+------+
No error.
If you’re getting an error when removing the ORDER BY
clause, check your DBMS’s documentation (not to mention the error message itself). It may be that your DBMS requires the ORDER BY
clause.
Sorting by a Different Column
In the previous examples, the column that we pass to the LAG()
function is the same one we pass to the ORDER BY
clause, however, we don’t necessarily have to do this. We can sort by a different column if required.
Example:
SELECT
DogName,
Activity,
Score,
LAG( DogName, 1 )
OVER (
PARTITION BY Activity
ORDER BY Score
)
AS "Lag"
FROM Dogs;
Result:
+---------+-------------+-------+--------+ | DogName | Activity | Score | Lag | +---------+-------------+-------+--------+ | Bruno | Fetch Stick | 43 | NULL | | Cooper | Fetch Stick | 67 | Bruno | | Max | Fetch Stick | 91 | Cooper | | Max | Keep Quiet | 1 | NULL | | Bruno | Keep Quiet | 2 | Max | | Cooper | Keep Quiet | 8 | Bruno | | Cooper | Wag Tail | 51 | NULL | | Bruno | Wag Tail | 65 | Cooper | | Max | Wag Tail | 87 | Bruno | +---------+-------------+-------+--------+
This time we returned the dog names instead of the scores, but we still sorted by the scores. So we still get a value from the previous row, just from a different column.
DBMSs that include the LAG()
function also include a LEAD()
function that returns the value from the other direction (it returns the value from a following row).