Understanding the LAG() Function in SQL

The LAG() function is a window function that’s available in many SQL databases. It returns the value of a given expression from the row that lags (precedes) the current row by a given number of rows within its partition.

In other words, the LAG() function returns a value from a previous row.

Example

This example demonstrates how LAG() works:

SELECT 
    DogName,
    Activity,
    Score,
    LAG( Score, 1 ) 
        OVER ( ORDER BY Score ) 
    AS "Lag"
FROM Dogs;

Result:

+---------+-------------+-------+------+
| DogName | Activity    | Score | Lag  |
+---------+-------------+-------+------+
| Max     | Keep Quiet  |     1 | NULL |
| Bruno   | Keep Quiet  |     2 |    1 |
| Cooper  | Keep Quiet  |     8 |    2 |
| Bruno   | Fetch Stick |    43 |    8 |
| Cooper  | Wag Tail    |    51 |   43 |
| Bruno   | Wag Tail    |    65 |   51 |
| Cooper  | Fetch Stick |    67 |   65 |
| Max     | Wag Tail    |    87 |   67 |
| Max     | Fetch Stick |    91 |   87 |
+---------+-------------+-------+------+

In this example, I’m using the SQL LAG() function to return the score from the previous row.

The first argument specifies the column and the second argument specifies the lag. My first argument to the function is Score, and the second argument is 1. Therefore in this case, the value comes from the previous row’s Score column. It’s the previous row because 1 means it goes back one row (to the previous row).

Many/most DBMSs provide a default lag value of 1, meaning that if we don’t specify the lag value, it will use 1. Therefore, the default lag is the previous row.

One thing you may have noticed is that the first row contains a NULL value in the Lag column. That’s because there’s no previous row for which to get a value from. This is to be expected. Some DBMSs allow us to specify a default value to use instead of NULL (example later).

Change the Lag

We can change the lag value to go back further than the previous row:

SELECT 
    DogName,
    Activity,
    Score,
    LAG( Score, 3 ) 
        OVER ( ORDER BY Score ) 
    AS "Lag"
FROM Dogs;

Result:

+---------+-------------+-------+------+
| DogName | Activity    | Score | Lag  |
+---------+-------------+-------+------+
| Max     | Keep Quiet  |     1 | NULL |
| Bruno   | Keep Quiet  |     2 | NULL |
| Cooper  | Keep Quiet  |     8 | NULL |
| Bruno   | Fetch Stick |    43 |    1 |
| Cooper  | Wag Tail    |    51 |    2 |
| Bruno   | Wag Tail    |    65 |    8 |
| Cooper  | Fetch Stick |    67 |   43 |
| Max     | Wag Tail    |    87 |   51 |
| Max     | Fetch Stick |    91 |   65 |
+---------+-------------+-------+------+

This time I specified 3 as the lag value. This resulted in the Lag column returning the score from three rows prior. It also resulted in the first three rows being NULL, as expected.

Specify a Default Value

Some DBMSs allow us to change the default value to be something other than NULL. For example, in MySQL, we can provide this as a third argument:

SELECT 
    DogName,
    Activity,
    Score,
    LAG( Score, 3, 'N/A' ) 
        OVER ( ORDER BY Score ) 
    AS "Lag"
FROM Dogs;

Result:

+---------+-------------+-------+------+
| DogName | Activity    | Score | Lag  |
+---------+-------------+-------+------+
| Max     | Keep Quiet  |     1 | N/A  |
| Bruno   | Keep Quiet  |     2 | N/A  |
| Cooper  | Keep Quiet  |     8 | N/A  |
| Bruno   | Fetch Stick |    43 | 1    |
| Cooper  | Wag Tail    |    51 | 2    |
| Bruno   | Wag Tail    |    65 | 8    |
| Cooper  | Fetch Stick |    67 | 43   |
| Max     | Wag Tail    |    87 | 51   |
| Max     | Fetch Stick |    91 | 65   |
+---------+-------------+-------+------+

This time I specified that the string N/A be returned whenever there’s no value.

Be aware that some DBMSs (such as SQL Server) only allow us to use a default value that is type-compatible with the column specified in the first argument.

Running the above example in SQL Server produces the following error:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'N/A' to data type int.

So in SQL Server, we would need to change our default value to be type-compatible with the column specified by the first argument. For example:

SELECT 
    DogName,
    Activity,
    Score,
    LAG( Score, 3, 0 ) 
        OVER ( ORDER BY Score ) 
    AS "Lag"
FROM Dogs;

Result:

DogName  Activity     Score  Lag
-------  -----------  -----  ---
Max      Keep Quiet   1      0  
Bruno    Keep Quiet   2      0  
Cooper   Keep Quiet   8      0  
Bruno    Fetch Stick  43     1  
Cooper   Wag Tail     51     2  
Bruno    Wag Tail     65     8  
Cooper   Fetch Stick  67     43 
Max      Wag Tail     87     51 
Max      Fetch Stick  91     65 

Partitioning

Often when using window functions, we want to partition the row set by a given column. This allows us to apply the function to each partition, without other partitions affecting the result. To do this, we use the PARTITION BY clause.

Example:

SELECT 
    DogName,
    Activity,
    Score,
    LAG( Score, 1 ) 
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score 
            )
    AS "Lag"
FROM Dogs;

Result:

+---------+-------------+-------+------+
| DogName | Activity    | Score | Lag  |
+---------+-------------+-------+------+
| Bruno   | Fetch Stick |    43 | NULL |
| Cooper  | Fetch Stick |    67 |   43 |
| Max     | Fetch Stick |    91 |   67 |
| Max     | Keep Quiet  |     1 | NULL |
| Bruno   | Keep Quiet  |     2 |    1 |
| Cooper  | Keep Quiet  |     8 |    2 |
| Cooper  | Wag Tail    |    51 | NULL |
| Bruno   | Wag Tail    |    65 |   51 |
| Max     | Wag Tail    |    87 |   65 |
+---------+-------------+-------+------+

This time our LAG() function is applied to each partition. Only the scores from the current partition are considered, and so we get a NULL value at the start of each partition.

Changing the Order

Here’s what happens when we change the order to descending order:

SELECT 
    DogName,
    Activity,
    Score,
    LAG( Score, 1 ) 
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score DESC
            )
    AS "Lag"
FROM Dogs;

Result:

+---------+-------------+-------+------+
| DogName | Activity    | Score | Lag  |
+---------+-------------+-------+------+
| Max     | Fetch Stick |    91 | NULL |
| Cooper  | Fetch Stick |    67 |   91 |
| Bruno   | Fetch Stick |    43 |   67 |
| Cooper  | Keep Quiet  |     8 | NULL |
| Bruno   | Keep Quiet  |     2 |    8 |
| Max     | Keep Quiet  |     1 |    2 |
| Max     | Wag Tail    |    87 | NULL |
| Bruno   | Wag Tail    |    65 |   87 |
| Cooper  | Wag Tail    |    51 |   65 |
+---------+-------------+-------+------+

About the ORDER BY Clause

Some DBMSs require that the OVER clause includes an ORDER BY clause. Removing the ORDER BY clause in such DBMSs results in an error.

Here’s what happens when I remove the ORDER BY clause in SQL Server:

SELECT 
    DogName,
    Activity,
    Score,
    LAG( Score, 1 ) 
        OVER ( 
            PARTITION BY Activity
            )
    AS "Lag"
FROM Dogs;

Result:

Msg 4112, Level 15, State 1, Line 5
The function 'LAG' must have an OVER clause with ORDER BY.

But here’s what happens when I run the same code in MySQL:

+---------+-------------+-------+------+
| DogName | Activity    | Score | Lag  |
+---------+-------------+-------+------+
| Bruno   | Fetch Stick |    43 | NULL |
| Cooper  | Fetch Stick |    67 |   43 |
| Max     | Fetch Stick |    91 |   67 |
| Bruno   | Keep Quiet  |     2 | NULL |
| Cooper  | Keep Quiet  |     8 |    2 |
| Max     | Keep Quiet  |     1 |    8 |
| Bruno   | Wag Tail    |    65 | NULL |
| Cooper  | Wag Tail    |    51 |   65 |
| Max     | Wag Tail    |    87 |   51 |
+---------+-------------+-------+------+

No error.

If you’re getting an error when removing the ORDER BY clause, check your DBMS’s documentation (not to mention the error message itself). It may be that your DBMS requires the ORDER BY clause.

Sorting by a Different Column

In the previous examples, the column that we pass to the LAG() function is the same one we pass to the ORDER BY clause, however, we don’t necessarily have to do this. We can sort by a different column if required.

Example:

SELECT 
    DogName,
    Activity,
    Score,
    LAG( DogName, 1 ) 
        OVER ( 
            PARTITION BY Activity
            ORDER BY Score
            )
    AS "Lag"
FROM Dogs;

Result:

+---------+-------------+-------+--------+
| DogName | Activity    | Score | Lag    |
+---------+-------------+-------+--------+
| Bruno   | Fetch Stick |    43 | NULL   |
| Cooper  | Fetch Stick |    67 | Bruno  |
| Max     | Fetch Stick |    91 | Cooper |
| Max     | Keep Quiet  |     1 | NULL   |
| Bruno   | Keep Quiet  |     2 | Max    |
| Cooper  | Keep Quiet  |     8 | Bruno  |
| Cooper  | Wag Tail    |    51 | NULL   |
| Bruno   | Wag Tail    |    65 | Cooper |
| Max     | Wag Tail    |    87 | Bruno  |
+---------+-------------+-------+--------+

This time we returned the dog names instead of the scores, but we still sorted by the scores. So we still get a value from the previous row, just from a different column.

DBMSs that include the LAG() function also include a LEAD() function that returns the value from the other direction (it returns the value from a following row).