MySQL has a LAG()
function that we can use to return the value of a previous row of our choosing, based on how many rows back it is from the current row.
Armed with that value, we can then calculate the difference between it and the current row’s value.
Example
Here’s an example of calculating the difference between the current row and the previous one:
SELECT
IdiotName,
IQ,
LAG( IQ, 1 ) OVER win AS "Lag Value",
IQ - LAG( IQ, 1 ) OVER win AS "Current Minus Previous",
LAG( IQ, 1 ) OVER win - IQ AS "Previous Minus Current"
FROM Idiots
WINDOW win AS ( ORDER BY IQ );
Result:
+-------------------+------+-----------+------------------------+------------------------+ | IdiotName | IQ | Lag Value | Current Minus Previous | Previous Minus Current | +-------------------+------+-----------+------------------------+------------------------+ | Dumbest | 30 | NULL | NULL | NULL | | Homer | 40 | 30 | 10 | -10 | | Patrick Star | 40 | 40 | 0 | 0 | | Ed | 40 | 40 | 0 | 0 | | Dumber | 50 | 40 | 10 | -10 | | Peter Griffin | 50 | 50 | 0 | 0 | | Cosmo | 55 | 50 | 5 | -5 | | Dumb | 60 | 55 | 5 | -5 | | Ralph Wiggum | 65 | 60 | 5 | -5 | | Richard Watterson | 70 | 65 | 5 | -5 | +-------------------+------+-----------+------------------------+------------------------+
Here, I used LAG( IQ, 1 )
for the lag value. The IQ
part means the IQ
column, and 1
means that we go back one row.
In the Current Minus Previous
column we subtracted the LAG()
value from the IQ
column (i.e. the value of the current row in that column). In the Previous Minus Current
column we did the opposite – we subtracted the IQ
column from the LAG()
output.
The lag value in the first row is NULL
because there’s no previous row for which to retrieve a value.
Given three rows use the same definition in their OVER
clause, I put that definition into a named window (which I define with the WINDOW
clause later in the query). This saves me from having to replicate the same definition across multiple columns.
Going Back More than One Row
We’re not limited to going back just one row. We could go back as many rows as we want. Let’s go back two rows:
SELECT
IdiotName,
IQ,
LAG( IQ, 2 ) OVER win AS "Lag Value",
IQ - LAG( IQ, 2 ) OVER win AS "Current Minus Two Back",
LAG( IQ, 2 ) OVER win - IQ AS "Two Back Minus Current"
FROM Idiots
WINDOW win AS ( ORDER BY IQ );
Result:
+-------------------+------+-----------+------------------------+------------------------+ | IdiotName | IQ | Lag Value | Current Minus Two Back | Two Back Minus Current | +-------------------+------+-----------+------------------------+------------------------+ | Dumbest | 30 | NULL | NULL | NULL | | Homer | 40 | NULL | NULL | NULL | | Patrick Star | 40 | 30 | 10 | -10 | | Ed | 40 | 40 | 0 | 0 | | Dumber | 50 | 40 | 10 | -10 | | Peter Griffin | 50 | 40 | 10 | -10 | | Cosmo | 55 | 50 | 5 | -5 | | Dumb | 60 | 50 | 10 | -10 | | Ralph Wiggum | 65 | 55 | 10 | -10 | | Richard Watterson | 70 | 60 | 10 | -10 | +-------------------+------+-----------+------------------------+------------------------+
We now get NULL
in the first two rows.
Setting a Default Value for NULLs
We can set a value to use in cases where NULL
is returned by LAG()
. We can do this by adding a third argument:
SELECT
IdiotName,
IQ,
LAG( IQ, 1, 0 ) OVER win AS "Lag Value",
IQ - LAG( IQ, 1, 0 ) OVER win AS "Current Minus Previous",
LAG( IQ, 1, 0 ) OVER win - IQ AS "Previous Back Minus Current"
FROM Idiots
WINDOW win AS ( ORDER BY IQ );
Result:
+-------------------+------+-----------+------------------------+-----------------------------+ | IdiotName | IQ | Lag Value | Current Minus Previous | Previous Back Minus Current | +-------------------+------+-----------+------------------------+-----------------------------+ | Dumbest | 30 | 0 | 30 | -30 | | Homer | 40 | 30 | 10 | -10 | | Patrick Star | 40 | 40 | 0 | 0 | | Ed | 40 | 40 | 0 | 0 | | Dumber | 50 | 40 | 10 | -10 | | Peter Griffin | 50 | 50 | 0 | 0 | | Cosmo | 55 | 50 | 5 | -5 | | Dumb | 60 | 55 | 5 | -5 | | Ralph Wiggum | 65 | 60 | 5 | -5 | | Richard Watterson | 70 | 65 | 5 | -5 | +-------------------+------+-----------+------------------------+-----------------------------+
Here we used 0
as the default value to use. Now we get 0
instead of NULL
. This changes the result of our calculations on the next two columns.
Partitions
Suppose we run a similar (non-partitioned) query against a different table:
SELECT
DogName,
Activity,
Score,
LAG( Score, 1, 0 ) OVER win AS "Lag",
Score - LAG( Score, 1, 0 ) OVER win AS "Score Minus Lag",
LAG( Score, 1, 0 ) OVER win - Score AS "Lag Minus Score"
FROM Dogs
WINDOW win AS ( ORDER BY Score );
Result:
+---------+-------------+-------+------+-----------------+-----------------+ | DogName | Activity | Score | Lag | Score Minus Lag | Lag Minus Score | +---------+-------------+-------+------+-----------------+-----------------+ | Max | Keep Quiet | 1 | 0 | 1 | -1 | | Bruno | Keep Quiet | 2 | 1 | 1 | -1 | | Cooper | Keep Quiet | 8 | 2 | 6 | -6 | | Bruno | Fetch Stick | 43 | 8 | 35 | -35 | | Cooper | Wag Tail | 51 | 43 | 8 | -8 | | Bruno | Wag Tail | 65 | 51 | 14 | -14 | | Cooper | Fetch Stick | 67 | 65 | 2 | -2 | | Max | Wag Tail | 87 | 67 | 20 | -20 | | Max | Fetch Stick | 91 | 87 | 4 | -4 | +---------+-------------+-------+------+-----------------+-----------------+
We could partition the results by activity:
SELECT
DogName,
Activity,
Score,
LAG( Score, 1, 0 ) OVER win AS "Lag",
Score - LAG( Score, 1, 0 ) OVER win AS "Score Minus Lag",
LAG( Score, 1, 0 ) OVER win - Score AS "Lag Minus Score"
FROM Dogs
WINDOW win AS ( PARTITION BY Activity ORDER BY Score );
Result:
+---------+-------------+-------+------+-----------------+-----------------+ | DogName | Activity | Score | Lag | Score Minus Lag | Lag Minus Score | +---------+-------------+-------+------+-----------------+-----------------+ | Bruno | Fetch Stick | 43 | 0 | 43 | -43 | | Cooper | Fetch Stick | 67 | 43 | 24 | -24 | | Max | Fetch Stick | 91 | 67 | 24 | -24 | | Max | Keep Quiet | 1 | 0 | 1 | -1 | | Bruno | Keep Quiet | 2 | 1 | 1 | -1 | | Cooper | Keep Quiet | 8 | 2 | 6 | -6 | | Cooper | Wag Tail | 51 | 0 | 51 | -51 | | Bruno | Wag Tail | 65 | 51 | 14 | -14 | | Max | Wag Tail | 87 | 65 | 22 | -22 | +---------+-------------+-------+------+-----------------+-----------------+
To partition the results, we simply added a PARTITION BY
clause to our named window. Now the LAG()
value is based on the current partition. This means that there’s no lag value for the first row of each partition (because there are no previous rows in that partition). So in our case, instead of having one row with a NULL
value (that we replaced with a default value of 0
), we have three. We can see this by the fact that three rows have 0
in the Lag Value
column.