Microsoft releasing SQL Server 2012 gave database developers a number of new functions. This article describes two of them - LAG and LEAD window functions. The first one returns value from one of previous rows, the second one a value from one of following rows.
Before LAG and LEAD functions, comparing values from different rows were not so easy. It required using self joins and subqueries or common table expressions to be easier to understand. Now, it can be done in an elegant way - with LAG or LEAD function.
Test data
All queries in this article are based on test data created and described in the previous one - Window aggregate functions in SQL 2012. If you are looking for a script to create this table structure and insert test data, please check there. For simplicity, below I present only brief information about it.
Test data contains result of sprint races for sprinters.
There are two table as shown on the diagram:
There are 3 sprinters in runners table:
runnerId | name |
1 | Tom |
2 | Andrew |
3 | John |
First two participated in 10 races. The results are stored in raceResults table:
raceResultId | raceId | runnerId | raceTime |
1 | 1 | 1 | 10.50 |
2 | 2 | 1 | 10.43 |
3 | 3 | 1 | 10.38 |
4 | 4 | 1 | 10.36 |
5 | 5 | 1 | 10.21 |
6 | 6 | 1 | 10.20 |
7 | 7 | 1 | 10.19 |
8 | 8 | 1 | 10.14 |
9 | 9 | 1 | 10.08 |
10 | 10 | 1 | 10.06 |
11 | 1 | 2 | 10.40 |
12 | 2 | 2 | 10.43 |
13 | 3 | 2 | 10.20 |
14 | 4 | 2 | 10.10 |
15 | 5 | 2 | 9.95 |
16 | 6 | 2 | 10.10 |
17 | 7 | 2 | 10.03 |
18 | 8 | 2 | 11.14 |
19 | 9 | 2 | 11.03 |
20 | 10 | 2 | 11.30 |
LAG function
Have you ever dreamed about having a possibility to combine in one row a value of the current row and a value from the previous one? If so, with SQL Server 2012 that dream becomes true!
LAG provides a way to use a value from any of previous rows based on a specified order.
In short version LAG requires:
- a scalar expression
- OVER keyword
- ORDER BY clause
This short version calculates a value of the scalar expression based on the previous row in the window. The window has to be defined so the OVER keyword is required. The LAG function is not a set operator but a list operator because an order of rows matters in this case - it has to be the previous row so this window has to be an ordered list.
There is no reason to wait any longer. Let's take a look how it works in practice.
Previous value
I want to display the current result (race X) and the previous one (race X-1) for each runner. As each runner has to be considered separately, I need to use an optional statement - PARTITION BY.
select runnerId, raceId, raceTime,
lag(raceTime) over (partition by runnerId order by raceId) as prevRaceTime
from raceResults
order by runnerId, raceId
SQL Server virtually creates 2 separate lists (partitions) - one for each runner (partition by runnerId). Each list is sorted by raceId (order by raceId). LAG(raceTime) returns the previous value to the current row value on the list. It is pretty obvious but it is also worth to notice - first rows on each partition are NULLs because the previous value does not exist for the first row.
Two rows back value
If there is a need to reference a value from two rows back, it is also easily possible with LAG function. Although, LAG requires only one argument, there is also a second one that is optional. It is an offset parameter that instructs SQL Server how many rows it should look back for the queried value. A default is 1. It means LAG(value) = LAG(value, 1).
select runnerId, raceId, raceTime,
lag(raceTime, 2) over (partition by runnerId order by raceId) as timeInTheSecondRaceBack
from raceResults
order by runnerId, raceId
Referencing two rows back, results in two NULL values for each partition because 2 rows back do not exist for a first and a second row.
Default value
Those NULL values returned for first rows are not always wanted. If they are not, they can be defaulted to any other value. The below query uses 99.99 as a default. A default value can be provided as a third parameter to the LAG function.
select runnerId, raceId, raceTime,
lag(raceTime, 2, 99.99) over (partition by runnerId order by raceId) as timeInTheSecondRaceBack
from raceResults
order by runnerId, raceId
LEAD function
The LEAD function is very similar to the LAG function. A difference is LAG looks back for values and LEAD looks forward. Similarly to LAG, LEAD requires:
- a scalar expression
- OVER keyword
- ORDER BY clause
It may take 2 other optional parameters: offset and default.
A sample query that returns for each race a current race result and a result two races back is presented below.
select runnerId, raceId, raceTime,
lead(raceTime, 2, 99.99) over (partition by runnerId order by raceId) as timeInTheSecondRaceForward
from raceResults
order by runnerId, raceId
Because of PARTITION BY clause existence, each runner is considered separately in context of the LEAD function. LEAD value on third row comes from fifth row. Default value was also used to replace last NULLs (marked in yellow).
Generally saying, the LEAD function works in analogical way as the LAG function.
Using LAG/LEAD in expressions
LAG and LEAD functions are useful also because they can be put into expressions to calculate difference between values from two consecutive rows. As LAG/LEAD returns a scalar value, depending on data type, it can be concatenated, multiplied, divided, added, deducted, compared etc. Take a look at the example below. It uses a previous value to check whether a runner was faster or slower this time.
select runnerId, raceId, raceTime,
lead(raceTime) over (partition by runnerId order by raceId) as prevRaceTime,
case when raceTime < lag(raceTime) over (partition by runnerId order by raceId) then 'FASTER :)' else 'SLOWER :(' end as comparisonToPrevRace
from raceResults
order by runnerId, raceId
A value from a previous row is compared to a current row (raceTime < lag(raceTime)) and a proper text is return based on the result.
Moreover, one value can be a result of an expression containing both LAG and LEAD function.
Summary
LAG and LEAD are window functions which means they should be used with OVER keyword. Additionally, it requires ORDER BY clause in the OVER section to specify order of rows. Without the order, previous row or following row phrase would not make much sense. Theory about LAG function can be found on MSDN pages - LAG (Transact-SQL). LEAD is described on LEAD (Transact-SQL).
Before LAG and LEAD functions, comparing values from different rows were not so easy. It required using self joins and subqueries or common table expressions to be easier to understand. Now, it can be done in an elegant way - with LAG or LEAD function.