For any number of reasons which I won’t go into, you may have a need, or desire, to return a value in a query from one row next to a value from another. A typical example might be in a daily sales database, where I want to see yesterday’s total sales against today’s total sales. It is straight forward enough to write the query to return the total sales for each day, the question however, is how do we display the sales for each day alongside the sales from the previous day?
The long way round:
One way we can do this is to carry out a self join on the query, offset based on the date, e.g.:
1 select ds1.SalesDate
2 , ds1.SalesValue
3 , ds2.SalesDate
4 , ds2.SalesValue
5 from Daily_sales ds1
6 , Daily_sales ds2
7 where ds2.SalesDate = (
8 select max(ds.SalesDate)
9 from Daily_sales ds
10 where ds.SalesDate < ds1.SalesDate
11 )
12 order by ds1.SalesDate
13 ;
As you can see, this is a very simple join and probably not very robust. If the database had data for a chain of stores for example, you may be interested to see the sales data each day against the previous day for each store, in which case you need to join on the field which identifies the store, in this example, StoreID:
1 select ds1.StoreID
2 , ds1.SalesDate
3 , ds1.SalesValue
4 , ds2.SalesDate
5 , ds2.SalesValue
6 from Daily_sales ds1
7 inner join Daily_sales ds2
8 on ds1.StoreID = ds2.StoreID
9 where ds2.SalesDate = (
10 select max(ds.SalesDate)
11 from Daily_sales ds
12 where ds.SalesDate < ds1.SalesDate
13 and ds.StoreID = ds1.StoreID
14 )
15 order by ds1.StoreID
16 , ds1.SalesDate
17 ;
Things get even more complex (and messy) if you have a database which instead of storing the total sales for each day, stores all sales for each day. In order to see total sales by day against total sales from the previous day for each store therefore, we need to write a query to sum the sales for each day and then perform a self join on this query using the same principles as above:
1 select q1.StoreID
2 , q1.SalesDay
3 , q1.DaySales
4 , q2.PreviousSalesDay
5 , q2.PreviousDaySales
6 from (
7 select ds.StoreID
8 , trunc(ds.SalesDate) SalesDay
9 , sum(ds.SalesValue) DaySales
10 from Daily_sales ds
11 group by ds.StoreID
12 , trunc(ds.SalesDate)
13 ) q1
14 inner join (
15 select ds.StoreID
16 , trunc(ds.SalesDate) PreviousSalesDay
17 , sum(ds.SalesValue) PreviousDaySales
18 from Daily_sales ds
19 group by ds.StoreID
20 , trunc(ds.SalesDate)
21 ) q2
22 on q1.StoreID = q2.StoreID
23 where q2.SalesDay = (
24 select max(sq1.SalesDay) MaxSalesDay
25 from (
26 select ds.StoreID
27 , trunc(ds.SalesDate) SalesDay
28 , sum(ds.SalesValue) DaySales
29 from Daily_sales ds
30 group by ds.StoreID
31 , trunc(ds.SalesDate)
32 ) sq1
33 where sq1.StoreID = q1.StoreID
34 and sq1.SalesDay < q1.SalesDay
35 )
36 order by q1.StoreID
37 , q1.SalesDay
38 ;
Using Analytic functions:
We can achieve everything we have done above much more simply and efficiently using the lag analytic function, which I will now demonstrate. First, the list of sales for each day along side the previous days sales:
1 select ds.SalesDate
2 , ds.SalesValue
3 , lag(ds.SalesDate,1) over (order by ds.SalesDate) PreviousDate
4 , lag(ds.SalesValue,1) over (order by ds.SalesDate) PreviousSales
5 from Daily_sales ds
6 order by ds.SalesDate
7 ;
To view the data by store we can use partition by:
1 select ds.StoreID
2 , ds.SalesDate
3 , ds.SalesValue
4 , lag(ds.SalesDate,1) over (partition by ds.StoreID order by ds.SalesDate) PreviousDate
5 , lag(ds.SalesValue,1) over (partition by ds.StoreID order by ds.SalesDate) PreviousSales
6 from Daily_sales ds
7 order by ds.StoreID
8 , ds.SalesDate
9 ;
Now for the final example we see how we can simply use the sum function for the scenario where our database stores individual sales for the day that have to be aggregated to give the total sales each day:
1 select ds.StoreID
2 , trunc(ds.SalesDate) SalesDay
3 , sum(ds.SalesValue) DaySales
4 , lag(trunc(ds.SalesDate),1) over (partition by ds.StoreID order by trunc(ds.SalesDate)) PreviousDay
5 , lag(sum(ds.SalesValue),1) over (partition by ds.StoreID order by trunc(ds.SalesDate)) PreviousSales
6 from Daily_sales ds
7 group by ds.StoreID
8 , trunc(ds.SalesDate)
9 order by ds.StoreID
10 , ds.SalesDate
11 ;
It is immediately obvious from these examples how much more succinct and easy to apply these functions are compared to the equivalent queries using joins. Furthermore, in performance terms they are much faster to execute. Finally, because they are less complicated to apply in certain situations such as this example, they can be much more accurate, predominantly because there is less scope to make mistakes when writing the query. Using the self joins it is critical that the where clauses and calculations for example are executed in the correct places, but that issue is avoided using the analytic functions.
Notes for T-SQL:
Whilst analytic functions are not specific to PL-SQL, Oracle appears to offer a wider range of them than in T-SQL and it seems there is no T-SQL equivalent to the Lag and Lead functions. It is still possible however to capture the previous/next row value more efficiently than with a self join by using the MIN/MAX functions and the Where clause as you would in the self join to specify the criteria. See here for an example (under Reporting Aggregate Functions).
James