blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

Jim 2.0

Analysing Query Performance in SQL Server 2005

Sometimes I will be writing a query and will be faced with the option of writing it in a number of different ways, each of which will give me the result I’m looking for, and the question I always ask is which is most efficient? Now ‘most efficient’ is an ambiguous term to an extent, because we could mean fastest run time, or fewest trips to the server, or a number of other measures. Since these tend all to be interrelated however, I think it will suffice to say simply ‘most efficient’. In this post I will demonstrate a recent example of different versions of the same essential query I wanted to run and how I analysed the performance of it to determine which was ‘best’.

Below are four queries in which I am trying to retrieve data from two tables; DailyStoreActivity and Deliveries. DailyStoreActivities only has data for four stores, whereas Deliveries (for some reason) has data for these four stores, and many more in addition. As you can see, simply joining the two tables on StoreName will return data from Deliveries for only the four stores in DailyStoreActivity, however I was curious if filtering Deliveries for the four stores explicitly would help the query run faster. I’m also joining on a date field and since some stores may not have received a delivery yet I have the option of using an inner join or a left-outer join too, so I am also investigating the difference between these two. Hence, I have four queries, which look like this:

    1 select  dsa.datevalue

    2 ,       dsa.StoreName

    3 ,       max(ds.DeliveryDate) MaxDeliveryDate

    4 from    DailyStoreActivity dsa

    5 left    outer join Deliveries ds

    6 on      dsa.StoreName = ds.StoreName

    7 and     dsa.DateValue >= cast(CONVERT(char(10), ds.DeliveryDate, 110) as datetime)

    8 group   by dsa.datevalue

    9 ,       dsa.StoreName

 

    1 select  dsa.datevalue

    2 ,       dsa.StoreName

    3 ,       max(ds.DeliveryDate) MaxDeliveryDate

    4 from    DailyStoreActivity dsa

    5 inner   join Deliveries ds

    6 on      dsa.StoreName = ds.StoreName

    7 and     dsa.DateValue >= cast(CONVERT(char(10), ds.DeliveryDate, 110) as datetime)

    8 group   by dsa.datevalue

    9 ,       dsa.StoreName

 

    1 select  dsa.datevalue

    2 ,       dsa.StoreName

    3 ,       max(ds.DeliveryDate) MaxDeliveryDate

    4 from    DailyStoreActivity dsa

    5 left    outer join Deliveries ds

    6 on      dsa.StoreName = ds.StoreName

    7 and     dsa.DateValue >= cast(CONVERT(char(10), ds.DeliveryDate, 110) as datetime)

    8 where   ds.StoreName in ('Store1', 'Store2', 'Store3', 'Store4')

    9 group   by dsa.datevalue

   10 ,       dsa.StoreName

 

    1 select  dsa.datevalue

    2 ,       dsa.StoreName

    3 ,       max(ds.DeliveryDate) MaxDeliveryDate

    4 from    DailyStoreActivity dsa

    5 inner   join Deliveries ds

    6 on      dsa.StoreName = ds.StoreName

    7 and     dsa.DateValue >= cast(CONVERT(char(10), ds.DeliveryDate, 110) as datetime)

    8 where   ds.StoreName in ('Store1', 'Store2', 'Store3', 'Store4')

    9 group   by dsa.datevalue

   10 ,       dsa.StoreName

That’s all well and good, but how do we analyse these? The most obvious way you can do this, and also the simplest and crudest, is to simply see how long each one takes to run, but if you want to do something a little more sophisticated, then SQL Server has a few tricks up its sleeve. We could of course take a look at the execution plan (either estimated or actual), but that’s not what I wanted to write about, so instead we shall look at a couple of ways to view some statistics about the performance of the queries.

The first thing we can do is to return some additional information about the results of the query in the messages tab which comes up next to the results tab in the results pane, which we do with the command set statistics io on, to return the following messages.

(2532 row(s) affected)
Table 'DailyStoreActivity'. Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Deliveries'. Scan count 1, logical reads 1175, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(2532 row(s) affected)
Table 'DailyStoreActivity'. Scan count 19, logical reads 124, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Deliveries'. Scan count 4, logical reads 13, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


This only gives us a little more information related to how often the query has to read through the data and in this example we are interested in comparing logical reads. Basically, small numbers are better than large numbers, so you can see that the second run was significantly more efficient than the first. For a more detailed view now, another tool we can use is to turn on client statistics to be returned when we run the queries, which will tell us how much work the query has to do to return the results. The icon for this is on the SQL editor toolbar.

When the query is run an additional tab will be available in the results pane to view these statistics, and as each query is run (or the same one run several times) the next set of statistics will be shown, with up to 10 ‘trials’ visible at a time. SQL Server also provides coloured arrows to indicate better, worse or the same performance between trials.

To asses the performance between my queries above, I ran each one five times and collected an average of the statistics for each one, which I copied into excel to compare. From these results you can see that the use of a left join and a where clause in this case provides the most efficient query, most noticeably in terms of client processing time and total execution time.

Inner Where

Left Where

Inner

Left

Client Execution Time

Query Profile Statistics

  Number of INSERT, DELETE and UPDATE statements

0

0

0

0

  Rows affected by INSERT, DELETE, or UPDATE statements

0

0

0

0

  Number of SELECT statements

2

2

2

2

  Rows returned by SELECT statements

2528

2528

2528

3164

  Number of transactions

0

0

0

0

Network Statistics

  Number of server roundtrips

1

1

1

1

  TDS packets sent from client

1

1

1

1

  TDS packets received from server

19

19

19

23

  Bytes sent from client

744

754.8

592

602

  Bytes received from server

77191

77191

77195

91408

Time Statistics

  Client processing time

50

27.8

56.2

62.2

  Total execution time

77.8

55.6

124.6

140.2

  Wait time on server replies

27.8

27.8

68.4

78

There are two cautionary notes I should make regarding this kind of analysis however, the first of which is that SQL Server caches data returned from queries executed for us if the same or a similar query is run again, meaning that subsequent queries will run faster, potentially given the appearance that they are better. Fortunately Jamie Thomson provides us with a way to clear these caches using the following commands, although you need the appropriate permissions to be able to perform these actions

DBCC DROPCLEANBUFFERS -- Clears the data cache

DBCC FREEPROCCACHE -- Clears the procedure cache

The second point, which is a more general note, is that it is all very well being able to perform this kind of analysis on the data; however it is possible to become to interested in tweaking and refining queries for small improvements. Many small improvements can result in big gains, especially when working with very large data sets, but when the stakes are low and the gains negligible it becomes more of an intellectual indulgence, which offer serves to benefit few people, and indeed may be seen as a waste of time and resources. It’s essential to bare in mind that what is most important is to get the result required at the lowest overall cost to deliver that result.

 James.

Published 03 February 2007 01:29 by James.Pipe

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Jim 2.0 said:

In response to my recent post about analyzing query performance in SQL server 2005, my colleague Mr Rowland

February 9, 2007 19:20

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Personal Edition), by Telligent Systems