SQL Server 2008 – Fourteen tips for optimising query performance for applications by Tony Wright

June 23, 2010

1. Aim for the ideal of a single SQL call per page request.

Firstly, I should explain what I mean by page request. When you open a page, the page may need to be populated with content from the database. That call to the database to populate that page is a page request. If that page is a data entry page and requires data to be written back to the database, then clicking on the Update button will also perform a page request. So it’s impractical to say one single database request per page, because there are likely to be many, especially on a more sophisticated page.

Now, if you make multiple requests to the database, each request tends to need to be authenticated, executed, and then the results returned. I have seen applications where there are literally hundreds of database calls just to update a single grid. I have seen cases where every row of the grid forces a call to the database to retrieve the individual row, that row gets updated, and then a call is made back to the database to commit that data. That’s a lot of network traffic to achieve what could ultimately be done in a single batch!

If you must touch the database, prefer to use a stored procedure call that takes all the parameters you need, including user information, and returns all the data required to populate the page. This has become less popular in recent times because applications are becoming more modular (web parts) and technologies such as LINQ to SQL don’t easily return multiple resultsets in a single call. Modern databases such as SQL Server 2008 do batch inserts, updates and deletes, but they can’t really do a good job of batching if you are selecting individual items, modifying the data then writing it back to the database row by row.
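As a rough sketch of that kind of "one call populates the page" procedure (the procedure, tables and parameters here are invented for illustration, not from a real application), a single stored procedure can return several result sets in one round trip:

CREATE PROCEDURE dbo.GetOrderPageData
    @UserId int,      -- passed in so the proc can apply per-user filtering or auditing if needed
    @OrderId int
AS
BEGIN
    SET NOCOUNT ON;

    -- Result set 1: the order header
    SELECT OrderId, OrderDate, Status
    FROM dbo.Orders
    WHERE OrderId = @OrderId;

    -- Result set 2: the rows for the grid
    SELECT OrderLineId, ProductId, Quantity, LineTotal
    FROM dbo.OrderLines
    WHERE OrderId = @OrderId;

    -- Result set 3: lookup data for drop-downs
    SELECT ProductId, ProductName
    FROM dbo.Products;
END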

So, does every one of my applications have a maximum of one database request per page request? Of course not! It’s an ideal. I often don’t have the time to make every page perform optimally from the start, so my pages usually average around 2 or 3 requests per page request. Practically speaking, I prefer to use LINQ to SQL to make stored procedure calls and only return single resultsets. PLINQ, which will give us the ability to make multiple database requests in parallel, will improve this somewhat. Unfortunately, it is unlikely to fix the scenario where each row of a grid is retrieved, updated and written back individually. It might speed this scenario up overall, but it is still far less efficient than a single call!

There are also scenarios where page content won’t change, or changes infrequently. These days, you could potentially load an entire table into cache and perform LINQ queries to retrieve the data from there. It is far harder to debug applications where data is stored in cache, but it can be done. By doing this, you can get the total number of queries per page down significantly below one request per page on average. The work is then performed within your web server, which is a cheaper resource. By doing this, you can reduce the impact on the database, and you’ll potentially be able to scale your site up because SQL Server itself only needs to service the more important transactions. It is far easier and more cost-efficient to scale out by adding extra web servers than it is to scale up your database so it can handle a much higher volume of traffic.

2. Turn on SQL Profiler

This one’s easy, because most good developers already do it. If you turn on the profiler and execute your page, you should be able to see all the traffic caused by a single request. This is important, because you will be able to see a number of things. Firstly, you can look for queries that are candidates for being combined. You may also see queries that shouldn’t need to run every time the page is requested. And you may see cases where the exact same query is executed multiple times.

This case is a special one, because within an ASP.Net page, the event model can sometimes cause double execution of queries. The first query is usually executed to recreate an old version of the page, which is then used to process postbacks, and a second query is executed as a result of the postback to change the state of the page. This can be pretty nasty depending on what you are doing. For example, if you were changing pages in the grid, you wouldn’t want it to query the old page just to process the request for the next page. Page.IsPostBack is intended to prevent this kind of scenario, but it doesn’t always work out like this, especially if you have controls that were dynamically added to the page. In this scenario, there would be twice as much work done to retrieve a single page, as the database would potentially need to sort the data twice, run the row-number routine against every candidate row, and so on. Of course, it has some smarts to reduce the amount of processing, but it still takes more time, and it still retrieves twice the data, which needs to be passed across the network, and rendered to the page possibly twice.

The profiler can also help you determine where to look in your code for problems. On numerous occasions I have identified code that I wasn’t expecting to be executed just by seeing an unexpected query via SQL Profiler.

And finally, the profiler can help you determine which query in a batch is taking the longest.
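If you want a rough T-SQL complement to a Profiler trace (this is not part of the original workflow above, just a sketch; adjust the ORDER BY to whatever metric you care about), the plan-cache DMVs can list the most expensive cached statements:

-- Top 10 cached statements by total elapsed time.
SELECT TOP (10)
    qs.total_elapsed_time / 1000 AS total_elapsed_ms,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;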

3. Preprocess expensive queries

One of my customers requires a hierarchy consisting of every company corporate structure tree in Australia. Some of those companies have holdings in around 40 to 50 thousand companies. If these were created on the fly, and a few of their customers requested these corporate structures at the same time, the impact on the server could be significant. What I did here was to stage the data in fixed staging tables and then populate a table with the hierarchies already calculated and in the correct order. The time taken to retrieve a corporate structure, then, is the same as simply retrieving a user interface page of rows. This processing occurred overnight, as it is acceptable for the data to be out of date by one day.
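One way that kind of overnight job might look (the table and column names here are invented for illustration, not the actual customer schema) is simply to truncate and repopulate a precomputed table that the pages then read directly:

-- Overnight rebuild of a precomputed hierarchy table; page requests only ever
-- read dbo.CompanyHierarchyCache, so they pay none of the calculation cost.
TRUNCATE TABLE dbo.CompanyHierarchyCache;

INSERT INTO dbo.CompanyHierarchyCache (RootCompanyId, CompanyId, HierarchyLevel, SortOrder)
SELECT RootCompanyId, CompanyId, HierarchyLevel, SortOrder
FROM dbo.StagedCorporateStructure;  -- staging table with the hierarchy already calculated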

4. Review query plans to ensure optimally performing SQL statements.

People have different ways of analysing and improving the queries they need to execute. One approach I like is to analyse the query plan. Now, the hardcore people look at the text output of the query plan, but I prefer the graphical view. From SQL Server Management Studio, select Query from the menu, then choose Show Actual Execution Plan. The next time you execute the query or stored procedure, you get a graphical representation of the query execution in a tab adjacent to the Results and Messages tabs. The rule of thumb is to look at the relative expense of each subsection of the query and see if you can improve the performance of the more expensive parts. You work from top to bottom and right to left and aim to replace the icons (which represent the underlying query operations) with more efficient ones.
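If you prefer to capture the actual plan from T-SQL rather than the menu option, SET STATISTICS XML returns it as an extra XML result set you can click through to the graphical view (the procedure name below is just a placeholder):

SET STATISTICS XML ON;   -- actual execution plan returned as an additional XML result set

EXEC dbo.MyExpensiveProc @SomeParam = 42;   -- placeholder procedure

SET STATISTICS XML OFF;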

5. Identify table scans or index scans

Make sure your tables are using indexes. Table scans and index scans in the query plan are signs that you should add new indexes or improve the ones you have. On very small tables, a scan often doesn’t really matter. Also, when using an “exists” test, the query processing engine may decide that it’s more efficient just to scan the table rather than use an index, which is perfectly acceptable. However, on larger tables with more rows, indexes are necessary and far more efficient.

Each table has a maximum of one clustered index, and the rest are non-clustered. With the clustered index, the physical data within the table is actually organised in the order of the index, so it is the most efficient when the query optimiser uses it. With non-clustered indexes, a separate structure is kept within the database that contains the key data in the order of that index. So if you have a lot of these indexes, a significant amount of extra processing may be required to keep them up to date. However, these indexes are usually much smaller than the original table, so reading them can significantly improve the performance of a query. So if data entry speed and disk space are not big issues, having plenty of indexes is fine and a good choice.
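A minimal sketch, assuming a hypothetical dbo.Orders table that shows up in the plan as a scan when filtering on CustomerId and OrderDate:

-- A scan on dbo.Orders filtered by CustomerId and OrderDate suggests an index like this:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate);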

6. Identify key lookups

Within the query plan, identify Key Lookups (these used to be called Bookmark Lookups). If there are any Key Lookups, they can be removed from the query plan by adding the columns they fetch as include columns on the non-clustered index used alongside the Key Lookup. Include columns are fantastic, because they mean that when the candidate rows are found in the index, there is no need to go back to the original table to retrieve the included columns, because they are already in the index. So in this case, the speed of the query approaches that of a clustered index. Don’t worry, the data is kept in sync between the included columns and the original table, so you lose nothing here. The only downside is that included columns are added to the index, so the index size increases, and data entry may slow down slightly because there are more copies of the data to maintain.
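Continuing the hypothetical dbo.Orders example from the previous tip, a covering index that would remove a Key Lookup for a query returning TotalAmount and Status might look like this:

-- The key columns satisfy the seek; the INCLUDE columns mean the engine never
-- has to go back to the base table, so the Key Lookup disappears from the plan.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate_Covering
ON dbo.Orders (CustomerId, OrderDate)
INCLUDE (TotalAmount, Status);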

7. Check the order of columns in your where clause

Ensure the order of columns in your “where” clause is the same as the order within your index, otherwise it may not choose your index. I’ve seen plenty of cases where scans are performed instead of seeks simply because the order of the columns in the where clause is not optimal.
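For example, assuming the hypothetical index above keyed on (CustomerId, OrderDate), writing the predicates in index-key order makes it easy to confirm that a seek is possible:

-- Index keyed on (CustomerId, OrderDate); the predicates line up with the index keys.
SELECT OrderId, TotalAmount
FROM dbo.Orders
WHERE CustomerId = 12345          -- leading index key first
  AND OrderDate >= '20100601';    -- second index key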

8. Ensure the where clause is ordered most restrictive to least restrictive.

This will make sure that the most efficient path is taken when matching data between indexes in your query. By restrictive, I mean how uniquely selectable the data is: a column with different data in every row is more restrictive than a column where many rows share the same value. Also consider the size of each table, so that a table with fewer rows may be better placed earlier in a join than a table with more rows. This can be a bit of a balancing act.

9. Remove CTEs

CTEs are kind of like temporary tables, but they are mostly disastrously inefficient. I had one query taking 20 seconds to execute, which was unacceptable. It had a significant number of CTEs, so I changed them all to temp tables and executed the query. The query sped up and took around 1 second to execute, which was fast enough for the scenario, so I could move on. There is one valid purpose I can think of for using CTEs, and that’s for producing hierarchies. But in general, they can be replaced by temporary tables anyway. That said, however…
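As a before-and-after sketch of the kind of change described above (the query itself is invented), the same intermediate result is materialised into a temp table and then joined:

-- Before: CTE evaluated as part of the main query.
WITH RecentOrders AS
(
    SELECT CustomerId, SUM(TotalAmount) AS Recent
    FROM dbo.Orders
    WHERE OrderDate >= '20100101'
    GROUP BY CustomerId
)
SELECT c.CustomerName, r.Recent
FROM dbo.Customers c
JOIN RecentOrders r ON r.CustomerId = c.CustomerId;

-- After: the intermediate result materialised in a temp table,
-- which can also be indexed if that helps the final join.
SELECT CustomerId, SUM(TotalAmount) AS Recent
INTO #RecentOrders
FROM dbo.Orders
WHERE OrderDate >= '20100101'
GROUP BY CustomerId;

SELECT c.CustomerName, r.Recent
FROM dbo.Customers c
JOIN #RecentOrders r ON r.CustomerId = c.CustomerId;

DROP TABLE #RecentOrders;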

10. Remove Temp Tables

The creation of temp tables adds to the overhead required to run your overall query. In some scenarios, I have removed temporary tables and replaced them with fixed tables and had significant performance improvement.

If the temp table is created to enable the merging of data from similar data sources, then prefer a union instead. Unions, in general, are far cheaper than temp tables.
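For example (with made-up tables), instead of inserting two similar result sets into a temp table and then selecting from it, combine the two sources directly:

-- Instead of: INSERT INTO #AllSales SELECT ... FROM dbo.OnlineSales;
--             INSERT INTO #AllSales SELECT ... FROM dbo.StoreSales;
--             SELECT ... FROM #AllSales;
SELECT SaleId, SaleDate, Amount FROM dbo.OnlineSales
UNION ALL   -- UNION ALL avoids the de-duplication sort, if duplicates are impossible or acceptable
SELECT SaleId, SaleDate, Amount FROM dbo.StoreSales;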

#Temp tables are created in tempdb. @Table variables are also stored in tempdb, despite the common belief that they live purely in memory, although very small ones may never actually be written to disk. tempdb requires disk writes and reads, and so will be slower than accessing the original table data.

11. Remove Cursors

Cursors are one of the most expensive constructs you can use. There are special cases where they should be used, but it’s better to train yourself to use standard set-based statements than cursors. If you want to understand when it’s reasonable to use a cursor, take a look at Greg Low’s article: http://msmvps.com/blogs/greglow/archive/2007/02/04/improving-asp-net-session-state-database-performance-by-reducing-blocking.aspx But in general, avoid them like the plague.
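A small sketch of that kind of rewrite, using invented tables: a row-by-row cursor update replaced with a single set-based UPDATE.

-- Cursor version (avoid): one round trip through the engine per row.
DECLARE @OrderId int;
DECLARE order_cursor CURSOR FOR
    SELECT OrderId FROM dbo.Orders WHERE Status = 'Pending';
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderId;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Status = 'Processed' WHERE OrderId = @OrderId;
    FETCH NEXT FROM order_cursor INTO @OrderId;
END
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based version: one statement, one pass.
UPDATE dbo.Orders
SET Status = 'Processed'
WHERE Status = 'Pending';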

12. Reduce the number of joins in your queries

If you significantly reduce the number of joins in your query, you will see a vast improvement in the speed of the query. There are a couple of ways to do this. You could stage the data in a denormalised table, or in Enterprise edition you can create a view and put an index on that view. Again, it depends on how immediate your requirement is for having the latest data. For reporting it is often acceptable to build these tables overnight, because being a day out of date usually has no impact on the usefulness of a particular report.
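Here’s a sketch of the indexed view option, using invented table names. (On editions other than Enterprise, queries need to reference the view WITH (NOEXPAND) to use the index.)

-- A schema-bound view that pre-joins two tables.
CREATE VIEW dbo.vOrderCustomer
WITH SCHEMABINDING
AS
SELECT o.OrderId, o.OrderDate, o.TotalAmount, c.CustomerId, c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId;
GO

-- Materialise it with a unique clustered index; queries that would have
-- performed the join can now read the indexed view instead.
CREATE UNIQUE CLUSTERED INDEX IX_vOrderCustomer
ON dbo.vOrderCustomer (OrderId);
GO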

13. Remove all declared variables

If everything in your stored proc can be compiled up front, the query engine doesn’t need to do extra work to figure out how the query will behave once the declared variables are taken into account, so it will perform most optimally. So how do you remove declared variables? Well, for starters, you can pass them in as parameters of your stored procedure call. If you find you do need declared variables, you can create a second stored procedure that does the main work and pass the declared variables into that, so the variables become parameters in the second procedure.
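A sketch of that wrapper pattern, with invented procedure, table and parameter names:

-- Inner procedure: all predicates use parameters, so the plan is built against real values.
CREATE PROCEDURE dbo.GetSalesReportInner
    @CustomerId int,
    @FromDate date,
    @ToDate date
AS
BEGIN
    SELECT OrderId, OrderDate, TotalAmount
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
      AND OrderDate BETWEEN @FromDate AND @ToDate;
END
GO

-- Outer procedure: the variables that used to sit inside the main query...
CREATE PROCEDURE dbo.GetSalesReport
    @CustomerId int
AS
BEGIN
    DECLARE @FromDate date, @ToDate date;
    SELECT @FromDate = FinancialYearStart, @ToDate = FinancialYearEnd
    FROM dbo.Settings;

    -- ...are handed to the inner procedure, where they become parameters.
    EXEC dbo.GetSalesReportInner @CustomerId, @FromDate, @ToDate;
END
GO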

14. Give this fact sheet to your database developers

https://onedrive.live.com/?authkey=%21AJn4pGln0kUFovg&cid=5E237543FFFB2891&id=5E237543FFFB2891%21409&parId=5E237543FFFB2891%21135&o=OneUp

It’s a little bit out of date, but it’s the best one I have found so far. It shows the relative expense of different data types and the sargability of joins, includes various other helpful information, and also describes many of the icons found in query plans. Someone should tell that guy to update it.


How to switch rows and columns in SQL Server 2008 by pivoting and unpivoting

June 19, 2010

I came across a scenario in which I needed to switch rows and columns in the results of a query. That’s basically because I need the data to be fed into a chart in one form, and into a grid in another.

So here’s how the results looked before I ran my queries.

[Screenshot: the results before pivoting]

And here’s the result after I switched the rows and columns using my queries.

[Screenshot: the results after the rows and columns have been switched]

Now, I haven’t really replaced the column headers with the DateBucket field, but that’s ok in my case, because I already know what the column headers should be based on the parameters I passed to my query.

Firstly, the test data I created is as follows:

CREATE TABLE #test
(
   LineNum bigint,
   DateBucket varchar(8),
   TotalOrders float,
   TotalItems float,
   TotalItem1 float,
   TotalItem2 float,
   TotalItem3 float,
   AverageCost float,
   AveragePrice float,
   AverageProfit float
);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (1, 'Jun 2010', 100.0, 220.0, 53.0, 72.0, 99.0, 1204.0, 2468.0, 1122.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (2, 'Jul 2010', 120.0, 300.0, 54.0, 73.0, 98.0, 1208.0, 2470.0, 1123.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (3, 'Aug 2010', 130.0, 280.0, 55.0, 74.0, 97.0, 1212.0, 2472.0, 1124.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (4, 'Sep 2010', 140.0, 190.0, 56.0, 75.0, 96.0, 1216.0, 2474.0, 1125.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (5, 'Oct 2010', 150.0, 250.0, 57.0, 76.0, 95.0, 1220.0, 2476.0, 1126.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (6, 'Nov 2010', 160.0, 260.0, 58.0, 77.0, 94.0, 1224.0, 2478.0, 1127.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (7, 'Dec 2010', 170.0, 200.0, 59.0, 78.0, 93.0, 1228.0, 2480.0, 1128.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (8, 'Jan 2011', 180.0, 210.0, 60.0, 79.0, 92.0, 1232.0, 2482.0, 1129.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (9, 'Feb 2011', 190.0, 230.0, 61.0, 80.0, 91.0, 1236.0, 2484.0, 1130.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (10, 'Mar 2011', 200.0, 270.0, 62.0, 81.0, 90.0, 1240.0, 2486.0, 1131.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (11, 'Apr 2011', 210.0, 240.0, 63.0, 82.0, 89.0, 1244.0, 2488.0, 1132.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (12, 'May 2011', 220.0, 290.0, 64.0, 83.0, 88.0, 1248.0, 2490.0, 1133.0);

Now the first thing I need to do is to unpivot the data. This will give me a list of DateBuckets, column headers and values. Not exactly what I want, but it’ll do for now. I will store this in a temp table so that I can retrieve it for re-pivoting later.


CREATE TABLE #TestUnpivot
(
   DateBucket nvarchar(8),
   ItemDescription nvarchar(100),
   ItemValue float
);

INSERT INTO #TestUnpivot(DateBucket, ItemDescription, ItemValue)
SELECT DateBucket,  ItemDescription,  ItemValue
FROM
   (SELECT DateBucket, 
   TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit
   FROM #Test) p
   UNPIVOT
   (ItemValue FOR ItemDescription IN
      (TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
   ) AS unpvt

This outputs the following results (There’s more data, but I clipped it for brevity):

[Screenshot: the results after the unpivot]

Now I want the ItemDescription field to be data in the rows and I want the months to be columns. So I need to re-pivot the data. To re-pivot the data, I execute the following query:

SELECT ItemDescription as ItemValue,
[1] as Month1, [2] as Month2, [3] as Month3, [4] as Month4, [5] as Month5, [6] as Month6, [7] as Month7, [8] as Month8, [9] as Month9, [10] as Month10, [11] as Month11, [12] as Month12
FROM
   (SELECT v.ItemDescription, v.ItemValue, b.id
    FROM #TestUnpivot v
    join (
    select distinct top 100 PERCENT linenum as id, DateBucket
    from #test
    order by linenum) b on v.DateBucket=b.datebucket
   ) AS SourceTable
   PIVOT
   (
      MIN(ItemValue)
      FOR id IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])
   ) AS PivotTable

Note that if you really, really wanted to rename the columns, you could do that by using a dynamic query (sp_executesql), but avoid that if you can. Also note that I selected MIN as the aggregate function. This is intentional, because I feel it requires the least amount of processing. Using an average would require the engine to keep a tally of all the results for later averaging, which I believe would add to the query time; maybe not much, but more than MIN.

And finally, the grouping is done via the nested query:

SELECT DISTINCT TOP 100 PERCENT linenum as id, DateBucket
from #test
ORDER BY linenum

This groups the values by the month names found in the DateBucket column. So after executing this query, I get the result I wanted:

[Screenshot: the final pivoted result]

Merging all this into a single query, I get the following:

SELECT ItemDescription as ItemValue, [1] as Month1, [2] as Month2, [3] as Month3, [4] as Month4, [5] as Month5, [6] as Month6, [7] as  Month7, [8] as Month8, [9] as Month9, [10] as Month10, [11] as Month11, [12] as Month12
FROM
   (SELECT v.ItemDescription, v.ItemValue, b.id
    FROM
    (
       SELECT DateBucket,  ItemDescription,  ItemValue 
       FROM (SELECT DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit 
             FROM #Test) p 
       UNPIVOT 
            (ItemValue FOR ItemDescription IN
                (TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit) 
            ) AS unpvt 
    ) as v
    JOIN (
       select distinct top 100 percent linenum as id, DateBucket
       from #Test
       order by linenum) b on v.DateBucket=b.datebucket
    ) AS SourceTable
    PIVOT
    (
       MIN(ItemValue)
       FOR id IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])
    ) AS PivotTable

14 Tips for optimising SQL Server 2008 Queries in ASP.Net

June 19, 2010

This article is now here.


Solution: My generated LINQ to SQL stored procedure returns an int when it should return a table!

June 2, 2010

There are various scenarios where LINQ to SQL will not recognise the results of a stored procedure and so will not generate the correct return type. If the stored procedure is a composite one that involves a lot of temporary tables, SELECT INTOs, multiple resultsets depending on the input parameters, or control-of-flow statements, then it could well be that the results returned to the code generator are not consistent with the live execution of the stored procedure.

The usual scenario is that you use CREATE TABLE #TempTable within your stored procedure, but when the stored proc is dragged from Server Explorer onto the LINQ to SQL designer, it returns an int data type, and not the generated table object that you were expecting.

There are a couple of ways around this. What you could do is create a dummy SELECT that returns a representation of the data. For example, if I wanted to return a report resultset, I might type the following:


ALTER PROCEDURE myschema.mystoredproc
    @MyArg uniqueidentifier
AS
BEGIN
    -- Start code added temporarily
    SELECT Description = cast('Test data' as varchar(100)),
           Month1 = cast(1 as decimal(4,1)),
           Month2 = cast(1 as decimal(4,1)),
           Month3 = cast(1 as decimal(4,1));
    RETURN;
    -- End code added temporarily

    CREATE TABLE #TempTable
    (
        Description varchar(100),
        Month1 decimal(4,1),
        Month2 decimal(4,1),
        Month3 decimal(4,1)
    )

    ...
    more content goes here
    ...
END

The SELECT statement and the RETURN statement are inserted temporarily so that there are valid results returned from the stored proc that will be recognised by the LINQ to SQL generator. It works, but there is a danger in that you may get the returned types wrong.

Instead of doing this, I have found that the best thing to do is to put the following statement, SET FMTONLY OFF, into my stored procedure.


ALTER PROCEDURE myschema.mystoredproc
    @MyArg uniqueidentifier
AS
BEGIN
    -- Start code added temporarily
    SET FMTONLY OFF;
    -- End code added temporarily

    CREATE TABLE #TempTable
    (
        Description varchar(100),
        Month1 decimal(4,1),
        Month2 decimal(4,1),
        Month3 decimal(4,1)
    )

    ...
    more content goes here
    ...
END

What this does is return the result set metadata by actually executing the stored procedure. It will be recognised by LINQ to SQL, which will now generate the correct return table object type.

When LINQ to SQL calls the stored procedure for code generation, it doesn’t want to actually execute statements or cause persistent changes to the database. So to ensure that no changes are made to the database, it executes a

SET FMTONLY ON

So this is by design.

You can override this in your code, which is what I’ve done, to return the type you want.

But this does a full execution of your stored procedure. So if you are doing more than just performing a custom SELECT out of a temp table, you should probably remove or comment out this line after use. Otherwise, every time you try to recreate the stored proc in the LINQ to SQL designer, it will execute the stored procedure and potentially update data in your database!