1. Aim for the ideal of a single SQL call per page request.
Firstly, I should explain what I mean by page request. When you open a page, it may need to be populated with content from the database, and that call to the database happens as part of servicing the page request. If the page is a data entry page and requires data to be written back to the database, then clicking the Update button triggers another page request. So it’s impractical to insist on literally one database call per page, because there are likely to be many, especially on a more sophisticated page.
Now, if you make multiple requests to the database, each request tends to need to be authenticated and executed before the results are returned. I have seen applications where there are literally hundreds of database calls just to update a single grid: every row of the grid forces a call to the database to retrieve that individual row, the row gets updated, then a call is made back to the database to commit the data. That’s a lot of network traffic to achieve what could ultimately be done in a single batch!
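As a rough sketch of the difference (the table, column and parameter names here are hypothetical, not from any real application), the row-by-row pattern and its single-batch equivalent might look like this:

```sql
-- Row-by-row: one round trip per grid row (hypothetical table/columns).
-- SELECT Price FROM dbo.OrderLine WHERE OrderLineId = @Id;
-- UPDATE dbo.OrderLine SET Price = @NewPrice WHERE OrderLineId = @Id;
-- ...repeated once per row in the grid.

-- Single batch: one round trip updates every changed row at once,
-- joining against a table-valued parameter holding the edited values.
UPDATE ol
SET    ol.Price = c.Price
FROM   dbo.OrderLine AS ol
JOIN   @Changes AS c            -- table-valued parameter of edited rows
       ON c.OrderLineId = ol.OrderLineId;
```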
If you must touch the database, prefer to use a stored procedure call that takes all the parameters you need, including user information, and returns all the data required to populate the page. This has become less popular in recent times because applications are becoming more modular (web parts) and technologies such as LINQ to SQL don’t easily return multiple resultsets in a single call. Modern databases such as SQL Server 2008 do batch inserts, updates and deletes, but they can’t really do a good job of batching if you are selecting individual items, modifying the data then writing it back to the database row by row.
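A sketch of such a procedure, returning everything a page needs in one call (the procedure, table and column names are illustrative only):

```sql
CREATE PROCEDURE dbo.GetOrderPage
    @UserId  INT,
    @OrderId INT
AS
BEGIN
    SET NOCOUNT ON;

    -- Resultset 1: the order header.
    SELECT OrderId, OrderDate, CustomerName
    FROM   dbo.OrderHeader
    WHERE  OrderId = @OrderId;

    -- Resultset 2: the order lines.
    SELECT OrderLineId, ProductName, Quantity, Price
    FROM   dbo.OrderLine
    WHERE  OrderId = @OrderId;

    -- Resultset 3: per-user settings needed to render the page.
    SELECT SettingName, SettingValue
    FROM   dbo.UserSetting
    WHERE  UserId = @UserId;
END;
```

One authenticated round trip returns three resultsets, which the page then splits out client-side.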
So, does every one of my applications have a maximum of one database request per page request? Of course not! It’s an ideal. I often don’t have the time to go making every page perform optimally from the start, so my pages usually average around 2 or 3 requests per page request. Practically speaking, I prefer to use LINQ to SQL to make stored procedure calls and only return single resultsets. PLINQ, which will give us the ability to make multiple database requests in parallel, will improve this somewhat. Unfortunately, it is unlikely to fix the scenario where each row of a grid is retrieved, updated and written back individually. It might speed this scenario up overall, however it is still far less efficient than a single call!
There are also scenarios where page content won’t change, or changes infrequently. These days, you could potentially load an entire table into cache and use LINQ queries to retrieve the data from there. It is far harder to debug applications where data is stored in cache, but it can be done. By doing this, you can get the average number of queries per page request down significantly below one. The work is then performed within your web server, which is a cheaper resource, you reduce the impact on the database, and you’ll potentially be able to scale your site up because SQL Server itself will only need to service the more important transactions. It is far easier and more cost efficient to scale out by adding extra web servers than it is to scale up your database to handle a much higher volume of traffic.
2. Turn on SQL Profiler
This one’s easy, because most good developers already do it. If you turn on the profiler and execute your page, you should be able to see all the traffic caused by a single request. This is important for a number of reasons. Firstly, you can look for queries that are candidates for being combined. You may also spot queries that shouldn’t need to run every time the page is requested, and situations where the exact same query is executed multiple times.
This case is a special one, because within an ASP.Net page, the event model can sometimes cause queries to execute twice. The first query recreates the old version of the page so that post backs can be processed, and then a second query is executed as a result of the post back to change the state of the page. This can be pretty nasty depending on what you are doing. For example, if you were changing pages in a grid, you wouldn’t want to query the old page just to process the request for the next page. Page.IsPostback is intended to prevent this kind of scenario, but it doesn’t always work out that way, especially if you have controls that were dynamically added to the page. In this scenario, twice as much work is done to retrieve a single page: the database potentially needs to sort the data twice, run its row-numbering routine against every candidate row, and so on. Of course, it has some smarts to reduce the amount of processing, but it still takes more time, and it still retrieves twice the data, which needs to be passed across the network and possibly rendered to the page twice.
The profiler can also help you determine where to look in your code for problems. On numerous occasions I have identified code that I wasn’t expecting to be executed just by seeing an unexpected query via SQL Profiler.
And finally, the profiler can help you determine which query in a batch is taking the longest.
3. Preprocess expensive queries
One of my customers requires a hierarchy consisting of every company corporate structure tree in Australia. Some of those companies have holdings in around 40 to 50 thousand companies. If these were created on the fly, and a few of their customers requested these corporate structures at the same time, the impact on the server could be significant. What I did here was to stage the data in fixed temporary tables and then populate a table with the hierarchies already calculated and in the correct order. Retrieving a corporate structure then costs the same as simply retrieving a page of rows for the user interface. This processing occurred overnight, as it was acceptable for the data to be out of date by one day.
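A minimal sketch of the staging approach, with hypothetical table and function names standing in for the real ones:

```sql
-- Nightly job: rebuild the precalculated hierarchy (illustrative names).
TRUNCATE TABLE dbo.CompanyHierarchyStaged;

INSERT INTO dbo.CompanyHierarchyStaged (RootCompanyId, CompanyId, Depth, SortOrder)
SELECT RootCompanyId, CompanyId, Depth, SortOrder
FROM   dbo.fn_BuildCompanyHierarchy();  -- the expensive traversal, run once overnight

-- Page request: now just a cheap ordered read.
SELECT CompanyId, Depth
FROM   dbo.CompanyHierarchyStaged
WHERE  RootCompanyId = @RootCompanyId
ORDER  BY SortOrder;
```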
4. Review query plans to ensure optimally performing SQL statements.
People have different ways of analysing and improving the queries they need to execute. One thing I like to analyse is the query plan. Now, the hardcore people look at the text output of the query plan, but I prefer the graphical view. From SQL Server Management Studio, select Query from the menu, then choose Include Actual Execution Plan. The next time you execute the query or stored procedure, a graphical representation of the execution appears in a tab adjacent to the Results and Messages tabs. The rule of thumb is to look at the relative expense of each subsection of the query and see if you can improve the performance of the more expensive parts. Work from top to bottom and right to left, and aim to replace the icons (which represent underlying query operations) with more efficient ones.
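If you prefer a scriptable alternative to the menu option, the same information is available through session settings (the query below is a throwaway illustration):

```sql
-- Emit the actual plan as text rows alongside the results,
-- instead of using the Management Studio menu item.
SET STATISTICS PROFILE ON;
GO
SELECT TOP (10) OrderId, OrderDate
FROM   dbo.OrderHeader          -- hypothetical table
ORDER  BY OrderDate DESC;
GO
SET STATISTICS PROFILE OFF;
```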
5. Identify table scans or index scans
Make sure your tables are using indexes. Table scans and index scans are candidates for adding new indexes or improving the ones you have. On very small tables, a scan often doesn’t really matter. Also, when using an “exists” test, the query processing engine may decide that it’s more efficient to perform a table scan rather than use an index, which is perfectly acceptable. However, on larger tables with more rows, indexes are necessary for efficient access.
Each table can have at most one clustered index; the rest are non-clustered. With the clustered index, the physical data within the table is actually organised in the order of the index, so it is the most efficient when used by the query optimiser. For each non-clustered index, a separate structure is kept within the database containing the data ordered by that index. So if you have a lot of these indexes, a significant amount of extra processing may be required to keep them up to date. However, these structures are usually much smaller than the original table, and so can significantly improve the performance of a query. So if data entry speed and disk space are not big issues, having plenty of indexes is a good choice.
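A quick sketch of the two kinds of index, against an invented table:

```sql
-- One clustered index per table: the rows themselves are stored in this order.
CREATE CLUSTERED INDEX IX_OrderLine_OrderLineId
    ON dbo.OrderLine (OrderLineId);

-- Non-clustered index: a separate, smaller ordered structure that points
-- back at the table rows, maintained on every insert/update/delete.
CREATE NONCLUSTERED INDEX IX_OrderLine_ProductName
    ON dbo.OrderLine (ProductName);
```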
6. Identify key lookups
Within the query plan, identify Key Lookups (these used to be called Bookmark Lookups). Any Key Lookup can be removed from the query plan by adding the columns it fetches as include columns on the non-clustered index to its right. Include columns are fantastic, because they mean that once the candidate rows are found in the index, there is no need to go back to the original table to retrieve the included columns: they are already in the index. In this case, the speed of the query approaches that of a clustered index. Don’t worry, the data is kept in sync between the included columns and the original table, so you lose nothing here. The only downside is that included columns increase the size of the index, and data entry may slow down slightly because there are more copies to maintain.
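Removing a Key Lookup then looks something like this (index, table and column names are hypothetical):

```sql
-- Before: the index finds the rows, but Price and Quantity live only in the
-- table, so every match triggers a Key Lookup back to the clustered index.
CREATE NONCLUSTERED INDEX IX_OrderLine_OrderId
    ON dbo.OrderLine (OrderId);

-- After: the included columns are stored at the leaf level of the index,
-- so the query is satisfied entirely from the index (it "covers" the query).
CREATE NONCLUSTERED INDEX IX_OrderLine_OrderId_Covering
    ON dbo.OrderLine (OrderId)
    INCLUDE (Price, Quantity);
```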
7. Check the order of columns in your where clause
Ensure the order of columns in your “where” clause matches the order within your index, otherwise the optimiser may not choose your index. I’ve seen plenty of cases where scans are performed instead of seeks simply because the order of the columns in the where clause is not optimal.
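As a hedged illustration with invented names, the aim is for the query’s predicates to line up with the index key order, starting from the leading column:

```sql
-- Composite index keyed on (CustomerId, OrderDate).
CREATE NONCLUSTERED INDEX IX_OrderHeader_Customer_Date
    ON dbo.OrderHeader (CustomerId, OrderDate);

-- Lines up with the index key order, so an index seek is possible.
SELECT OrderId
FROM   dbo.OrderHeader
WHERE  CustomerId = @CustomerId
  AND  OrderDate >= @FromDate;

-- Constrains only the second key column, so at best an index scan.
SELECT OrderId
FROM   dbo.OrderHeader
WHERE  OrderDate >= @FromDate;
```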
8. Ensure the where clause is ordered most restrictive to least restrictive.
This will make sure that the most efficient path is taken when matching data between indexes in your query. By restrictive, I mean that the data is more uniquely selectable: a column with different data in every row is more restrictive than a column containing much the same data in every row. Also consider the size of the tables, so that a table with fewer rows may be selected first in a join ahead of a table with more. This can be a bit of a balancing act.
9. Remove CTEs
CTEs are kind of like temporary tables, but they are mostly disastrously inefficient. I had one query taking 20 seconds to execute, which was unacceptable. It had a significant number of CTEs, so I changed them all to temp tables and executed the query. It then took around 1 second to execute – fast enough for the scenario, so I could move on. The one valid purpose I can think of for using CTEs is producing hierarchies, but in general even those can be replaced by temporary tables. That said, however…
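The conversion is mechanical. In outline, with an invented query standing in for the real one:

```sql
-- CTE version: the definition may be re-evaluated for every reference to it.
WITH RecentOrders AS (
    SELECT OrderId, CustomerId
    FROM   dbo.OrderHeader
    WHERE  OrderDate >= DATEADD(DAY, -30, GETDATE())
)
SELECT OrderId, CustomerId FROM RecentOrders;

-- Temp table version: the intermediate result is materialised exactly once.
SELECT OrderId, CustomerId
INTO   #RecentOrders
FROM   dbo.OrderHeader
WHERE  OrderDate >= DATEADD(DAY, -30, GETDATE());

SELECT OrderId, CustomerId FROM #RecentOrders;

DROP TABLE #RecentOrders;
```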
10. Remove Temp Tables
The creation of temp tables adds to the overhead required to run your overall query. In some scenarios, I have removed temporary tables and replaced them with fixed tables and had significant performance improvement.
If the temp table is created to enable the merging of data from similar data sources, then prefer a union instead. Unions, in general, are far cheaper than temp tables.
#Temp tables are created in tempdb. @Table variables are created in memory first, but the moment there is memory pressure, they spill over into tempdb as well. tempdb requires disk writes and reads, and so will be slower than accessing the original table data.
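The union alternative mentioned above might look like this (the two source tables are invented):

```sql
-- Instead of inserting two similar sources into a temp table and then
-- selecting from it, combine them directly in one statement.
SELECT CustomerId, Amount FROM dbo.WebOrder
UNION ALL                 -- UNION ALL also avoids a duplicate-removal sort
SELECT CustomerId, Amount FROM dbo.PhoneOrder;
```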
11. Remove Cursors
Cursors are among the most expensive statements you can use. There are special cases where they should be used, but it’s better to train yourself to use standard set-based statements instead. If you want to understand when it’s reasonable to use a cursor, take a look at Greg Low’s article. But in general, avoid them like the plague. http://msmvps.com/blogs/greglow/archive/2007/02/04/improving-asp-net-session-state-database-performance-by-reducing-blocking.aspx
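As an illustration of the substitution, using a hypothetical pricing example:

```sql
-- Cursor version: one fetch and one UPDATE per row.
DECLARE PriceCursor CURSOR FOR
    SELECT ProductId FROM dbo.Product WHERE Discontinued = 0;
-- ...OPEN / FETCH NEXT / UPDATE in a loop / CLOSE / DEALLOCATE...

-- Set-based version: one statement does the same work in a single pass.
UPDATE dbo.Product
SET    Price = Price * 1.10
WHERE  Discontinued = 0;
```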
12. Reduce the number of joins in your queries
If you significantly reduce the number of joins in your query, you will see a vast improvement in its speed. There are a couple of ways to do this: you could stage the data in a denormalised table, or in Enterprise edition you can create a view and put an index on it. Again, it depends on how immediate your requirement is for the latest data. For reporting, it is often acceptable to build these tables overnight, since data that is a day old rarely affects the value of a report.
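An indexed-view sketch of the second option (Enterprise edition; the view and table names are invented, and the aggregated columns are assumed to be non-nullable):

```sql
-- Materialise the join once; SCHEMABINDING is required before a view
-- can be indexed, and COUNT_BIG(*) is required in an aggregated view.
CREATE VIEW dbo.vOrderSummary
WITH SCHEMABINDING
AS
SELECT oh.CustomerId,
       COUNT_BIG(*)  AS OrderLineCount,
       SUM(ol.Price) AS TotalValue
FROM   dbo.OrderHeader AS oh
JOIN   dbo.OrderLine   AS ol ON ol.OrderId = oh.OrderId
GROUP  BY oh.CustomerId;
GO
-- The unique clustered index is what actually stores the view's results.
CREATE UNIQUE CLUSTERED INDEX IX_vOrderSummary
    ON dbo.vOrderSummary (CustomerId);
```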
13. Remove all declared variables
If the query engine can compile everything within your stored proc against known values, it doesn’t need to do extra work to determine how the query will look once the declared variables are taken into consideration, so it will perform most optimally. So how do you remove declared variables? Well, for starters, you can pass values in as parameters of your stored procedure call. If you find you really do need declared variables, you can create a second stored procedure that does the main work and pass the declared variables into that, so the variables become parameters of the second procedure.
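The two-procedure pattern might look like this (procedure, table and column names are illustrative):

```sql
-- Inner procedure: every value arrives as a parameter, so the optimiser can
-- compile the query against real values rather than opaque local variables.
CREATE PROCEDURE dbo.GetOrders_Inner
    @CustomerId INT,
    @FromDate   DATETIME
AS
    SELECT OrderId, OrderDate
    FROM   dbo.OrderHeader
    WHERE  CustomerId = @CustomerId
      AND  OrderDate >= @FromDate;
GO
-- Outer procedure: does whatever variable preparation is needed, then
-- hands the results on as parameters.
CREATE PROCEDURE dbo.GetOrders
    @CustomerId INT
AS
    DECLARE @FromDate DATETIME = DATEADD(DAY, -30, GETDATE());
    EXEC dbo.GetOrders_Inner @CustomerId, @FromDate;
```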
14. Give this fact sheet to your database developers
It’s a little bit out of date, but it’s the best one I have found so far. It shows the relative expense of different data types and the sargability of joins, includes various other helpful information, and describes many of the icons found in query plans. Someone should tell that guy to update it.