Sql Server 2005 – Twelve Tips For Optimizing Query Performance by Tony Wright

May 26, 2008

1. Turn on the execution plan, and statistics

The first thing you need to do is use the tools that help you determine whether a query written one way is better than the same query written another way. Comparing the original query against the new query you come up with is the best way to evaluate the benefit of any change.

To do this, go into SQL Server Management Studio and open the Query menu. Select "Include Actual Execution Plan." This turns on the graphical execution plan when you execute a query, and it appears in the bottom pane after execution.

In the Execution Plan, you can mouse over the components of the plan and a tooltip information box appears. The box contains the Estimated Subtree Cost, which can be used to help determine whether one query is better than another. Of course, it's not always right, as some query parts are not included in the execution plan, but it helps. It is also helpful to know the estimated number of rows, which is found in the same tooltip.

Next, turn on statistics. Type the following statement:

SET STATISTICS IO ON;

This causes statistics to be output to the Messages tab in the bottom pane. The figures you want here are mainly logical reads and physical reads. Logical reads are page reads from memory; physical reads are page reads from disk. This stat can be a little deceptive as it doesn't include CPU in the metric, but in general, the fewer page reads, the less work is done, and the more performant the query will be.

To complement the two measures above, you should also compare the actual execution times. To do this, execute the following statement:

SET STATISTICS TIME ON;

This measure also has its limitations, as blocking and contention affect the reported time. You should execute the query a few times to determine how consistent the times are.
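Putting it together, here is a sketch (the table and the date filter are hypothetical, not from the article) of how you might compare two candidate versions of the same query with both statistics options turned on:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate 1: the function on the column prevents an index seek on OrderDate
SELECT CustomerID, OrderDate
FROM MySchema.SalesOrderHeader
WHERE YEAR(OrderDate) = 2007;

-- Candidate 2: the same result, written so an index on OrderDate can be used
SELECT CustomerID, OrderDate
FROM MySchema.SalesOrderHeader
WHERE OrderDate >= '20070101' AND OrderDate < '20080101';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Compare the logical reads and elapsed times reported on the Messages tab,
-- and the Estimated Subtree Cost of each plan in the execution plan pane.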

2. Use Clustered Indexes

Having the clustered index on the primary key is sometimes not the most efficient place for the clustered index to be. A clustered index is the most performant type of index. The whole table is sorted according to the clustered index. If the table is involved in lots of joins based on the primary key, it is probably the right place for it to be, but if you are continually filtering or grouping on other columns in a table, then you should possibly consider changing the primary key index to Non-Clustered, and putting the clustered index on those filtered or grouped columns.

The following statement removes an existing clustered index on the primary key and replaces it with a non-clustered index:

ALTER TABLE MySchema.SalesOrderHeader
DROP CONSTRAINT PK_SalesOrderHeader
GO
ALTER TABLE MySchema.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader
PRIMARY KEY NONCLUSTERED(SalesOrderID);
GO

Then the following statement adds a new clustered index to a table.

CREATE CLUSTERED INDEX MyClusteredIndex
ON MySchema.SalesOrderHeader (OrderID)
GO

3. Use Indexed Views

Indexed Views have been around for a while. A view is like a named query, and these days you can add indexes to them. If used correctly, they can cause a massive improvement in execution times, often better than a clustered index with covering columns on the original table. Also, in SQL Server Developer Edition and Enterprise Edition, a view index will also be automatically used if it is the best index even if you don’t actually specify the view in your query!

CREATE VIEW MySchema.SalesByCustomer
WITH SCHEMABINDING
AS
SELECT soh.SalesTerritoryID, soh.CustomerID,
   SUM(sod.Quantity * sod.UnitPrice) AS TotalSales,
   COUNT_BIG(*) AS RowCnt -- required in an indexed view that uses GROUP BY
FROM MySchema.SalesOrderHeader soh
INNER JOIN MySchema.SalesOrderDetail sod
   ON (soh.SalesOrderID = sod.SalesOrderID)
GROUP BY soh.SalesTerritoryID, soh.CustomerID
GO

Note the use of the schema binding attribute. This prevents you from changing underlying tables while this view exists, and is necessary if you want to add an index. Some people avoid indexed views for this reason, as the maintenance becomes more complicated as further dependencies to the view are created. The following statement adds an index:

CREATE UNIQUE CLUSTERED INDEX IdxSalesOrderView
ON MySchema.SalesByCustomer(
SalesTerritoryID, CustomerID
)
GO

4. Use Covering Indexes

Covering indexes with included columns are a feature newly added in SQL Server 2005. Basically, you create an index optimised for the query itself based on joins, filters and grouping, and then add additional non-key columns via INCLUDE so they can be retrieved directly from the index in SELECT statements, as follows:

CREATE NONCLUSTERED INDEX TestIndex
ON MySchema.SalesOrderDetail(OrderId)
INCLUDE (Quantity, UnitPrice)

The above statement causes a non-clustered index to be created on the SalesOrderDetail table. If queries are executed on the OrderId column, the index will be used, and if the only other columns being retrieved are Quantity and UnitPrice, then the query optimiser doesn’t need to retrieve any extra columns from the underlying table. It can just use the index. Because the query optimiser doesn’t need to query the original table, performance is improved.

5. Keep your clustered index small.

One thing you need to consider when deciding where to put your clustered index is how big its key will be. The problem is that the clustered index key is also used as the lookup key for every non-clustered index on the table, so if you have a wide clustered key on a table with a decent number of rows, the size of every index could blow out significantly. Having no clustered index at all can be just as bad, because non-clustered indexes on a heap use the row identifier instead, which is 8 bytes per row.
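To make that concrete, here is a hedged sketch with two hypothetical tables (not from the article): on the first, every non-clustered index carries the wide clustered key; on the second, only a 4-byte int.

-- Wide clustered key: every non-clustered index row repeats CustomerCode + CountryCode.
CREATE TABLE MySchema.CustomerWide (
   CustomerCode nvarchar(50) NOT NULL,
   CountryCode  nvarchar(5)  NOT NULL,
   CustomerName nvarchar(200) NOT NULL,
   CONSTRAINT PK_CustomerWide PRIMARY KEY CLUSTERED (CustomerCode, CountryCode)
);

-- Narrow clustered key: every non-clustered index row carries only the 4-byte CustomerID.
CREATE TABLE MySchema.CustomerNarrow (
   CustomerID   int IDENTITY(1,1) NOT NULL,
   CustomerCode nvarchar(50) NOT NULL,
   CountryCode  nvarchar(5)  NOT NULL,
   CustomerName nvarchar(200) NOT NULL,
   CONSTRAINT PK_CustomerNarrow PRIMARY KEY CLUSTERED (CustomerID)
);

-- The same non-clustered index is considerably smaller on the narrow-key table.
CREATE NONCLUSTERED INDEX IX_Customer_Name
ON MySchema.CustomerNarrow (CustomerName);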

6. Avoid cursors

A bit of a no-brainer. Cursors are less performant because every FETCH statement executed is equivalent to another SELECT statement execution that returns a single row. The optimiser can't optimise a CURSOR statement as a whole; instead it optimises the queries within each iteration of the cursor loop, which is undesirable. Given that most CURSOR statements can be rewritten using set logic, they should generally be avoided. Cursors are useful, however, when you actually want row-by-row execution: for example, when a single set-based statement over a large batch of rows is causing blocking, processing the rows individually (or in smaller batches) is exactly what you want.
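As a rough illustration (the table, column and status values are hypothetical), here is a cursor loop and the single set-based statement that replaces it:

-- Row-by-row version: one FETCH and one UPDATE per row.
DECLARE @SalesOrderID int;
DECLARE orderCursor CURSOR FOR
   SELECT SalesOrderID FROM MySchema.SalesOrderHeader WHERE Status = 1;
OPEN orderCursor;
FETCH NEXT FROM orderCursor INTO @SalesOrderID;
WHILE @@FETCH_STATUS = 0
BEGIN
   UPDATE MySchema.SalesOrderHeader SET Status = 2 WHERE SalesOrderID = @SalesOrderID;
   FETCH NEXT FROM orderCursor INTO @SalesOrderID;
END
CLOSE orderCursor;
DEALLOCATE orderCursor;

-- Set-based equivalent: one statement the optimiser can plan as a whole.
UPDATE MySchema.SalesOrderHeader SET Status = 2 WHERE Status = 1;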

7. Archive old data

Another no-brainer, so I won't say much. If you want to improve query performance, give the optimiser less work to do. If you can cut down the number of rows the query has to deal with, performance will improve. I have no problem with people creating audit triggers to move historical data into other tables for this reason. Alternatively, if you don't need your data after a certain period of time, back up your database and then remove the old data.
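A minimal sketch of the idea (the archive table, cut-off date and column names are hypothetical): copy rows past a cut-off into an archive table, then delete them from the live table in the same transaction.

BEGIN TRANSACTION;

-- Copy old rows into an archive table with the same structure.
INSERT INTO MySchema.SalesOrderHeaderArchive
SELECT *
FROM MySchema.SalesOrderHeader
WHERE OrderDate < '20060101';

-- Then remove them from the live table.
DELETE FROM MySchema.SalesOrderHeader
WHERE OrderDate < '20060101';

COMMIT TRANSACTION;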

8. Partition your data correctly

These days, you don’t actually have to move old data out of a table to improve query performance. You can partition your table into a number of data segments based on a partition function. The query optimiser can use the partition function to look at rows only on the most appropriate filegroup. To create partitions, you need a partition function and a partition scheme.

CREATE PARTITION FUNCTION myRangePartitionFunction(int)
AS RANGE RIGHT FOR VALUES (1,100,1000)

Once the partition function is created, you can then apply the function to a partition scheme for a table.

CREATE PARTITION SCHEME myRangePartitionScheme
AS PARTITION myRangePartitionFunction
TO (filegrp1, filegrp2, filegrp3, filegrp4)

Then it’s just a matter of creating the table to use the partition scheme on the column you decided to partition on:

CREATE TABLE mySchema.myPartitionTable(
col1 int,
col2 nvarchar(100)
)
ON myRangePartitionScheme(col1)

9. Remove user-defined inline scalar functions

Scalar user-defined functions are convenient if you want to return a single value, but they come at the expense of performance. They look somewhat like stored procedures, and they can be used in SQL statements. The problem is that they are not expanded into, and therefore not optimised as part of, the query plan by the query optimiser. Bad news, because it turns a Seek into a Scan. Queries like this may appear to be performant in the execution plan and also in the IO statistics, but when you run the query, it can perform really, really badly. No seriously, really bad.

Here’s an example of what I’m talking about:

CREATE FUNCTION dbo.fnGetPostalCode(
   @Suburb nvarchar(100),
   @State nvarchar(10)
)
RETURNS int
AS
BEGIN
   RETURN ISNULL(
   (
      SELECT PostalCode
      FROM dbo.PostalCode
      WHERE Suburb = @Suburb
      AND State = @State
   ), -1 );
END
GO

The following statement will only perform a clustered index scan, not a seek, and on a big table this could seriously affect performance.

SELECT s.SalesPersonID,
       s.SuburbName,
       s.State,
       dbo.fnGetPostalCode(s.SuburbName, s.State) AS PostalCode
FROM dbo.SalesPerson s

You can have a look at the details by opening SQL Server Management Studio's Query menu and selecting "Include Actual Execution Plan".

One way to get around this is to simply inline the underlying query from the function, as follows:

SELECT s.SalesPersonID, s.SuburbName, s.State,
ISNULL( (SELECT PostalCode
         FROM dbo.PostalCode
         WHERE Suburb = s.SuburbName
         AND State = s.State), -1)
      AS PostalCode
FROM dbo.SalesPerson s

Inlining the SQL statement will perform significantly better.

10. Use APPLY

The APPLY operator was created for the situation where you would otherwise put multiple correlated subqueries in the one statement. For example, take the following statement:

SELECT soh.SalesOrderID,
   Quantity = (SELECT TOP (1) Quantity
               FROM Sales.SalesOrderDetail
               WHERE SalesOrderID = soh.SalesOrderID),
   UnitPrice = (SELECT TOP (1) UnitPrice
                FROM Sales.SalesOrderDetail
                WHERE SalesOrderID = soh.SalesOrderID)
FROM Sales.SalesOrderHeader soh

This performs two extra correlated subqueries, each retrieving data from the detail table using the same criterion. The statement can now be replaced with the following:

SELECT soh.SalesOrderID, soh.OrderDate, a.*
FROM Sales.SalesOrderHeader soh
CROSS APPLY (
   SELECT TOP (1) sod.UnitPrice, sod.Quantity
   FROM Sales.SalesOrderDetail sod
   WHERE sod.SalesOrderId = soh.SalesOrderId
   ORDER BY sod.Quantity DESC
) as a

11. Use computed columns

Computed columns are derived from other columns in a table. By creating and indexing a computed column, you can turn what would otherwise be a scan into a seek. For example, if you needed to search on SalesPrice and you had Quantity and UnitPrice columns, calculating the product inline in the query would cause a table scan, as the two columns have to be multiplied together for every single row. Create a computed column called SalesPrice, then index it, and the query optimiser no longer needs to retrieve the UnitPrice and Quantity data and do the calculation – it's already done.
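Here is a hedged sketch of the idea against a hypothetical MySchema.SalesOrderDetail table (the expression must be deterministic and precise for the index to be allowed):

-- Add a computed column for the line total.
ALTER TABLE MySchema.SalesOrderDetail
ADD SalesPrice AS (Quantity * UnitPrice);

-- Index it so filters on SalesPrice can seek instead of scan.
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_SalesPrice
ON MySchema.SalesOrderDetail (SalesPrice);

-- This predicate can now use the index.
SELECT SalesOrderID, SalesPrice
FROM MySchema.SalesOrderDetail
WHERE SalesPrice > 1000;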

12. Use the correct transaction isolation level

If there are a lot of rows in your table, multiple concurrent requests to that table could cause contention if the correct transaction isolation level is not set. If requests are repeatedly blocked, it could be time to consider changing it.

For example, READ UNCOMMITTED is equivalent to dirty reads, or NOLOCK. That is, if a transaction is in the middle of processing and you read a row, the data may not be valid, especially if multiple inserts/updates that require atomicity are occurring. This is the most performant level because it largely ignores locking, but it is a special case and generally not something good design should allow.

With READ_COMMITTED_SNAPSHOT enabled, any data read by a statement is the transactionally consistent version that existed when that statement started (the full SNAPSHOT isolation level extends this to the start of the transaction). Internally, SQL Server keeps a versioned copy of the data in tempdb until it is no longer needed. Snapshot reads do not request shared locks and therefore do not block other transactions from writing data, and transactions writing data likewise do not block readers.
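As a sketch (the database name is hypothetical), turning on the option and using snapshot-based reads looks like this:

-- Enable statement-level row versioning for the whole database
-- (needs exclusive access to MyDatabase while the option is changed).
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- Or opt in to full snapshot isolation for a particular session
-- (requires ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON first).
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM MySchema.SalesOrderHeader;
COMMIT TRANSACTION;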

There are other isolation levels, including REPEATABLE READ and SERIALIZABLE, that you can look at to determine whether they are appropriate for your needs.


How and Why Test Driven Development (TDD) and NUnit?

May 19, 2008

In my very first post, titled Some Fix Time Data, I showed statistics demonstrating that the earlier you find bugs, the cheaper they are to fix. It's not the only reason for believing in TDD, but it's pretty compelling.

TDD helps in a number of ways. It helps you find bugs earlier. It helps guard against regressions when you make changes. It helps when you are trying to pin down exactly what functionality you are trying to implement. It improves quality, and provides that quality earlier. It helps you concentrate only on the needs of the application, and thus spend less time doing undesirable work.

By forcing us to write tests before implementing the meat of the application, we are being clear about the requirements. If the requirements are not clear, the tests cannot be written, and the tests themselves are a representation of those requirements. So if given a task, and the task is ambiguous, you will be able to demonstrate any problems much earlier.

Perhaps the simplest way to start with TDD is to download NUnit from http://www.nunit.org. NUnit is free, and has a graphical user interface for running the tests you write against your objects. Note here that TDD isn't actually about testing the user interface itself. There are other strategies for that. TDD is about building your business logic layer and having a set of tests that allow for regression testing.

The first thing to do after installing NUnit is to open a project. The project can be either a new project or an existing project. You can add tests to an existing project, or add a new project specifically for testing. I prefer the second.

Next, you need to add a testing class. This is called a test fixture. The method calls within a testing class are, funnily enough, the tests. NUnit recognises test fixtures and tests by attributes added to the testing class, as follows.

[TestFixture]
public class MyTestClass
{
   [Test]
   public void MyTest()
   {
      //perform test code here
   }
}

NUnit recognises test fixtures by the [TestFixture] attribute added to the testing class. The tests themselves are identified by the [Test] attribute. Note also that all test methods must be public, return void and have no parameters. You also need to add a reference to the nunit.framework assembly in your project, and add a using NUnit.Framework; directive to your class file so that it can reference NUnit's types.

You write your test and at the end, use Assert statements to check the validity of the test. Here’s an example that adds two numbers:

using System;
using NUnit.Framework;
namespace MyNamespace
{
   [TestFixture]
   public class MyTestFixtureClass
   {
      [Test]
      public void CheckIfAdditionWorks()
      {
         int firstNumber = 1;
         int secondNumber = 2;
         int result = firstNumber + secondNumber;
         Assert.AreEqual(3, result); // NUnit's Assert.AreEqual takes (expected, actual)
      }
   }
}

When the test runs, this test will succeed, and the NUnit GUI will show a green bar. Green indicates success, Red failure, and Yellow for ignored tests. So go ahead and run NUnit. In the NUnit GUI, select File from the menu, and Open Project. Add the assembly or assemblies that contain your test fixture classes. Make sure you try this out first, as the following step won’t work unless you do it.

Next, you need to go into the project properties (I'm using VS2008 here). Select your project in the Solution Explorer, right click, then select Properties. Choose the Debug item in the left navigation bar. In the Start Action group, select "Start external program" and set it to: "C:\Program Files\NUnit 2.4.7\bin\nunit.exe". This will now execute the NUnit GUI when you run the application from within Visual Studio. In the Start Options group, add the following command line arguments: /run /fixture=MyNamespace.MyTestFixtureClass

The additional command line arguments basically force NUnit to immediately run the required tests. The /run argument without the /fixture argument will simply run all tests. With the /fixture argument specified with the name of a test fixture class, it will execute only the tests within that class. This should get you going.

And if you need to get more sophisticated, you can even choose to include or exclude categories of tests. NUnit can also be set up to use config files.

 


Choose the right primary key in SQL Server 2005, and partitioning your table

May 17, 2008

I have a client who has added an EffectiveFrom and EffectiveTo datetime column to almost every table in the database. It effectively allows them to keep versions of every record. This is important, especially when there are likely to be a lot of record changes and therefore record versions in the database.

To ensure that only the latest version of a record is used, they've then had to add one of these two columns to the primary key of the table. Ok, so this appears to be a no-brainer. Because the EffectiveTo date won't be set for the latest record version, and nullable columns can't be added to the primary key, the obvious choice is to make the EffectiveFrom date part of the primary key.

But this creates a problem.

You see, if you’re nearly always using the latest version of the record, then your query will be something like:

SELECT * FROM MyTable WHERE KeyField1=Val1 AND KeyField2=Val2 AND …blah… AND EffectiveTo IS NULL

And this is not optimal, because while it will retrieve the record from the table, it needs to do a retrieval of all versions of the record before it can filter down to just the record that you want.

Ok, so let's see what we can do about it. Say we decide that, in future, we won't allow the EffectiveTo date to be nullable. Instead of null, we will set the EffectiveTo date to '2999-12-31'.

Now the column can be used in the primary key, and the record will be selected using a clustered index seek, provided, of course, all the preceding primary key columns are specified in the query too. So the above query becomes:

SELECT * FROM MyTable WHERE KeyField1=Val1 AND KeyField2=Val2 AND …blah… AND EffectiveTo = '2999-12-31'

The problem here is that we are using magic numbers and unless there are exceptional circumstances, magic numbers are bad.

But how would we do it otherwise? Easy: add an IsActive bit field to the key before the EffectiveFrom field. That would work. The EffectiveFrom field is still needed to ensure uniqueness between versions (a bit field only has 0 or 1, remember?).

So an IsActive field is a much better solution.

But wait, there’s more!

Because the IsActive column is now part of the primary key, you can use it to partition the table. You create a partitioning function to split the data, as follows:

CREATE PARTITION FUNCTION myPartition(bit)
AS RANGE RIGHT FOR VALUES (1)

Then you create a partitioning scheme so that the data can be divided between filegroups. Here, I put current data on the Primary filegroup, and I've created another called OlderData for the archived but still referenceable data.

CREATE PARTITION SCHEME myScheme
AS PARTITION myPartition
TO (OlderData, [PRIMARY]); -- with RANGE RIGHT (1), partition 1 holds IsActive = 0 (archive), partition 2 holds IsActive = 1 (current)

And then recreate your table to partition across the filegroups using this scheme.

CREATE TABLE MyTable (
   KeyField1 varchar(10) NOT NULL,
   KeyField2 varchar(10) NOT NULL,
   EffectiveFrom datetime NOT NULL,
   EffectiveTo datetime NOT NULL,
   IsActive bit NOT NULL,
   OtherColumns…
   PRIMARY KEY (KeyField1, KeyField2, EffectiveFrom, IsActive)
)
ON myScheme (IsActive);

Consider the impact if you have EffectiveFrom and EffectiveTo columns on almost every table in your database. Now your queries should be much more efficient, as the current data is all on the primary partition/filegroup and the archive data is all on the OlderData partition/filegroup.

When you're mainly using the latest data, your queries will spend the majority of their time in the Primary partition, which is a much smaller subset of the overall data.
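For example (a sketch against the hypothetical MyTable above, with placeholder values), including the partitioning column in the filter lets the optimiser eliminate the OlderData partition entirely:

SELECT KeyField1, KeyField2, EffectiveFrom
FROM MyTable
WHERE KeyField1 = 'Val1'
  AND KeyField2 = 'Val2'
  AND IsActive = 1; -- partition elimination: only the partition on the [PRIMARY] filegroup is read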

Test it out yourself – but don’t forget to click on “Include Actual Execution Plan” in the Query menu of Sql Server Management Studio so that you can see if it makes a difference to the queries you create.

 


Business Intelligence Bootcamp Part 3

May 17, 2008

We finished up the BI Bootcamp yesterday, and after five days I think I’ve done the Matrix Brainload thing, and absorbed a whole stack of information. This was definitely worth it.

I now understand the basics of constructing OLAP cubes, what kinds of queries I can make to obtain existing statistics, and how cubes can support budgeting and forecasting. I worked with dimensions and facts, constructed some MDX queries, created some KPIs, and also covered performance and partitions.

I also learnt how to access the OLAP cubes from SSRS and Excel. With Excel, you link OLAP using Pivot tables, which every good financial controller knows how to do. We also used Microsoft’s ProClarity product to do advanced data analysis. The area that I was most interested in, though, was BI’s predictive ability. We were introduced to the various predictive algorithms, and did some queries that involved probabilities and confidence using the OLAP cube data.

Then we built Sharepoint Dashboards to enable us to represent the information that we built in the labs. This was great, because not only did we represent the data from the cube as figures, but also in graphical form and in a way that enabled us to visually identify the implications of what that data was telling us.

By implications, I mean that when you compare something like Sales Goals to Actuals, you can draw a bar graph in green if the goal was met, or in red if it wasn't. If a goal wasn't met, you were able to drill down the hierarchy to identify where something went wrong, or if someone was exceptional, drill down and see if you can identify the secret of their success! Of course, most of the graphing ability has been around for a while, but by putting it in Sharepoint, you are making it potentially accessible to a much wider audience.

We did Performance Point Scorecards, setting a whole lot of options, and were able to analyse the performance using Performance Point Server. We were able to select Analytic templates to do charting, grids, etc. We also built a Performance Point dashboard.

Finally, we did a module on planning, budgeting and forecasting. This was probably the biggest module and included the most significant lab exercise of the entire course! The outcome was that we were able to produce Sales Quota Forecasts.

The course has ended and I'm back in Melbourne. Would I do it again? I think I probably would. Peter Myers is very articulate, and explained key points well. To be able to pick up so much in such a short period of time is just so beneficial. I benefit because I now have a new set of skills for my repertoire, and businesses benefit because, given the small pool of people skilled in this area, they are now more likely to find someone to deliver their business intelligence solution! I highly recommend this course.


Business Intelligence Bootcamp Part 2

May 14, 2008

Well, it's certainly been an enjoyable ride. I never realised that business intelligence actually covers SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS) and SQL Server Analysis Services (SSAS). The BI practitioner needs all three to be effective. I had a lot of previous experience with DTS packages, but SSIS has evolved considerably and is miles ahead of its predecessor. I had also done SSRS, but even with significant knowledge there's still stuff to learn. And Analysis Services, well, the last time I touched an OLAP cube was around 1999, so it's a lifetime in computer terms!

Yesterday we did a whole lot of SSRS to support business intelligence. I had done a lot of SSRS before, however I would never claim to know everything, so I made sure I covered all the steps in the labs, and I was glad I did, as I learnt different ways of doing things. One of the major things I learnt about here is Report Builder, which is self-service reporting. It's sort of like a reduced-feature report designer targeted at business users who don't want to go through developers just to get reports done. That's pretty neat, because what developer really wants to spend their life doing reports! The course also covered integrating the reports with .NET applications and how to do it programmatically.

And finally, today we got onto the UDM (Unified Dimensional Model), OLAP, building cubes, working with dimensions and facts, calculations, KPIs, processing and deployment. So far, the most enjoyable topic for me was SSIS, because I learnt so much, and the topic that required the most concentration was most definitely the construction of the data cubes.

Halfway through the course now, and it's pretty full-on. I have to say, if you've been around IT for a few years and know your stuff (say, intermediate level and above), and you're looking for a challenge in an area you don't know that much about, then BI is definitely worth looking at, and an intensive course will help you hit the ground running as quickly as possible.


Business Intelligence Bootcamp Part 1

May 13, 2008

I decided to do a course. Basically, I decided that rather than fumble through books and notes and help files, I’d actually go and learn a whole stack of new skills and upgrade some old ones. Courses are great, because you can learn an awful lot in a very short period of time, and if you choose the right one, you can learn best practice and learn it from the best.

As my strongest skill is actually SQL Server, I decided to seek out something that would extend it. Because database and data concepts are used extensively in BI, it seemed the logical choice.

For someone with as many years' experience as I have, it is often difficult to find courses at a suitably advanced level that will also help you hit the ground running in the chosen skill.

Well, I found the course of choice at Greg Low's new company, SolidQ. The course is Peter Myers' BI Bootcamp. See http://learning.solidq.com/oz/CourseDetail.aspx?CourseScheduleId=7. Peter Myers is one of the top BI experts in Australia, if not the top BI expert. He wrote the course!

Anyway, I've now completed Day 1 and am now into Day 2. It's actually been quite mind-broadening. Basically, I had no idea exactly how extensive the need for SQL Server Integration Services (SSIS) is in Business Intelligence. To become an expert in BI, you will need to spend a lot of time taking the data from your data source, massaging it, and placing it in Fact and Dimension tables in your Data Warehouse database. And SSIS is just so rich! There are so many controls in the toolbox that can be added to the design area that it's simply brilliant.


LINQ to Entities

May 10, 2008

I've been investigating LINQ to Entities lately, and I have to tell you, it looks great! I saw a presentation by Adam Cogan and Justin King at Code Camp Oz 08 (http://www.codecampoz.com), which basically compared LINQ to Entities to NHibernate, and after the presentation, I reckon NHibernate may well be on the way out. Using LINQ to Entities is just so simple.

The beauty of both of these systems is that many errors are found at compile time. My article below, titled "Some fix time data", shows that finding code defects is significantly cheaper at compile time than at runtime. Also, finding an error at compile time is preventative. If you have to wait until the application is running before finding a bug, it might be months before you even try a potentially buggy code path.

NHibernate requires you to create XML files that contain the details of the tables you are trying to access through objects. LINQ to Entities, on the other hand, just requires you to drag and drop the objects onto the designer. LINQ to Entities also handles many-to-many relationships in a really easy-to-use way. Don't get me wrong, NHibernate looks good, but LINQ to Entities looks like it could be even better!