How to build a modern best practice enterprise application on-premise or in the cloud

May 24, 2012

This is a summary of the key points of my presentation at the Victoria.Net users group, held at Microsoft in Melbourne, Australia, with a few added points thrown in for extra value.

Developing large scale applications is hard, and never before have we been under so much pressure to be efficient with our time. It is imperitive that we spend as little time as we can on unproductive infrastructure code and as much of our time actually implementing the domain/business services logic.

For some reason, many enterprise architects promote leaving performance to the end of the development process. They decide on an architecture, get all their developers to build it, and then when it gets to the end of the development process, wonder why the application runs so slowly. They haven’t even done a basic performance test, to see if any of their assumptions were correct, and at the end of the process sometimes discover serious flaws in the architecture of their sites requiring considerable rework, and occasionally a complete rewrite is required.

In a previous post, I showed statistics produced by Watt’s Humphrey, known as the father of Software Engineering, where he shows the cost of discovering and rectifying defects in their respective project phases. The later you find defects in the development process, the more expensive they are to rectify, so if you wait to the end of a project before dealing with an issue that should have been rectified at the start of the project, it could cost you magnitudes more than if you at least have an attempt at dealing with performance from the start. You can find that post here.

Another issue with large scale web sites is configuration. Most configuration is found in web.config files or app.config files within a .net application. Within a web site, if you modify the web.config file, it resets the web server to refresh the configuration settings. One environment I worked in had a web farm that consisted of 8 web servers. Because of the use of web.config settings and also Enterprise Application Blocks, which heavily utilise configuration files, they were always running into problems with servers that have their configuration out-of-synch, and also have an unstable state during the update. And they still are.

So on Tuesday night, I showed a comprehensive end-to-end application that demonstrates an n-tier, service-oriented architecture that is considered best practice. It also allows on-premise, hybrid and cloud-based tiers. For people that are struggling with the whole “to cloud or not to cloud” arguments, this allows you to hedge your bets. If this point has been holding back project development, you don’t need to worry. This architecture allows you to build an on-premise application, then move it to the cloud at a later date, for very little extra cost, if there is a movement to do so.

The features are:

A configuration block. The configuration block is at the heart of the application. It enables you to centralise configuration for all the tiers of your application. It provides a web based portal for making modifications. When you make changes to configuration, those changes are automatically updated on the server.

Centralised exception handling and errors.

Centralised logging.

A service map. The service map shows a visual representation of all your tiers, where they are hosted, what instances are running, what databases you are running, and it even allows you to drill down into a database and see latency figures and the least performing and most CPU intensive queries. Using the service map, you are able to see any servers that become unavailable, for whatever reason. The architecture actually implements polling of all the component instances, so you’ll know pretty much straight away if any of the services aren’t available.

A large scale application project structure that you can use, separating User Interface, Business Services Logic and Database.

A database layer that is optimised for performance. This was at the heart of the performance tests against IBM WebSphere, so if there was a faster way of interacting with the database, I would be absolutely shocked. Microsoft has to be able to push as many transactions per second as possible through this system. They have to outperform IBM WebSphere, which they do dramatically. (IBM WebSphere is 6 times more expensive for the same grunt.)

A benchmarking tool that allows you to test your architectural choices. What happens with this architecture if you make the wrong choice? Nothing. The application has been designed so that you can change where your tiers are hosted, at runtime, on the fly! So you are protected against making the wrong choice.

I should point out that I demonstrated all of this. I started by running the application fully in-proc. Then I moved the business services layer to be IIS hosted. I was able to show this on the visual service map. Then I switched the application over to a business service layer hosted in the cloud. When I showed the service map, it looked awesome.

There are full instructions on how to set up the application. The installer only takes about 5 to 10 minutes to install everything. It is rich with documentation and it includes full instructions on how to configure Azure to have different tiers, how to install digital certificates, and ensure that all the communication channels are encrypted.

I mentioned that if you host a business services layer in Azure, then you can supply digital certificates to any company that you would like to use that service. So any company can integrate with the business services layer without you having to worry about them gaining access to your network.

I explained how the load balancing works. You can have as many servers you like in the web farm, and the requests are round-robined to each of the services.

I spoke about how to integrate the configuration block into your own application, and also demonstrated the Visual Studio template provided to generate the basic application layers, which include the configuration block.

The application itself was written by Gregory Leake. Gregory is the Technical Product Manager at Microsoft for the SQL Azure and Azure AppFabric teams. The application was first written around 2006, and is now at version 5. Thousands and thousands and thousands of hours have gone into thinking through the mass of scenarios that you might need to consider, and spent on coding this application. At the time of writing, this application has been evolving for 6 years. The application, including a working demonstration of the running Azure version, may be found here:

I guess my final message here is that in order to be more efficient in your coding, you need to simplify. Forget over complicated tricky designs. All that will happen is that they will be difficult to maintain. Spend as little time as you can on infrastructure code. If you can find a framework such as this that has all these goodies already provided, then embrace it. There is no way you can develop all this efficiently in short timeframes, and why would you want to anyway? It’s just reinventing the wheel. Take it on and use it within your environment. Then spend most of your time building your application, and not the infrastructure to support it.

Sql Server 2005 – Twelve Tips For Optimizing Query Performance by Tony Wright

May 26, 2008

1. Turn on the execution plan, and statistics

The first thing you need to do is to use the tools that help you determine whether a query done one way is better than another. That’s what we’re trying to do. By comparing the original query to a new query that we come up with is the best way to evaluate the benefits of any changes.

To do this, go into Sql Server Management Studio and select the Query menu. Select the “Include Actual Query Plan.” This turns on the graphical Execution Plan when you execute a query, and that can be found in the bottom pane after the execution.

In the Execution Plan, you can mouse over the components of the plan and it provides tooltip information boxes. The box contains Estimated Subtree Cost, which can be used to help determine whether one query is better than another. Of course, it’s not always right, as some query parts are not included in the execution plan, but it helps. It is also helpful to know the estimated number of rows, which is also found in this tooltip box.

Next, turn on statistics. Type the following statement:


This causes statistics to be output to the Messages tab in the bottom pane. The information you want here is mainly logical reads and physical reads. Logical reads are page reads from memory. Physical reads are page reads from disk. This stat can be a little deceptive as it doesn’t include CPU in the metric, but in general, the less page reads, the less work done, and so the more performant the query will be.

To counteract the above two you should also compare the actual execution times. To do this, eecute the following statement:


This also has issues, as blocking and contention issues affect the output time. You should execute the query a few times to determine how accurate the time shown is.

2. Use Clustered Indexes

Having the clustered index on the primary key is sometimes not the most efficient place for the clustered index to be. A clustered index is the most performant type of index. The whole table is sorted according to the clustered index. If the table is involved in lots of joins based on the primary key, it is probably the right place for it to be, but if you are continually filtering or grouping on other columns in a table, then you should possibly consider changing the primary key index to Non-Clustered, and putting the clustered index on those filtered or grouped columns.

The following statement removes and existing clustered index on the primary key and replaces it with a non-clustered index:

ALTER TABLE MySchema.SalesOrderHeader
ALTER TABLE MySchema.SalesOrderHeader

Then the following statement adds a new clustered index to a table.

ON MySchema.SalesOrderHeader (OrderID)

3. Use Indexed Views

Indexed Views have been around for a while. A view is like a named query, and these days you can add indexes to them. If used correctly, they can cause a massive improvement in execution times, often better than a clustered index with covering columns on the original table. Also, in SQL Server Developer Edition and Enterprise Edition, a view index will also be automatically used if it is the best index even if you don’t actually specify the view in your query!

CREATE VIEW MySchema.SalesByCustomer
SELECT soh.SalesTerritoryID, soh.CustomerID,
SUM(sod.Quantity * sod.UnitPrice)
FROM MySchema.SalesOrderHeader soh
INNER JOIN MySchema.SalesOrderDetail sod
ON (soh.SalesOrderID = sod.SalesOrderID)
GROUP BY soh.SalesOrderTerritory, soh.CustomerID

Note the use of the schema binding attribute. This prevents you from changing underlying tables while this view exists, and is necessary if you want to add an index. Some people avoid indexed views for this reason, as the maintenance becomes more complicated as further dependencies to the view are created. The following statement adds an index:

ON MySchema.SalesByCustomer(
SalesTerritoryID, CustomerID

4. Use Covering Indexes

Covering indexes are a feature that was newly added to SQL 2005. Basically, you can create an index optimised for the query itself based on joins, filters and grouping, and then add additional columns that can be retrieved directly from the index for use in select statements, as follows:

ON MySchema.SalesOrderDetail(OrderId)
INCLUDE (Quantity, UnitPrice)

The above statement causes a non-clustered index to be created on the SalesOrderDetail table. If queries are executed on the OrderId column, the index will be used, and if the only other columns being retrieved are Quantity and UnitPrice, then the query optimiser doesn’t need to retrieve any extra columns from the underlying table. It can just use the index. Because the query optimiser doesn’t need to query the original table, performance is improved.

5. Keep your clustered index small.

One thing you need to consider when determining where to put your clustered index is how big the key for that index will be. The problem here is that the key to the clustered index is also used as the key for every non-clustered index in the table. So if you have a large clustered index on a table with a decent number of rows, the size could blow out significantly. In the case where there is no clustered index on a table, this could be just as bad, because it will use the row pointer, which is 8 bytes per row.

6. Avoid cursors

A bit of a no-brainer. Cursors are less performant because every FETCH statement executed is equivalent to another SELECT statement execution that returns a single row. The optimiser can’t optimise a CURSOR statement, instead optimising the queries within each execution of the cursor loop, which is undesireable. Given that most CURSOR statements can be re-written using set logic, they should generally be avoided. There is a case where it is useful to use cursors, however, and that is when you actually want a row by row execution. In the case where there’s blocking occurring in a large batch of rows that’s exactly what you want.

7. Archive old data

Another no-brainer, so I won’t say much. If you want to improve query performance, give the optimiser less work to do. If you can cut down the number of rows the query has deal with, then performance will improve. I have no problem with people creating audit triggers to move historical data into other tables for this reason. Alternatively, if you don’t need your data after a certain period of time, back up your database and remove the data.

8. Partition your data correctly

These days, you don’t actually have to move old data out of a table to improve query performance. You can partition your table into a number of data segments based on a partition function. The query optimiser can use the partition function to look at rows only on the most appropriate filegroup. To create partitions, you need a partition function and a partition scheme.

CREATE PARTITION FUNCTION myRangePartitionFunction(int)

Once the partition function is created, you can then apply the function to a partition scheme for a table.

AS PARTITION myRangePartitionFunction
TO (filegrp1, filegrp2, filegrp3, filegrp4)

Then it’s just a matter of creating the table to use the partition scheme on the column you decided to partition on:

CREATE TABLE mySchema.myPartitionTable(
col1 int,
col2 nvarchar(100)
ON myRangePartitionScheme(col1)

9. Remove user-defined inline scalar functions

Inline scalar functions are convenient if you want to return a single value, but at the expense of performance. They look somewhat like stored procedures, and they can be used in SQL statements. The problem is that they are not expanded and therefore not optimised into the query plan by the query optimiser. Bad news, because it turns a Seek into a Scan. Queries like this may appear to be performant in the Execution plans and also in the IO statistics, but when you run the query, it can perform really really badly. No seriously, really bad.

Here’s an example of what I’m talking about:

CREATE FUNCTION dbo.fnGetPostalCode(
   @Suburb nvarchar(100),
   @State nvarchar(10)
      SELECT PostalCode
      FROM dbo.PostalCode
      WHERE Suburb = @Suburb
      AND State = @State
   ), -1 );

The following statement will only perform a clustered index scan, not a seek, and on a big table this could seriously affect performance.

SELECT s.SalesPersonID,
AS PostalCode
FROM dbo.SalesPerson

You can have a look at the details by clicking on SQL Server Management Studio’s Query menu, and selecting “Include Actual Execution Plan”

One way to get around this is to simply inline the underlying query from the function, as follows:

SELECT s.SalesPersonID, s.SuburbName, s.State,
         FROM dbo.PostalCode
         WHERE Suburb = s.SuburbName
         AND State = s.State), -1)
      AS PostalCode
FROM dbo.SalesPerson

Inline the SQL statement will perform significantly better.

10. Use APPLY

The apply statement was created for the situation where you put multiple inline nested queries in the one statement. For example, take the following statement:

SELECT soh.SalesOrderID,
Quantity=(SELECT TOP 1 (Quantity)
          FROM Sales.SalesOrderDetails
          WHERE SalesOrderID = soh.SalesOrderID),
UnitPrice=(SELECT TOP 1 (UnitPrice)
           FROM Sales.SalesOrderDetails
           WHERE SalesOrderID = soh.SalesOrderID)
           FROM Sales.SalesOrderHeader soh)

This performs an extra query, retrieving data from another table using the same criterion. This can now be replaced with the following:

SELECT soh.SalesOrderID, soh.OrderDate, a.*
FROM Sales.SalesOrderHeader soh
   SELECT TOP (1) sod.UnitPrice, sod.Quantity
   FROM Sales.SalesOrderDetail sod
   WHERE sod.SalesOrderId = soh.SalesOrderId
   ORDER BY sod.Quantity DESC
) as a

11. Use computed columns

Computed columns are derived from other columns in a table. By creating and indexing a computed column, you can turn what would otherwise be a scan into a seek. For example, if you needed to calculate SalesPrice and you had a Quantity and UnitPrice column, multiplying them in the SQL inline would cause a table scan as it multiplied the two columns together for every single row. Create a computed column called SalesPrice, then index it, and the query optimiser will no longer need to retrieve the UnitPrice and Quantity data and do a calculation – it’s already done.

12. Use the correct transaction isolation level

If there are a lot of rows in your table, multiple concurrent requests to that table could cause contention if the correct transaction isolation level is not set. If requests are repeatedly blocked, it could be time to consider whether to change.

For example, READ UNCOMMITTED is equivalent to dirty reads, or NOLOCK. That is, if a transaction is in the middle of processing and you read a row, the data may not be valid, especially if multiple inserts/updates are occurring that require atomicity. This is the most performant and it ignores locking altogether, but is generally not allowed by good design and is a special case.

With READ_COMMITTED_SNAPSHOT, it specifies that any data read by the transaction will be the transactionally consistent version of the data that existed at the start of the transaction. Internally, it makes a versioned copy of the data and this is placed in tempdb until the transaction has competed. Except when the database is being recovered, snapshot transactions do not request locks when reading data, and therefore do not block other transactions from writing data. Transactions writing data also do not block other transactions reading data.

There are various other types of transaction options, including REPEATABLE_READ and SERIALIZABLE amongst others that you can look at to determine whether they are appropriate for your needs.