Use Workstation Garbage Collection (GC)

March 27, 2007

On March 8, 2007, Jeff Stuckey, a Systems Engineer Manager at Microsoft, gave a webcast on IIS and the garbage collector that was, well, pretty surprising really!

I have painstakingly transcribed what he said about the garbage collector and its performance in large, multi-application environments. Basically, the main gist of it is “Use Workstation GC”, and secondly, beware of bad caching.

Here is a fragment of what he said:
“We’re now running into a problem where we have 11 worker processes. Each of them has server GC running, and each of them has 4 high-priority threads trying to do garbage collection work. This is potentially dangerous because, depending on the timing of when garbage collection happens, you could really drag out the collections for one particular worker process if it gets interrupted in any way by any of the other high-priority GC threads or any of the other worker processes. So the guidance that we’ve been given is to go with what they call the workstation GC. This setting is in aspnet.config, as follows:

notepad c:\windows\\framework64\v2.0.50727\aspnet.config

<gcServer enabled="false" />

Now what this does is change the behaviour: instead of having 4 dedicated GC threads running at high priority, each with their own segments, you have one GC thread, and the allocation that it does on a native 64-bit machine is a 256 MB initial segment with a 128 MB large object heap. So if you have a lot of worker processes, 64-bit or 32-bit, doesn’t matter, if you have a lot of worker processes on a multi-processor machine, the guidance is to run this workstation GC. It reduces the overall footprint and the initial footprint of the CLR. I believe it also kind of starts up faster. On 32-bit machines the segments are even smaller: a 16 MB initial segment and a 16 MB large object heap.

Another option we’ve played around with, a little bit undocumented, is the GC segment size registry key…
This basically configures the segment size for server or workstation GC, but we have been mainly using it for server GC to configure the 64-bit machines down to the 32-bit machine segment sizes, just to see how it impacts performance. We’ve had pretty positive results. So anyway, that’s just for your information…
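To put those figures in perspective, here is a small Python calculation for the 11-worker-process scenario. The segment sizes and the 4-threads-per-process figure come from the webcast as transcribed above; treating the segments as initial space per process (rather than memory actually in use) is my assumption, and the function names are purely illustrative.

```python
# Figures as quoted in the webcast (MB): initial workstation-GC segments per process.
# How much of this is actually committed depends on the workload (an assumption,
# not something the webcast states).
WORKSTATION_SEGMENTS_MB = {
    "64-bit": {"initial_segment": 256, "large_object_heap": 128},
    "32-bit": {"initial_segment": 16, "large_object_heap": 16},
}

# GC threads per worker process, as described in the webcast.
GC_THREADS = {"server": 4, "workstation": 1}


def workstation_segments_mb(worker_processes: int, platform: str) -> int:
    """Initial GC segment space summed across all worker processes, in MB."""
    seg = WORKSTATION_SEGMENTS_MB[platform]
    return worker_processes * (seg["initial_segment"] + seg["large_object_heap"])


def gc_threads(worker_processes: int, mode: str) -> int:
    """Total GC threads across all worker processes for a given GC mode."""
    return worker_processes * GC_THREADS[mode]


if __name__ == "__main__":
    n = 11  # worker processes in the webcast's example
    print(workstation_segments_mb(n, "64-bit"))  # 4224
    print(workstation_segments_mb(n, "32-bit"))  # 352
    print(gc_threads(n, "server"), gc_threads(n, "workstation"))  # 44 11
```

In other words, the server-GC setup in the example has 44 high-priority GC threads contending across 11 processes, versus 11 GC threads with workstation GC.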

[Later on he said…]

GC behaviour in production. This is what we see in our systems in production. ASP.NET caching generally drives managed memory growth in, I don’t know, 90% of our applications. When we look at GC activity, and we look at overactivity, the CPU utilisation because of GC, ASP.NET caching is generally the culprit: either items are cached with no expiration policy, or applications cache as much as they can. That’s our experience. Gen 1 size is typically very small compared to Gen 0 and Gen 2… The cost of GC is driven by the number of objects that survive, not by dead objects.
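That last point, that the cost of a collection depends on what survives rather than on what died, falls out of how a tracing collector works: it starts from the roots and only ever visits objects it can reach. Here is a toy mark phase in Python to illustrate. It is not the CLR's collector (which also compacts and promotes the survivors, work that again scales with what is live), and the object graph is made up for the example.

```python
def mark(roots, references):
    """Toy mark phase: return every object reachable from `roots`.

    `references` maps an object id to the ids it points at. Each live object
    is visited once; unreachable objects are never touched, so how many of
    them exist does not change the work done here.
    """
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue  # already visited (handles shared references and cycles)
        live.add(obj)
        stack.extend(references.get(obj, ()))
    return live


if __name__ == "__main__":
    # A cached XML-like document: one root holding a document and its strings.
    heap = {"cache": ["doc"], "doc": ["str1", "str2", "str3"]}
    # 100,000 short-lived temporaries that nothing references any more.
    heap.update({f"tmp{i}": [] for i in range(100_000)})

    live = mark(["cache"], heap)
    print(len(live))  # 5: only survivors are visited, however much garbage exists
```

Note the flip side, which is exactly the caching problem: every string hanging off a cached document is a survivor, so it is visited on every collection that examines its generation.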

On x64 with many application pools, the system can experience memory pressure, which can be catastrophic in terms of GC activity. Basically, you have a lot of applications, they’re all caching, and they drive the system down to the point where the GC asks “what’s the memory load?”, the system says “oh, I’m this full”, and the GC says “oh, this is memory pressure, I’d better start collecting”. All 11 app pools see this at the same time and they all start collecting. Because the GC threads run at high priority, they can even block http.sys from taking connections. The behaviour we saw in our availability tests was that some requests for static GIF files were failing, which is just bizarre, as you wouldn’t expect static GIFs to have a connection failure, or any kind of failure for that matter, and it turned out that it was GC activity that was killing us.
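Why would all 11 app pools start collecting at once? In this toy Python model, each worker process decides to collect either when the machine-wide memory load crosses a threshold or when its own heap crosses a limit. The 90% load threshold, the heap limit, the starting sizes and the growth rates are all invented for illustration, and this is not how the CLR's heuristics are actually implemented. The point is only that a shared, system-wide signal fires for every process on the same tick, even when they grow at very different rates.

```python
def collection_ticks(growth_mb, signal, base_mb=500, total_mb=16_000,
                     load_threshold=0.90, own_limit_mb=1_300, max_ticks=1_000):
    """Return, for each worker process, the first tick at which it starts a GC.

    signal="system": every process reads the same machine-wide memory load.
    signal="own":    each process only looks at the size of its own heap.
    All numbers are hypothetical; this is a toy model.
    """
    n = len(growth_mb)
    used = [base_mb] * n      # managed memory per process, in MB
    ticks = [None] * n        # tick at which each process first collects
    for tick in range(max_ticks):
        load = sum(used) / total_mb  # one number, shared by every process
        for i in range(n):
            if ticks[i] is not None:
                continue
            if signal == "system":
                triggered = load >= load_threshold
            else:
                triggered = used[i] >= own_limit_mb
            if triggered:
                ticks[i] = tick
        if None not in ticks:
            break
        # Every process keeps caching, each at its own rate.
        used = [u + g for u, g in zip(used, growth_mb)]
    return ticks


if __name__ == "__main__":
    growth = [10 * (i + 1) for i in range(11)]  # 11 app pools, 10..110 MB per tick
    print(collection_ticks(growth, "system"))   # all 11 collect on tick 14
    print(collection_ticks(growth, "own"))      # collections spread out over time
```

With the system-wide signal, every pool fires on the same tick; with a per-process signal, the same growth rates spread the collections out between tick 8 and tick 80.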


XML is usually a large culprit because you end up with this XML document that has literally hundreds of objects that are linked to it, most of them strings, and people like to cache up XML for performance reasons. Caching’s not bad, I’m not saying that, but the guidance would be that you really get a handle on exactly what you’re caching and you cache it in an intelligent way so that you’re not overutilising the cache. You cache only stuff that’s hot, not stuff that’s only hit, you know, once or twice. The goal is to improve performance of your caching. So you don’t want to cache just about everything because you will eventually drive the system into memory pressure which has a very negative impact on performance.
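The two caching rules in the quote, give entries an expiration policy and only cache what is hot, can be sketched together. This is a minimal Python illustration, not the ASP.NET Cache API: `HotCache`, `min_hits`, `ttl`, `max_tracked` and `purge_expired` are hypothetical names, and the thresholds are placeholders you would tune by measuring your own hit rates.

```python
import time


class HotCache:
    """Admit an item only after `min_hits` requests; expire entries after `ttl` seconds."""

    def __init__(self, min_hits=3, ttl=60.0, max_tracked=10_000, clock=time.monotonic):
        self.min_hits = min_hits
        self.ttl = ttl
        self.max_tracked = max_tracked
        self.clock = clock
        self.hits = {}     # key -> requests seen while not cached
        self.entries = {}  # key -> (value, expiry time)

    def get(self, key, load):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                  # fresh hit
        self.entries.pop(key, None)          # drop an expired entry, if any
        value = load(key)                    # miss: go to the real data source
        count = self.hits.get(key, 0) + 1
        if count >= self.min_hits:
            self.entries[key] = (value, now + self.ttl)  # hot enough: cache it
            self.hits.pop(key, None)
        else:
            if len(self.hits) >= self.max_tracked:
                self.hits.clear()            # crude reset so the counters stay bounded
            self.hits[key] = count
        return value

    def purge_expired(self):
        """Remove expired entries that nobody has asked for again."""
        now = self.clock()
        for key in [k for k, (_, expiry) in self.entries.items() if expiry <= now]:
            del self.entries[key]


if __name__ == "__main__":
    cache = HotCache(min_hits=2, ttl=30.0)
    cache.get("report.xml", str.upper)     # first request: served but not cached
    cache.get("report.xml", str.upper)     # second request: now it is cached
    print("report.xml" in cache.entries)   # True
```

Items requested only once or twice are served straight from the source and never occupy the cache, and nothing stays cached forever, so a cached XML document (and every string hanging off it) eventually becomes collectable.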


Move SQL Server Execution Plans to the middle tier

March 27, 2007

Most Enterprise developers are familiar with n-tier development. Over the years there has been some argument as to where you put certain logic.

In n-tier development, there is usually a UI layer for layout, a UI-oriented Business layer for UI-oriented validation logic, a Data-oriented Business layer for business rules, and a Data layer. However, it’s not that simple.

You see, conventional Enterprise development says that business logic should be in the business layer(s), but then it turns around and says that if you need a particular piece of business logic to be more performant, then it should be re-written as a stored procedure.

Rewriting the business logic as a stored procedure would undoubtedly be faster. However, the very idea that business logic is now turning up in the database is conceptually wrong.

Let’s take it a step further. Say you want every piece of data-oriented business logic to be performant. Suddenly, every routine that previously existed in the middle-tier business layer is now in the database, and the middle tier provides little more than a pass-through to the routines in the database.

This strategy actually works, and works well. I have seen a number of medium-sized systems that are implemented this way, and there are real benefits to coding like this. You can now change business logic with a patch script that runs as a transaction, and getting an application upgraded generally involves more bureaucracy than running a patch script for a few stored procedures. Again, this is not a catch-all, and I wouldn’t write every system this way. As always, there are many ways to skin a cat.

A major benefit of stored procedures, and the reason well-written ones are so fast, is that the first time they run they are compiled into an execution plan, which SQL Server caches and reuses: the plan already records which indexes to use and exactly where to get the data from. The downside is that not everyone has experience writing stored procedures, and maybe they shouldn’t have to.
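The benefit described here is essentially “compile once, reuse many times”. The toy Python model below caches plans keyed by exact statement text, which is a simplification of what a real database engine does; `PlanCache`, `execute`, the plan placeholder and the `Orders` table are all hypothetical. In this model, reuse happens only when the statement text stays the same and the values are passed separately as parameters.

```python
class PlanCache:
    """Toy plan cache: compile a statement the first time its text is seen,
    then reuse the stored plan. Real database engines are far more involved."""

    def __init__(self):
        self.plans = {}        # statement text -> compiled plan
        self.compilations = 0  # how many times the compile cost was paid

    def _compile(self, sql):
        self.compilations += 1
        return ("plan for", sql)  # stand-in for choosing indexes, join order, etc.

    def execute(self, sql, params=()):
        plan = self.plans.get(sql)
        if plan is None:
            plan = self.plans[sql] = self._compile(sql)
        return plan, tuple(params)  # a real engine would now run the plan


if __name__ == "__main__":
    cache = PlanCache()
    for customer_id in range(100):
        cache.execute("SELECT * FROM Orders WHERE CustomerId = ?", (customer_id,))
    print(cache.compilations)  # 1: same text every time, so one plan is reused

    cache = PlanCache()
    for customer_id in range(100):
        # Literal values baked into the text: every statement looks new.
        cache.execute(f"SELECT * FROM Orders WHERE CustomerId = {customer_id}")
    print(cache.compilations)  # 100: every distinct text needs its own plan
```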

What this all comes down to for me is this: every developer who works in the data-oriented business layer should know at least SQL. Stored procedures themselves are just routines, and there should be a way to replicate this sort of functionality in the middle tier.

At this point, the smart people at Microsoft should be able to work out a system to ensure that execution plans are compiled in the middle tier, based on the routines written there. No, I don’t know exactly how they’ll do it; perhaps they’ll initially have to write some sort of polling or replication to the database to achieve it.

The best place to put this is probably in a hybrid of the new LINQ framework. If LINQ is as good as I think it should be, then it should be relatively easy for companies like Microsoft to plug in an optimised block for compiling execution plans in the middle tier.

If the outcome is better performing applications, then I’m all for it. And if every Enterprise application world-wide suddenly becomes faster and more efficient, with all the business logic in one location, and without developers having to learn anything new, then that is a great thing.

ASP.NET 2.0 unhandled exceptions tear down IIS worker process

March 26, 2007

According to Jeff Stuckey, Systems Engineer Manager at Microsoft, unhandled exceptions in ASP.NET 2.0 cause the whole IIS worker process to be torn down. This has serious implications: if you’re running a whole lot of sites in the same application pool, you don’t just lose the one site that caused the error, you lose the whole lot!
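The blast radius is the worker process simply because everything hosted in one process dies with it. The toy Python model below runs each “app pool” as its own OS process; an app named `faulty` raises an exception that nothing catches, which ends its process. It only mimics the torn-down worker process described here (it is not IIS or the CLR), and the pool, site and function names are made up.

```python
import subprocess
import sys

# Script run by each toy "worker process": it serves its apps one after another.
# An app called "faulty" raises, nothing catches it, and the whole process exits,
# so any app after it in the same process never gets served.
WORKER = r"""
import sys
for app in sys.argv[1:]:
    if app == "faulty":
        raise RuntimeError(app + ": unhandled exception")
    print(app, flush=True)
"""


def run_pool(apps):
    """Run one toy app pool in its own process; return (exit code, apps served)."""
    proc = subprocess.run([sys.executable, "-c", WORKER, *apps],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.split()


if __name__ == "__main__":
    print(run_pool(["siteA", "faulty", "siteB"]))  # (1, ['siteA']): siteB is lost too
    print(run_pool(["siteC", "siteD"]))            # (0, ['siteC', 'siteD']): other pool unaffected
```

Sites that share a process share its fate; sites in a separate process carry on, which is the argument for spreading critical sites across application pools.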

Don’t believe me? Check out the webcast called “Debugging CLR Internals” at about the 36:06 mark. The workaround is to implement legacy exception handling at