How much can I save with Cloud Computing?

January 27, 2012

Well, that depends. How much of your environment are you intending to put in the cloud? Are you intending to just put a toe in the water, or are you going to put everything in?

If you’re talking about moving all your servers into the cloud, Microsoft produced a whitepaper in November 2010 that showed the Economics of going cloud. That whitepaper is found here: Microsoft Economics Of The Cloud

To keep it simple, in the article, on page 10 there is a graph. It shows the Total Cost of Ownership based on how many servers you have in the cloud. It starts at 100 servers, and as the number of servers increases, the costs drop. The sweet point is around 1000 servers, where the real benefits reach their peak. After that, the costs do still reduce, but not at the rate up to the 1000 server point.

For a mid-sized company, I would expect to see between 20% and 30% reduction in costs. I recently had a conversation with a company that has 40,000 employees that have now put everything in the cloud. They claimed a 25% reduction, and I would say they are pretty conservative with their calculations.

The next issue is private vs. public cloud. Of course, there are many companies that can’t put their data in the public cloud for a variety of reasons. Many organisations can’t share or even be seen to be sharing their data with other companies. Organisations with privacy issues, such as childcare providers or elderly facilities, government, military. There are organisations that have regulatory issues relocating data to offshore data centres. The public cloud for Australia, for example, is located in Singapore.  There are organisations you don’t want to hear have gone to the public cloud. This is in spite of the fact that the public cloud is architected with the utmost focus on security and separation of tenants and their data. These people need to choose a private cloud.

A private cloud can be hosted in house, or in a local data centre and is generally set up for you by a larger organisation, such as Fujitsu. The equipment is sourced from the same batch of computers that would be found in a public cloud set up, but you don’t share any machines with other organisations. You are the only tenant. Whoever you get to implement your private cloud will also manage your service, so they can have employees that manage significantly more servers than your own infrastructure employees can manage, so you get that economy of scale. The only problem with private cloud is that it costs, because you are the only tenant.

If you look at the Microsoft Economics of the Cloud whitepaper, on page 15, they have a graph that shows the relative costs of private cloud vs. public cloud. For smaller organisations, such as those with 100 servers, the cost of private cloud is more than 40 times the cost of utilising public servers. Of course, the cost of private cloud drops as you add on more servers. Looking at the graph, by the time you get to 1000 servers, the cost of private cloud is 10 times the cost of the public cloud.

But you really need to ask yourself, do I really need private cloud? Security in the public cloud is extremely robust. The architecture prohibits data crossing the tenant boundary. Microsoft and other organisations know that if they have security breaches then they are virtually finished, as Amazon found recently. And Microsoft must be the most attacked company in history. So if they send their security specialists in to architect a system where companies may want to put their most secure trade secrets, they have to make sure it is secure. There are large multinational companies with their entire operations in the public cloud. I’ve met one of them. They wouldn’t have done it if they had real concerns about the safety of their data.

 


What is Cloud Computing? Is it hype or will it give me real benefits?

January 24, 2012

There is still a lot of misunderstanding about what the cloud is. People think that having their servers hosted in a data centre is the same as having it in the cloud, so they can’t visualise the change. The cloud is about providing economies of scale, and also it’s about bulk administration of servers and services provided.

Firstly, there are cloud data centres. The cloud providers go out and get the system architected so that they can then mass produce a single server type for their data centres. Then they bulk produce it. The data centres are absolutely massive. The interesting thing is that for them to add another server in, it costs in the order of $10. (Feel free to jump in with corrected figures if you have them. The figure I was originally told was actually less) That means that they can pass on some of the savings to you.

Secondly, there are the cloud applications. Hosted applications in the cloud. This means that instead of your staff having to go buy and build a server, install the software, and configure it, you can simply put in a request to have it provisioned and a new instance will be created and available either instantaneously are within a very short period of time. You no longer need staff that are skilled in building and configuring hardware. You just provision another application and it’s up and running in a timely manner.

Of course, if you’ve already got all the hardware and got all the licences, you’re quite possibly not going to consider moving to the cloud. Therefore many people will wait around until their hardware or O/S is virtually obsolete. This will perhaps cause them to miss an opportunity to reduce or retask headcount.

That’s because the cloud providers will be providing all the hardware and provisioning of servers. They have teams that can administer 1000 servers per person. Within your own company, you’d be lucky if you had one person that could administer 100 servers. So instead of employing people to manage all that infrastructure, you need less people.

Now, I’ll use a lot of figures here to try to make a point. It’s hypothetical. In fact, I’ll say they’re all made up so I don’t have to get into arguments about them. (If you believe you have accurate references to the true costs, by all means, put them forward!)

To a small business, whether you spend $200 a year or $300 a year on a server, there’s not really much difference. But consider larger organisations that have 1000 servers. The cost difference in that case becomes $200,000 vs $300,000.

In the cloud case, that server cost might drop to $100 per server per year.

Say a person with that skillset was costing you, say, $150,000 a year (not what they’re earning, their cost to the business), and you now needed 3 less staff members because of it. The saving from that is $390,000 a year. Add the $200,000 saving from server cloud and that starts looking like a more interesting $590,000.

So when companies consider the cloud and they scoff at moving to the latest “fad” because of their existing environments you think, yea ok. But then consider new companies, fast movers, up and coming. They aren’t going to care about all these issues bigger companies are having regarding corporate governance and data safety, because the cloud offering will be an adequate proposition for them. Data safety and security will be good enough. Then when it comes for these companies to scale, they can just spin up new servers. Old school companies will move much slower and the new cloud businesses will become the rule breakers that will catch up faster than ever before.


Windows 8 is revolutionary!

September 14, 2011

Windows 8 is going to change the way we interact with our computers. If you ever watched Minority Report, you will remember the way that Tom Cruise interacted with the platform. You won’t be wearing the glasses and interacting with holographic images with virtual reality gloves, but it offers just about everything else that’s practical today.

Windows 8 Start Page

Windows 8 brings touch to desktops in a big way. You will have your favourite apps on the main screen, tap a tile to run the app, slide across to a list of other all apps, bring up your screen that has all your photos and slide them around. Basically, it brings much of the cool functionality available on your smart phones to the desktop.

As far as development is concerned, people will need to start thinking about how to make their apps look really cool and conform with this environment. If your app looks clunky, someone else is going to create a new version with the new user interface style that will walk all over it.


SQL Server: Random Numbers on a Row by Row basis

March 9, 2011

If you use the standard RAND() function call in SQL, it will execute once for a column in a batch, and that same value will be used for every row in your SQL Query. It is possible to get around this problem.

Everyone knows that NEWID() will generate a new guid for every row in the database. If you pass a NEWID() into the CHECKSUM function, it will return an int value (can be positive or negative) between min and max int.

To return a value equivalent to the RAND() function, I simply execute the following:

SELECT ABS(CHECKSUM(NEWID())) / 2147483647.0

This will return a positive value between 0 and 1. Note that the decimal point in the max int figure is important here because it converts it to a floating type – otherwise the result would be truncated and always be zero.

So all you need to do if you want to return a set of random values, say, between 0 and 100 would be to multiply it by 100 and subtract 1. The cast to int simply truncates the result, so that we don’t have decimal places.

SELECT Id, CAST(ABS(CHECKSUM(NEWID())) / 2147483647.0 * 100 - 1 as int) AS RandomNum
FROM MyTableContainingLotsOfRows


SQL Server: Generating SQL For Missing Foreign Key Indexes

March 9, 2011

One way to improve performance in SQL Server is simply to ensure that there are indexes on all foreign key.  This is NOT done automatically, which is kind of surprising given how much of an impact such a change would have on the performance of a significant install base of SQL Server.

I have a simple stand-alone SQL Script that I execute to generate the SQL for the missing foreign key indexes. Here it is:

SELECT DISTINCT 'CREATE INDEX IX_' + REPLACE(objconstraint.name,'.','_') + ' ON ' + fkcols.name + '.' + objfk.name + '(' +
	STUFF((	select ',' + col.name
			from sys.foreign_key_columns fkcol2
			join sys.columns col
			on fkcol2.parent_object_id = col.object_id
			and fkcol2.parent_column_id = col.column_id
			and fkcol2.constraint_object_id=fkcol.constraint_object_id),1,1,'') + ');'
FROM sys.foreign_key_columns fkcol
	JOIN sys.objects objfk ON fkcol.parent_object_id = objfk.object_id
JOIN sys.schemas as fkcols ON fkcols.schema_id = objfk.schema_id
	JOIN sys.objects objconstraint ON fkcol.constraint_object_id = objconstraint.object_id
LEFT JOIN sys.index_columns ic ON ic.object_id = fkcol.parent_object_id
	AND ic.column_id = fkcol.parent_column_id
AND ic.index_column_id = fkcol.constraint_column_id
WHERE ic.object_id IS NULL

This script outputs the results in the results grid, so you can cut and paste it into a new query window for execution.

I was asked why I didn’t use INFORMATION_SCHEMA views in this bit of SQL. The answer is simple – the standard for INFORMATION_SCHEMA doesn’t include Indexes. Enjoy!


Silverlight DataContextProxy no longer necessary

January 25, 2011

Prior to Silverlight 4, to get access to the Page/Control DataContext from within a grid, it required some trickery to be able to bind elements such as Combobox ItemsSource collections within a grid. Probably the most used idea was to create a DataContextProxy class, and bind to that when you wanted to access the DataContext within a page. Well, this is no longer necessary, so you can remove all references to DataContextProxy classes, and do the following instead.

Note here that I am assuming that you have bound your View Model to the current Page/Control.

Firstly, you need to name your root element. In my case, it’s a navigation page, so I have named it “ThisPage”

<navigation:Page x:Class="MyNamespace.MyClass"
           x:Name="ThisPage"
           Title="My Title"
           xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
           xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
           xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
           xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
           mc:Ignorable="d"
           xmlns:navigation="clr-namespace:System.Windows.Controls;assembly=System.Windows.Controls.Navigation"
           d:DesignWidth="1085" d:DesignHeight="500"
/>

Of course, I have the data context of this page is bound to my view model (you can do this various ways, even in the XAML)

public MyClass()
{
   InitializeComponent();

   if (viewModel == null)
         viewModel = new MyViewModel();
   this.DataContext = viewModel;
}

Within a grid, I can now bind directly to the DataContext property of ThisPage.

Here I have shown how to bind a ComboBox collection inside an sdk combo box column.

<sdk:DataGridTemplateColumn x:Name="MyListColumn" Header="My Type" Width="150">
   <sdk:DataGridTemplateColumn.CellTemplate>
      <DataTemplate>
         <ComboBox x:Name="MyListComboBox" DisplayMemberPath="Description" SelectedValuePath="Code" SelectedValue="{Binding Path=MyCode}" ItemsSource="{Binding Path=DataContext.MyObservableCollectionInViewModel, ElementName=ThisPage}" />
      </DataTemplate>
</sdk:DataGridTemplateColumn.CellTemplate>

Note that the ItemsSource is bound to an element, and that element is ThisPage. ThisPage has a DataContext, which is bound to my View Model. So to bind the observable collection I created, called MyObservableCollectionInViewModel, to the ItemsSource of the ComboBox, I need to reference the collection off the DataContext property of ThisPage. Hence the following syntax for the bind itself:

ItemsSource="{Binding Path=DataContext.MyObservableCollectionInViewModel, ElementName=ThisPage}"


Curing Cancer

December 17, 2010

My brother is Gavin Wright, a thoracic surgeon, an Associate Professor, and the Director of Oncology (that’s cancer) at St Vincent’s Hospital here in Melbourne.  This week they’ve announced a significant breakthrough in the treatment of Lung Cancer. They’ve analysed 1000 tumors over the last 8 years and found a common gene in about 1/5 of the cases. It is expected that the gene will be able to be targetted by tablets, rather than aggressive chemotherapy. This find could improve the survival rates for thousands of people around the world. See the article here: http://www.theage.com.au/victoria/doctors-make-cancer-advance-20101216-18zll.html

See the video here: http://media.theage.com.au/national/selections/breakthrough-in-lung-cancer-treatment-2099455.html

If you would like a chance to donate to a really good cause, where the money will definitely make it into the hands of productive researchers that actually make a difference, you can write a cheque out to St Vincent’s Foundation with a request to be donated to the Clinical Lung Cancer Genome Project, or go to http://www.stvfoundation.com.au/, click on the Donate Here button, and put a comment in that you request the donation to end up in the hands of the Clinical Lung Cancer Genome Project.


Camp Australia has gone live!

December 16, 2010

The team at Hazaa have been working hard to build a new web site for Camp Australia, a major provider of childcare services in Australia. Today it went live for the first time, and we’re particularly proud of it. Take a look at the site yourself here: Camp Australia

Hazaa are a Microsoft Provider that consists predominantly of developers that have come together from various Microsoft Gold Partners. We’re experts in our respective fields. There are Silverlight, ASP.Net, Cloud Computing, Sharepoint and SQL Server experts, and they are a great bunch of people that I have the privilege of working with.

Camp Australia Home Page

SQL Server 2008 – Fourteen tips for optimising query performance for applications by Tony Wright

June 23, 2010

1. Aim for the ideal of only one single sql call per page request.

Firstly, I should explain what I mean by page request. When you open a page, the page may need to be populated with content from the database. That call to the database to populate that page is a page request. If that page is a data entry page and required data to be written back to the database, then clicking on the Update button will also perform a page request. So it’s impractical to say one single database request per page, because there are likely to be many, especially on a more sophisticated page.

Now, if you make multiple requests to the database, that tends to require each request to be authenticated, executed, then the results returned. I have seen applications where there are literally hundreds of database calls just to update a single grid. I have seen where every row of the grid forces a call to the database to retrieve the individual row, that row gets updates, then a call is made back to the database to commit that data. That’s a lot of network traffic to achieve what could ultimately be done in a single batch!

If you must touch the database, prefer to use a stored procedure call that takes all the parameters you need, including user information, and returns all the data required to populate the page. This has become less popular in recent times because applications are becoming more modular (web parts) and technologies such as LINQ to SQL don’t easily return multiple resultsets in a single call. Modern databases such as SQL Server 2008 do batch inserts, updates and deletes, but they can’t really do a good job of batching if you are selecting individual items, modifying the data then writing it back to the database row by row.

So, does every one of my applications have a maximum of one database request per page request? Of course not! It’s an ideal. I often don’t have the time to go making every page perform optimally from the start, so my pages usually average around 2 or 3 requests per page request. Practically speaking, I prefer to use LINQ to SQL to make stored procedure calls and only return single resultsets. PLINQ, which will give us the ability to make multiple database requests in parallel, will improve this somewhat. Unfortunately, it is unlikely to fix the scenario where each row of a grid is retrieved, updated and written back individually. It might speed this scenario up overall, however it is still far less efficient than a single call!

There are also scenarios where page content won’t change, or it changes infrequently. These days, you could potentially load an entire table into cache and perform LINQ queries to retrieve the data from there. It is far harder to debug applications where data is stored in cache, but it can be done. By doing this, you can get the total number of queries per page down significantly below one request per page on average. The work is then performed within your web server, which is a cheaper resource.  By doing this, you an reduce the impact on the database, and you’ll potentially be able to scale your site up as sql server itself will only need to service the more important transactions. It is far easier and more cost efficient to scale up by adding extra web servers than it is to have to scale up your database and be able to handle a much higher volume of traffic.

2. Turn on SQL Profiler

This one’s easy, because most good developers already do it. If you turn on the profiler and execute your page, you should be able to see all the traffic caused by a single request. This is important, because you will be able to see a number of things. Firstly, you can look for candidates for queries that can be combined. You may also see situations that shouldn’t need to be done each time you request the page. You may also see situations where the exact same query is executed multiple times.

This case is a special one, because within an ASP.Net page, the event model can sometimes cause double execution of queries. This is usually done to recreate an old version of the page which can then be used to process post backs, and then a second query is executed as a result of the post back to change the state of the page. This can be pretty nasty depending on what you are doing. For example, if you were changing pages in the grid, you wouldn’t want it to query the old page just to process the request for the next page. Page.IsPostback is intended to prevent this kind of scenario, but it doesn’t always work out like this, especially if you have controls that were dynamically added to the page. In this scenario, there would be twice as much work done to retrieve a single page, as the database would potentially need to sort the data twice, run the rownumber routine against every candidate row, an so on. Of course, it has some smarts to reduce the amount of processing, but it still takes more time, and it still retrieves twice the data, which needs to be passed across the network, and rendered to the page possibly twice.

The profiler can also help you determine where to look in your code for problems. On numerous occasions I have identified code that I wasn’t expecting to be executed just by seeing an unexpected query via SQL Profiler.

And finally, the profiler can help you determine which query in a batch is taking the longest.

3. Preprocess expensive queries

One of my customers requires a hierarchy consisting of every company corporate structure tree in Australia. Some of those companies have holdings in around 40 to 50 thousand companies. If these were created on the fly, and a few of their customers requested these corporate structures at the same time, the impact on the server could be significant. What I did here is to stage the data in fixed temporary tables and then populate a table with the hierarchies already calculated and in the correct order. The time taken to retrieve the corporate structure, then, is the same as simply retrieving a user interface page of rows. This processing occured overnight, as it is acceptable for the data to be out of date by one day.

4. Review query plans to ensure optimally performing sql statements.

People have different ways of analysing and improving queries that they need to execute. One I like to analyse is the query plan. Now, the hardcore people look at the text output of the query plan, however I prefer the graphical view. From SQL Server Management Studio, select Query from the menu, then choose Show Actual Execution Plan. The next time you execute the query or stored procedure, it gives a graphical representation of the query execution in one of the tabs adjacent to the results and messages tab. The rule of thumb is to look at the relative expense of each subsection of the query and see if you can improve the performance of the more expensive parts. You work from top to bottom and right to left and aim to replace to icons (which represent underying query choices) with more efficent ones.

5. Identify table scans or index scans

Make sure your tables are using indexes. Table scans and index scans are candidates for adding or improving indexes you have. On very small tables, using a scan often doesn’t really matter. Also, when using an “exists” test, the query processing engine may decide that it’s more efficient just to perform a table scan of a table rather than use an index. This is perfectly acceptable. However, on larger tables with more rows, indexes are necessary and more efficient devices. 

Each table has a maximum of one clustered index, and the rest are non-clustered. With the clustered index, the physical data within the table is actually organised in the order of the index. So it is the most efficient if it is used by the query optimiser. With the non-clustered indexes, a separate structure is kept within the database that contains the data ordered in the order of that index. So if you have a lot of these indexes, a significant amount of extra processing may be required to keep these indexes up to date. However, these indexes are usually smaller than processing the original table, and so can significantly improve the performance of a query. So if data entry speed and disk space is not a big issue, lots of indexes is fine and a good choice.

6. Identify key lookups

Within the query plan, identify Key Lookups (these used to be called Bookmarks). If there are any Key Lookups, they can be removed from the query plan by adding columns as include columns to the non-clustered index on the right of the Key Lookup. Include columns are fantastic, because they ultimately mean that when the candidate rows are found in the index, there will be no requirement to go back to the original table and retrieve the included columns because they are already in the index. So in this case, the speed of the query approaches that of a clustered index. Don’t worry, the data is kept the same between the include column and the original table, so you lose nothing here. The only downside is that included columns are added to the index and so index size increases, and data entry may slow down slightly because there are more references to maintain.

7. Check the order of columns in your where clause

Ensure the order of columns in your “where” clause is the same as in the order within your index, otherwise it may not choose your index. I’ve seen plenty of cases where scans are performed instead of seeks simply because the order of the columns in the where clause are not optimal.

8. Ensure the where clause is ordered most restrictive to least restrictive.

This will make sure that the most efficient path is taken when matching data between indexes in your query. By restrictive, I mean that the data is more uniquely selectable. So a column with different data in every row is more restrictive than a column with much of the same data in every row. Also consider the size of the table, so that a table with less data in it may be selected first in a join over a table with more data in it. This can be a bit of a balancing act.

9. Remove CTEs

CTEs are kind of like temporary tables. But they are disasterously inefficient. I had one query taking 20 seconds to execute, which was unacceptable. It had a significant number of CTEs so I changed them all the Temp Tables and executed the query. The query sped up and took around 1 second to execute – which was fast enough for the scenario and so I could move on. There is one valid purpose I can think of for using CTEs and that’s for producing hierarchies. But in general, they can be replaced by temporary tables anyway. That said, however…

10. Remove Temp Tables

The creation of temp tables adds to the overhead required to run your overall query. In some scenarios, I have removed temporary tables and replaced them with fixed tables and had significant performance improvement.

If the temp table is created to enable to merging of data from similar data sources, then prefer a union instead. Unions, in general, are far far cheaper than temp tables. 

#Temp tables are created in tempdb. @Temp tables are created in memory first, but the moment there is memory pressure, they spill over into tempdb as well. tempdb requires disk writes and reads, and so will be slower than accessing the original table data.

11. Remove Cursors

These are one of the most expensive statements you can use. There are special cases where they should be used, but it’s better to train yourself to use standard set based statements than cursors. If you want to understand when it’s reasonable to use a cursor, take a look at Greg Low’s article. But in general, avoid them like you would hte plague. http://msmvps.com/blogs/greglow/archive/2007/02/04/improving-asp-net-session-state-database-performance-by-reducing-blocking.aspx 

12. Reduce the number of joins in your queries

If you significantly reduce the number of joins in your query, you will have a vast improvement in speed of the query. There are a couple of ways to do this. You could stage the data in a denormalised table, or in Enterprise edition you can create a view and put an index on that view. Again, it depends on how immediate your requirement is for having the latest data. It is often acceptable for reporting to build these tables overnight due to the fact that a single day often has no impact on the benefits associated with a particular report.

13. Remove all declared variables

If everything is compilable within your stored proc, then there will be no need for the query engine to perform extra work to determine how the query will look after the declared variables are taken into consideration. So it will perform most optimally. So how do you remove declared variables? Well, for starters, you can pass them as parameters of your stored procedure call. If you find you do need to have declared variables, you can create a second stored procedure that does the main work and pass the declared variables into that. So the passed variables become parameters in the second query.

14. Give this fact sheet to your database developers

http://www.dotnet4all.com/snippets/factsheet%20SQL%20Server.pdf

It’s a little bit out of date, but it’s the best one I have found so far. It shows the relative expensiveness of different data types, sargability of joins, includes various helpful information, and also describes many of the icons found in the query plans. Someone should tell that guy to update it.


How to switch rows and columns in SQL Server 2008 by pivotting and unpivotting

June 19, 2010

I came across a scenario in which I needed to switch rows and columns in the results of a query. That’s basically because I need the data to be fed into a chart in one form, and grid in another form. 

So here’s how the results looked before I ran my queries. 

Before Pivot

Before query

And here’s the result after I switched the rows and columns using my queries. 

After queries

After executing sql queries to switch rows and columns

Now, I haven’t really replaced the column headers with the DateBucket field, but that’s ok in my case, because I already know what the column headers should be based on the parameters I passed to my query. 

Firstly, the test data I created is as follows: 

 

CREATE TABLE #test
(
 LineNum bigint,
 DateBucket varchar(8),
 TotalOrders float,
 TotalItems float,
 TotalItem1 float,
 TotalItem2 float,
 TotalItem3 float,
 AverageCost float,
 AveragePrice float,
 AverageProfit float
);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (1, 'Jun 2010', 100.0, 220.0, 53.0, 72.0, 99.0, 1204.0, 2468.0, 1122.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (2, 'Jul 2010', 120.0, 300.0, 54.0, 73.0, 98.0, 1208.0, 2470.0, 1123.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (3, 'Aug 2010', 130.0, 280.0, 55.0, 74.0, 97.0, 1212.0, 2472.0, 1124.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (4, 'Sep 2010', 140.0, 190.0, 56.0, 75.0, 96.0, 1216.0, 2474.0, 1125.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (5, 'Oct 2010', 150.0, 250.0, 57.0, 76.0, 95.0, 1220.0, 2476.0, 1126.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (6, 'Nov 2010', 160.0, 260.0, 58.0, 77.0, 94.0, 1224.0, 2478.0, 1127.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (7, 'Dec 2010', 170.0, 200.0, 59.0, 78.0, 93.0, 1228.0, 2480.0, 1128.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (8, 'Jan 2011', 180.0, 210.0, 60.0, 79.0, 92.0, 1232.0, 2482.0, 1129.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (9, 'Feb 2011', 190.0, 230.0, 61.0, 80.0, 91.0, 1236.0, 2484.0, 1130.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (10, 'Mar 2011', 200.0, 270.0, 62.0, 81.0, 90.0, 1240.0, 2486.0, 1131.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (11, 'Apr 2011', 210.0, 240.0, 63.0, 82.0, 89.0, 1244.0, 2488.0, 1132.0);
INSERT INTO #test(LineNum, DateBucket, TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
VALUES (12, 'May 2011', 220.0, 290.0, 64.0, 83.0, 88.0, 1248.0, 2490.0, 1133.0);

Now the first thing I need to do is to Unpivot the data. This will give me a list of DateBucket, ColumnHeaders and Values. Not exactly what I want, but it’ll do for now.  I will store this in a temp table so that I can retrieve it for re-pivotting later.  

 
CREATE TABLE #TestUnpivot
(
DateBucket nvarchar(8),
ItemDescription nvarchar(100),
ItemValue float
);

INSERT INTO #TestUnpivot(DateBucket, ItemDescription, ItemValue)
SELECT DateBucket,  ItemDescription,  ItemValue
FROM
(SELECT DateBucket, 
TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit
FROM #Test) p
UNPIVOT
(ItemValue FOR ItemDescription IN
(TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit)
) AS unpvt
<pre>

This outputs the following results (There’s more data, but I clipped it for brevity): 

After the unpivot

After the unpivot

Now I want the ItemDescription field to be data in the rows and I want the months to be columns. So I need to re-pivot the data. To re-pivot the data, I execute the following query: 

 
SELECT ItemDescription as ItemValue,
[1] as Month1, [2] as Month2, [3] as Month3, [4] as Month4, [5] as Month5, [6] as Month6, [7] as Month7, [8] as Month8, [9] as Month9, [10] as Month10, [11] as Month11, [12] as Month12
FROM
(SELECT v.ItemDescription, v.ItemValue, b.id
FROM #TestUnpivot v
join (
select distinct top 100 PERCENT linenum as id, DateBucket
from #test
order by linenum) b on v.DateBucket=b.datebucket
) AS SourceTable
PIVOT
(
MIN(ItemValue)
FOR id IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])
) AS PivotTable

Note that if you really really wanted to rename the columns, you could do that by using a dynamic query (sp_executesql) but avoid that if you can.Also note that I selected MIN for the aggregate function. This is intentional, because I feel that it requires the least amount of processing. Use of average would require the processing to keep a tally of all the results for later averaging, which I believe would add to the query time, maybe not much, but it would be more than MIN I think.

And finally, the grouping is done via the nested query:

 

SELECT DISTINCT TOP 100 PERCENT linenum as id, DateBucket
from #test
ORDER BY linenum

This groups the values by the month names found in the DateBucket column.So after executing this query, I get the result I wanted:

After queries

After executing sql queries to switch rows and columns

Merging all this into a single query, and I get the following:


SELECT ItemDescription as ItemValue,
 [1] as Month1, [2] as Month2, [3] as Month3, [4] as Month4, [5] as Month5, [6] as Month6, [7] as  Month7, [8] as Month8, [9] as Month9, [10] as Month10, [11] as Month11, [12] as Month12
FROM
(SELECT v.ItemDescription, v.ItemValue, b.id
FROM
 (
 SELECT DateBucket,  ItemDescription,  ItemValue 
 FROM (SELECT DateBucket,  
  TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit 
  FROM #Test) p 
 UNPIVOT 
 (ItemValue FOR ItemDescription IN
  (TotalOrders, TotalItems, TotalItem1, TotalItem2, TotalItem3, AverageCost, AveragePrice, AverageProfit) 
 ) AS unpvt 
) as v
JOIN (
 select distinct top 100 percent linenum as id, DateBucket
 from #Test
 order by linenum) b on v.DateBucket=b.datebucket
) AS SourceTable
PIVOT
(
 MIN(ItemValue)
 FOR id IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])
) AS PivotTable


Follow

Get every new post delivered to your Inbox.