digiguru's blog

Friday, September 03, 2010

How to Pivot a survey table in SQL Server and return percentages in the columns

I came across a problem earlier today that had me stuck for a whole lunchbreak.

We send out yearly surveys, and our growing number of members choose weather to complete the survey. Now that we have historic data, I wanted to compare all the data of the years in one simple database query. I immediately thought of Sql Server 2005's Pivot command.

First - I will set up a temporary table that simulates a flattened view of a couple of fake questions in the survey. The actual tables are very denormalised, but this represents just the data I was trying to analyse.

DECLARE @Survey TABLE (
  surveyname VARCHAR(50),
  username   VARCHAR(50),
  question   VARCHAR(255),
  answer     VARCHAR(50))

INSERT INTO @Survey
            (surveyname,
             username,
             question,
             answer)
SELECT '2008',
       'Fred',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2008',
       'Fred',
       'Where do you live?',
       'England'
UNION ALL
SELECT '2008',
       'Susan',
       'What sex are you?',
       'Female'
UNION ALL
SELECT '2008',
       'Susan',
       'Where do you live?',
       'England'
UNION ALL
SELECT '2009',
       'Fred',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2009',
       'Fred',
       'Where do you live?',
       'England'
UNION ALL
SELECT '2009',
       'Susan',
       'What sex are you?',
       'Female'
UNION ALL
SELECT '2009',
       'Susan',
       'Where do you live?',
       'Wales'
UNION ALL
SELECT '2009',
       'Joe',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2009',
       'Joe',
       'Where do you live?',
       'Scotland'
UNION ALL
SELECT '2010',
       'Fred',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2010',
       'Fred',
       'Where do you live?',
       'England'
UNION ALL
SELECT '2010',
       'Susan',
       'What sex are you?',
       'Female'
UNION ALL
SELECT '2010',
       'Susan',
       'Where do you live?',
       'Wales'
UNION ALL
SELECT '2010',
       'Joe',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2010',
       'Joe',
       'Where do you live?',
       'Scotland'
UNION ALL
SELECT '2010',
       'Hillary',
       'What sex are you?',
       'Female'
UNION ALL
SELECT '2010',
       'Hillary',
       'Where do you live?',
       'Scotland'

Selecting from that table shows all the data

SELECT *

FROM @survey

Now we want to compare how the total number of answers for each survey have changed over the years.

SELECT *
FROM   (SELECT username,
               surveyname,
               question,
               answer
        FROM   @survey mainsurvey) AS answers PIVOT (COUNT(username) FOR
       surveyname IN
       ([2008], [2009], [2010])) AS pivoted
ORDER  BY question

Output is shown below.

Question	Answer	2008	2009	2010
What sex are you?	Female	1	1	2
What sex are you?	Male	1	2	2
Where do you live?	England	2	1	1
Where do you live?	Scotland	0	1	2
Where do you live?	Wales	0	1	1

This form of data really hard to read. It's hard to see from the table how many people have answered each question. It would be far better to use percentages for each row.

I thought this would be easy - just put the percentage calculation in the pivot clause. I attempted to do this and just received syntax errors. After reading the spec I realised the only valid statement before the "for" tag is an aggregate function. On by adding DLLs to sql server can you add an aggregate, and I was losing confidence in the ability to display percentages. I took this opportunity to go on lunch!

While eating it came to me. You can change the value going into the pivot to a fraction of a percentage for each survey question. That way you can sum the values and create the final percentage values.

Full code sits below...

SELECT *
FROM   (SELECT surveyname,
               weight = 100.0 / (SELECT COUNT(DISTINCT username)
                                 FROM   @survey total
                                 WHERE  total.surveyname =
                                mainsurvey.surveyname),
               question,
               answer
        FROM   @survey mainsurvey) AS answers PIVOT (SUM(weight) FOR surveyname
       IN (
       [2008], [2009], [2010])) AS pivoted
ORDER  BY question

the result has nice percentages in it!

Question	Answer	2008	2009	2010
What sex are you?	Female	50.000000000000	33.333333333333	50.000000000000
What sex are you?	Male	50.000000000000	66.666666666666	50.000000000000
Where do you live?	England	100.000000000000	33.333333333333	25.000000000000
Where do you live?	Scotland	NULL	33.333333333333	50.000000000000
Where do you live?	Wales	NULL	33.333333333333	25.000000000000

Thursday, August 05, 2010

Managing multiple CSS and javascript files, and reduce your http requests

We are always looking at optimizing performance in our website.

we have loads of features through the site, but we like to program the work modularly. As a result a single page can have up to 50 different CSS and javascript files. Each of them are very small, but serve a specific function.

An example might be as follows...



<link href="/Css/reset.css" rel="stylesheet" type="text/css" /> <!-- We want to style nicely -->
<link href="/Css/default.css" rel="stylesheet" type="text/css" /> <!-- Our sitewide font sizes and colours -->
<link href="/Css/DropdownLists.css" rel="stylesheet" type="text/css" /> <!-- We have dropdown list navigation done in CSS -->
<link href="/Css/footer.css" rel="stylesheet" type="text/css" /> <!-- The footer is quite a large chunk that we style nicely -->
<link href="/Css/signinRibbon.css" rel="stylesheet" type="text/css" /> <!-- This is above the header -->
<link href="/Css/newswide.css" rel="stylesheet" type="text/css" /> <!-- This is a page that uses our News panel -->
<link href="/Css/buttons.css" rel="stylesheet" type="text/css" /> <!-- We have buttons on the page, so include our nice button styles -->

This is 7 stylesheet requests before we even get started on the actual content in the page!

Rule number one of ySlow! suggests you minimize the number of HTTP requests. They show that most users visits to your site are from users with an empty cache.

We decided we needed a way to minimize the number of requests per page.

I don't have time to waste on creating a massive solution and a quick browse on the internet yeilded nothing exicting that my Boss wanted to pay for, so I quickly wrote this:

/CSS/MasterHomepage.aspx


<%@ Page language="VB" ContentType="text/css" %>
<!-- #Include virtual="~/Css/reset.css" -->
<!-- #Include virtual="~/css/default.css" -->
<!-- #Include virtual="~/Css/buttons.css" -->
<!-- #Include virtual="~/Css/DropdownLists.css" -->
<!-- #Include virtual="~/css/footer.css" -->
<!-- #Include virtual="~/css/signinRibbon.css" -->
<!-- #Include virtual="~/Css/newswide.css" -->

All I have to do is include a single file in the page, and I get all the stylesheets.


<link href="/Css/MasterHomepage.aspx" rel="stylesheet" type="text/css" />

Suddenly i've reduced the 5 files to 1.

I can use the same technique for javascript


<script type="text/javascript" src="/js/googlecse.js"></script> 
<script type="text/javascript" src="/js/pngfix.js"></script> 
<script type="text/javascript" src="/js/jquery.js"></script> 
<script type="text/javascript" src="/js/DropdownList.js"></script>
<script type="text/javascript" src="/js/daterangepicker.js"></script>

becomes


<%@ Page language="VB" ContentType="text/javascript" %>
<!-- #Include virtual="~/js/googlecse.js" -->
<!-- #Include virtual="~/js/pngfix.js" -->
<!-- #Include virtual="~/js/jquery.js" -->
<!-- #Include virtual="~/js/DropdownList.js" -->
<!-- #Include virtual="~/js/daterangepicker.js" -->

That's it for now, but check back later to see how we sped up these pages further.

Wednesday, July 28, 2010

Why you shouldn't mix your View logic with your Controller logic.

I was called up to say the admin pages for the supplier gallery had stopped working, all new images wouldn't upload. After searching through the javascript (which usually causes issues here) I discovered this juicy morsel in the code behind....


If PostID = 0 And btnUpload.Text.Trim.Equals("Upload Photo") Then
  AddPostToMediaGallery()
  Response.Redirect("MediaGallery.aspx?msg=add")
Else
  UpdateMediaGalleryPost()
  Response.Redirect("MediaGallery.aspx?msg=update")
End If

That reads "If the post has no ID, and the button to upload the photo was named 'Upload Photo' then Add the photo to the media gallery, otherwise update the existing media gallery photo."

Firstly, that's too much information to read in a few lines of code, so let's refactor it a little...



Public Enum ImageActionType
  AddImage
  UpdateImage
End Enum

Public Readonly Property ActionType As ImageActionType
  Get
    If PostID = 0 And btnUpload.Text.Trim.Equals("Upload Photo") Then
      Return AddImage
    Else
      Return UpdateImage
    End If
  End Get
End Property

...


If ActionType = ImageActionType.AddImage Then
  AddPostToMediaGallery()
  Response.Redirect("MediaGallery.aspx?msg=add")
Else
  UpdateMediaGalleryPost()
  Response.Redirect("MediaGallery.aspx?msg=update")
End If

There, we've moved the nasty bit of code into a much easier to read property, but the problem remains. So next I check both the properties that are in the page. PostID is 0, that's as expected... this is a new Post, so we don't have an ID. I'm confused why "0" is representing the absence of an integer. What if there was a valid post with the ID of 0? We should make PostID nullable.


Private _postID As Nullable(Of Integer)
Public Property PostID() As Nullable(Of Integer)
  Get
    Return _postID
  End Get
  Set(ByVal value As Nullable(Of Integer))
    _postID = value
  End Set
End Property

Great, things are looking better already, but we haven't fixed the cause of the issue. Let me check the value of the buttons' text....

Upload your Photo

Here it is! Some designer has come in and changed the text of the button to "Upload your photo". That should be allowed. Why would anyone expect that changing the text of a button changes the behaviour too? I can see in the code that if the user is performing an update, it changes the label to another wording. This is the view, but the controller relies on the contents of the view to function. This is really bad. I checked all the code stubs and this is simply over-zealous programming. On no account is PostID ever set to anything but 0 for a new photo. Even if we did rely on a parameter in the page, we definitely shouldn't be storing it on the view.

So i've gone into the code and removed that condition, now it looks much nicer.


    Public ReadOnly Property ActionType() As ImageActionType
        Get
            If Not PostID.HasValue Then
                Return ImageActionType.AddImage
            Else
                Return ImageActionType.UpdateImage
            End If
        End Get
    End Property

Monday, March 08, 2010

Chrome rendering bug? Check the spec!

I've been experiencing a weird rendering issue in my site. After messing around with the HTML and using Safari's Web Inspector, I found the issue was recreate-able...

<div class="clear"/>
<p>This doen't work</p>
<div class="clear"></div>
<p>This does</p>
<br class="clear" />
<p>This does</p>

Webkit browsers like Safari (and Google Chrome) took it upon themselves to output the following HTML...

<div class="clear">
    <p>This doen't work</p>
    <div class="clear"></div>
    <p>This does</p>
    <br class="clear" />
    <p>This does</p>
</div>

Nasty. Or so I thought. After researching online, I found a great little post. There are only 10 HTML tags that can be empty. Basically webkit is rendering exactly how the XHTML spec says it should be rendered.

As a result, I booted up textPad and did a file search on the following...

<[^area|^base|^br|^col|^hr|^img|^input|link|^meta|^param]+[^>]*/>

Which reads - "Find me all tags that are not one of the legally allowed empty tags"

Quite nice, unless you use a tag based language to develop your site in (dot net, coldfusion)... it'll match all your empty asp.net tags as well.

<asp:textBox id="txtEmail" runat="server" />

We'll let's invert it and say "All empty tags that are valid XHTML tags, but not valid to be empty."

<[a|abbr|acronym|address|applet|b|bdo|big|blockquote|body|button|caption|cite|code|colgroup|dd|del|dfn|div|dl|DOCTYPE|dt|em|fieldset|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|html|i|iframe|ins|kbd|label|legend|li|map|noframes|noscript|object|ol|optgroup|option|p|pre|q|samp|script|select|small|span|strong|style|sub|sup|table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|ul|var][ ]+[^>]*/>

And went through the task of closing all the tags. I guess you should expect rendering issue if you don't conform to web standards.

Friday, August 22, 2008

Show me your ID

One of the joys of working on software that has been outsourced to third parties, is fixing all the third party issues that only get discover after the support contract has expired.

Apart from having massive portions of duplicated code all over a payment system, I had to discover why some payments were going missing, and some were being assigned to the wrong businesses. After days of refactoring the code to remove all the duplication, I found this little nugget in one of the stored procedures...

insert into tbl_invoiceitems (invoice_id, list_id,voucher_id)

values (IDENT_CURRENT('tbl_invoice'), IDENT_CURRENT('tbl_directoryListing'), @vid)

Confused with is strange query, I decided to research this unusual Ident_Current. Basically it allows you to get the last identity insert into a table. That's fine if the stored procedure you are using has done that previous insert, but these stored procedures were called one after the other hoping to god that no other inserts were done between each other. They even returned the new inserted identities, but promptly forgot about them so they could hope that the identity was waiting in the right place in a queue in sql server. It's kind of like going to an airport baggae retrivval system, and grabbing the first case of the same colour. 9 times out of 10, it'll probably be yours and you can carry on your day. Occasionally however - someone runs off with the wrong case and everything goes tits up.

I even saw an attempt at rectifying the problem, where the developer had created the exact same procedure, adding a TBL_DircetoryListing identity property, but risking the invoice number column.

I recitified it by removing the IDENT_CURRENT clauses. Now I have to test the 30 combinations of the code repetition throughout the awful code (with slight changes that i'm almost too scared to fix)

Thursday, August 14, 2008

Optimizing SQL Queries

I was working on some code when I came across a behemoth of an inline SQL query nestled in the code behind of a page.

Select a.list_id, a.company_name, a.company_description, a.url,

( case when (Select upper(County_Name) from tbl_County where County_id=a.County_id) is null then

(Select upper(Country_Name) from tbl_description_Country where Country_id= a.Country_id)

else

(Select upper(County_Name) from tbl_County where County_id= a.County_id) end )

as County_name,

b.review_approval,

( case when (Select county_id from tbl_County where County_id = a.County_id)= 0 then

(Select Country_id from tbl_description_Country where Country_id = a.Country_id)

else

(Select county_id from tbl_County where County_id= a.County_id) end )

as CountryID,

dbo.funcGetRatingSum (a.list_id) as ratingSum,

dbo.funcGetRatingCount (a.list_id) as totalCount,

(Select case when (select top 1 banner_path from tbl_banners where list_id = a.list_id and a.subscription_id = 2) is null then

else

(select top 1 banner_path from tbl_banners where list_id = a.list_id and a.subscription_id = 2) end)

as bannerpath

from tbl_directorylisting a, tbl_preferences as b ,

tbl_companymaster c where a. status = 1 and(a.company_id =

b.company_id And a.status = 1) and c.category_id= @CategoryID

and a.company_id= c.company_id and a.subscription_id in (2,3) and subscription_date >= (dateadd(year,-1,getdate()))

Order by county_name, a.subscription_id asc, a.date_fee_paid

Actually it was within several lines of concatenated text values, and had no formatting whatsoever!. My first intention was try to make it a stored procedure, but doing so I couldn't help but try to optimize it a little. Here's my attempt....

Select

listing.list_id,

listing.company_name,

listing.company_description,

listing.url,

Coalesce(county.County_name, country.Country_Name) As County_name,

prefs.review_approval,

listing.Country_ID,

dbo.funcGetRatingSum (listing.list_id) as ratingSum,

dbo.funcGetRatingCount (listing.list_id) as totalCount,

Coalesce((select top 1 banner_path from tbl_banners where list_id = listing.list_id and listing.subscription_id = 2), '') as bannerpath

FROM

tbl_directorylisting listing

Inner Join tbl_preferences prefs

On listing.company_id = prefs.company_id

Inner Join tbl_companymaster company

On listing.company_id = company.company_id

Left Outer Join tbl_county county

On listing.county_id = county.county_id

Inner Join tbl_country country

On listing.country_id = country.country_id

where listing.status = 1

And company.category_id= @CategoryID

and listing.subscription_id in (2,3)

and listing.subscription_date >= (dateadd(year,-1,getdate()))

Order by county_name, listing.subscription_id asc, listing.date_fee_paid

Firstly you'll notice the second query is far easier to read (hopefully). I dont stand for the old skool method of joining tables, join with a join and it's obvious what is going on, plus it prepares isolation of results for the where clause afterwards.

Secondly there are fewer joins in the Column definition part of the select. I have included one of the original selects, because when optimizing the stored procedure it was siginificantly fast enough to use select top 1 ... than it was to Left Outer Join. I did however Coalesce so that the case statement didn't repeat iteself.

What do you think? Is that better or worse?