Friday, September 03, 2010

How to Pivot a survey table in SQL Server and return percentages in the columns

I came across a problem earlier today that had me stuck for a whole lunchbreak.

We send out yearly surveys, and our growing number of members choose weather to complete the survey. Now that we have historic data, I wanted to compare all the data of the years in one simple database query. I immediately thought of Sql Server 2005's Pivot command.

First - I will set up a temporary table that simulates a flattened view of a couple of fake questions in the survey. The actual tables are very denormalised, but this represents just the data I was trying to analyse.


DECLARE @Survey TABLE (
  surveyname VARCHAR(50),
  username   VARCHAR(50),
  question   VARCHAR(255),
  answer     VARCHAR(50))

INSERT INTO @Survey
            (surveyname,
             username,
             question,
             answer)
SELECT '2008',
       'Fred',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2008',
       'Fred',
       'Where do you live?',
       'England'
UNION ALL
SELECT '2008',
       'Susan',
       'What sex are you?',
       'Female'
UNION ALL
SELECT '2008',
       'Susan',
       'Where do you live?',
       'England'
UNION ALL
SELECT '2009',
       'Fred',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2009',
       'Fred',
       'Where do you live?',
       'England'
UNION ALL
SELECT '2009',
       'Susan',
       'What sex are you?',
       'Female'
UNION ALL
SELECT '2009',
       'Susan',
       'Where do you live?',
       'Wales'
UNION ALL
SELECT '2009',
       'Joe',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2009',
       'Joe',
       'Where do you live?',
       'Scotland'
UNION ALL
SELECT '2010',
       'Fred',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2010',
       'Fred',
       'Where do you live?',
       'England'
UNION ALL
SELECT '2010',
       'Susan',
       'What sex are you?',
       'Female'
UNION ALL
SELECT '2010',
       'Susan',
       'Where do you live?',
       'Wales'
UNION ALL
SELECT '2010',
       'Joe',
       'What sex are you?',
       'Male'
UNION ALL
SELECT '2010',
       'Joe',
       'Where do you live?',
       'Scotland'
UNION ALL
SELECT '2010',
       'Hillary',
       'What sex are you?',
       'Female'
UNION ALL
SELECT '2010',
       'Hillary',
       'Where do you live?',
       'Scotland' 


Selecting from that table shows all the data


SELECT *

FROM   @survey 


Now we want to compare how the total number of answers for each survey have changed over the years.


SELECT *
FROM   (SELECT username,
               surveyname,
               question,
               answer
        FROM   @survey mainsurvey) AS answers PIVOT (COUNT(username) FOR
       surveyname IN
       ([2008], [2009], [2010])) AS pivoted
ORDER  BY question 


Output is shown below.







QuestionAnswer200820092010
What sex are you?Female112
What sex are you?Male122
Where do you live?England211
Where do you live?Scotland012
Where do you live?Wales011


This form of data really hard to read. It's hard to see from the table how many people have answered each question. It would be far better to use percentages for each row.

I thought this would be easy - just put the percentage calculation in the pivot clause. I attempted to do this and just received syntax errors. After reading the spec I realised the only valid statement before the "for" tag is an aggregate function. On by adding DLLs to sql server can you add an aggregate, and I was losing confidence in the ability to display percentages. I took this opportunity to go on lunch!

While eating it came to me. You can change the value going into the pivot to a fraction of a percentage for each survey question. That way you can sum the values and create the final percentage values.

Full code sits below...


SELECT *
FROM   (SELECT surveyname,
               weight = 100.0 / (SELECT COUNT(DISTINCT username)
                                 FROM   @survey total
                                 WHERE  total.surveyname =
                                mainsurvey.surveyname),
               question,
               answer
        FROM   @survey mainsurvey) AS answers PIVOT (SUM(weight) FOR surveyname
       IN (
       [2008], [2009], [2010])) AS pivoted
ORDER  BY question 

the result has nice percentages in it!






QuestionAnswer200820092010
What sex are you?Female50.00000000000033.33333333333350.000000000000
What sex are you?Male50.00000000000066.66666666666650.000000000000
Where do you live?England100.00000000000033.33333333333325.000000000000
Where do you live?ScotlandNULL33.33333333333350.000000000000
Where do you live?WalesNULL33.33333333333325.000000000000