We send out yearly surveys, and our growing number of members choose weather to complete the survey. Now that we have historic data, I wanted to compare all the data of the years in one simple database query. I immediately thought of Sql Server 2005's Pivot command.
First - I will set up a temporary table that simulates a flattened view of a couple of fake questions in the survey. The actual tables are very denormalised, but this represents just the data I was trying to analyse.
DECLARE @Survey TABLE (
surveyname VARCHAR(50),
username VARCHAR(50),
question VARCHAR(255),
answer VARCHAR(50))
INSERT INTO @Survey
(surveyname,
username,
question,
answer)
SELECT '2008',
'Fred',
'What sex are you?',
'Male'
UNION ALL
SELECT '2008',
'Fred',
'Where do you live?',
'England'
UNION ALL
SELECT '2008',
'Susan',
'What sex are you?',
'Female'
UNION ALL
SELECT '2008',
'Susan',
'Where do you live?',
'England'
UNION ALL
SELECT '2009',
'Fred',
'What sex are you?',
'Male'
UNION ALL
SELECT '2009',
'Fred',
'Where do you live?',
'England'
UNION ALL
SELECT '2009',
'Susan',
'What sex are you?',
'Female'
UNION ALL
SELECT '2009',
'Susan',
'Where do you live?',
'Wales'
UNION ALL
SELECT '2009',
'Joe',
'What sex are you?',
'Male'
UNION ALL
SELECT '2009',
'Joe',
'Where do you live?',
'Scotland'
UNION ALL
SELECT '2010',
'Fred',
'What sex are you?',
'Male'
UNION ALL
SELECT '2010',
'Fred',
'Where do you live?',
'England'
UNION ALL
SELECT '2010',
'Susan',
'What sex are you?',
'Female'
UNION ALL
SELECT '2010',
'Susan',
'Where do you live?',
'Wales'
UNION ALL
SELECT '2010',
'Joe',
'What sex are you?',
'Male'
UNION ALL
SELECT '2010',
'Joe',
'Where do you live?',
'Scotland'
UNION ALL
SELECT '2010',
'Hillary',
'What sex are you?',
'Female'
UNION ALL
SELECT '2010',
'Hillary',
'Where do you live?',
'Scotland'
Selecting from that table shows all the data
SELECT *
FROM @survey
Now we want to compare how the total number of answers for each survey have changed over the years.
SELECT *
FROM (SELECT username,
surveyname,
question,
answer
FROM @survey mainsurvey) AS answers PIVOT (COUNT(username) FOR
surveyname IN
([2008], [2009], [2010])) AS pivoted
ORDER BY question
Output is shown below.
Question | Answer | 2008 | 2009 | 2010 |
What sex are you? | Female | 1 | 1 | 2 |
What sex are you? | Male | 1 | 2 | 2 |
Where do you live? | England | 2 | 1 | 1 |
Where do you live? | Scotland | 0 | 1 | 2 |
Where do you live? | Wales | 0 | 1 | 1 |
This form of data really hard to read. It's hard to see from the table how many people have answered each question. It would be far better to use percentages for each row.
I thought this would be easy - just put the percentage calculation in the pivot clause. I attempted to do this and just received syntax errors. After reading the spec I realised the only valid statement before the "for" tag is an aggregate function. On by adding DLLs to sql server can you add an aggregate, and I was losing confidence in the ability to display percentages. I took this opportunity to go on lunch!
While eating it came to me. You can change the value going into the pivot to a fraction of a percentage for each survey question. That way you can sum the values and create the final percentage values.
Full code sits below...
SELECT *
FROM (SELECT surveyname,
weight = 100.0 / (SELECT COUNT(DISTINCT username)
FROM @survey total
WHERE total.surveyname =
mainsurvey.surveyname),
question,
answer
FROM @survey mainsurvey) AS answers PIVOT (SUM(weight) FOR surveyname
IN (
[2008], [2009], [2010])) AS pivoted
ORDER BY question
the result has nice percentages in it!
Question | Answer | 2008 | 2009 | 2010 |
What sex are you? | Female | 50.000000000000 | 33.333333333333 | 50.000000000000 |
What sex are you? | Male | 50.000000000000 | 66.666666666666 | 50.000000000000 |
Where do you live? | England | 100.000000000000 | 33.333333333333 | 25.000000000000 |
Where do you live? | Scotland | NULL | 33.333333333333 | 50.000000000000 |
Where do you live? | Wales | NULL | 33.333333333333 | 25.000000000000 |