Wednesday, March 28, 2012

Pivot and UnPivot - SQL


Pivot helps to apply transpose on a table (making row values under specific column as column headers in the final output table and grouping the other column specific row values).
UnPivot helps to apply transpose such that the column names will be values under specific column for each row in the final output table after running the sql).
Looking into the behaviors of “Pivot” and “Un Pivot” as stated above, looks like
UnPivot(Pivot(Table))=Table?

But is not true really always, when a pivot is applied on a table, the query expects an aggregate function like (max, min, sum, etc…) on the column which has to be displayed in the final table under the specific column (which is row value before pivot). So when there is more than one record in the actual table which has same row value in the column which we are about to make as header in the pivot table will apply aggregate on the other column. This applied aggregate could be sum, cannot be brought back while applying unpivot.

Below example helps to understand better.

Pivot

I have created a new table “Marks” which holds the student name (sname), subject id (sid), marks (marks).
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[Marks](
      [sid] [int] NOT NULL,
      [marks] [float] NOT NULL,
      [sname] [varchar](250) NOT NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

I have inserted few dummy records like


Sid   Marks SName
1     50    Name1
2     50    Name1
3     80    Name1
1     90    Name2
2     95    Name2
3     99    Name2

I am looking for an output where it should be like,
SName                 Maths   Social       English                TotalScore
Name1                 50          50              80                        180
Name2                 90          95              99                         284
Where Sid=1 (Maths), Sid=2(Social), Sid=3 (English).
So I can understand that the in each row in the actual table, I need to bring out the “Sid” values and
Attach as header in the output expected table also applying alias for each sid.
Now, we need to check the marks” and “SName”, in this case I can see “SName” is same for a set of records which will be grouped by in pivot. The other one is the marks”, so for a given Sid and Sname I can see there will be only on record and hence only one marks value can be fetched.
In this case it is not required to worry about to apply the aggregate function (min or max or sum), since there will be only one record.
In case for a given Sid and Sname if  there is more than one record and then it matters the right aggregate function to apply.

Below is my sql to get the expected table result.

select Sname, [1] as Maths,[2] as Social,[3] as English,
[1]+[2]+[3] as TotalScore
from
(select * from dbo.Marks) tab1
pivot(max(marks) for sid in ([1],[2],[3])) as pvt
From query, iterating  through each row in the main table, query considers only those records whose “Sid”is 1 or 2 or 3
After getting the final list of records, fetching the maximum of marks for each row (Sname and Sid).
The query groups the list of columns other than (Sid and Marks) which is SName.

UnPivot

Lets us to try to unpivot the same table which we pivoted above.
I am creating a temporary table inserting records from the above pivot query skipping the total score.

select Sname, [1] as Maths,[2] as Social,[3] as English
 into dbo.Marks_PVT
from
(select * from dbo.Marks) tab1
pivot(max(marks) for sid in ([1],[2],[3])) as pvt

The above creates a new table named as  Marks_PVT

SName                 Maths   Social       English               
Name1                 50          50              80                      
Name2                 90          95              99                      
                Now, while trying to unpivot obviously my thought will be generate a similar kind of original table which we used to pivot in above example.
Which is like
Subject           Marks    SName
Maths              50           Name1
Social              50           Name1
English             80           Name1
Maths              90           Name2
Social              95           Name2
English             99           Name2

Observing the expected output, I can understand from the Marks_PVT, ineed to make columns “Maths”, “ Social” and “English” as row values for each under some column called “Subject”.
Also moving the corresponding values for each column of “Maths”, “ Social” and “English” under a single column called ‘marks”.

To do this, we need to UnPivot saying as
“marks for Subject in
([Maths],[Social],[English]”
Below query reflects the desired output
select Subject,Marks,Sname from
(select * from dbo.Marks_PVT) tab1
unpivot(marks for Subject in
([Maths],[Social],[English])) tab2

Note : Here we can observe the behaviour is like
UnPivot(Pivot(Table))=Table
This fails when, during pivot for a given Sid and Sname if  there is more than one record and then it pivot helps to display by aggregating them as single row. In this case more than 1 row in the original table is pivoted as a single row and this can be brought back as multiple during unpivot, since aggregation has happened.



No comments: