As part of my preparation for the 70-761 exam I’ve been trying to get a full understanding of some of the T-SQL I’m less familiar with. One example of this is the GROUPING SETS clause. There are also ROLLUP and CUBE clauses but I believe these are deprecated and GROUPING SETS can do everything ROLLUP and CUBE can do but with much more control.
The best way to explain how GROUPING SETS works is to show some example code. I’ll show how it compares to ROLLUP and CUBE in case you need to update any code to convert it to using GROUPING SETS.
So lets create some test data. I’m just creating a simple table containing a few purchases made by a few customers…
DROP TABLE IF EXISTS dbo.GroupingSetsDemo GO CREATE TABLE dbo.GroupingSetsDemo ( Id INT NOT NULL IDENTITY(1,1), CustomerID INT NOT NULL, PurchaseDate DATETIME2(0) NOT NULL, PurchaseItem VARCHAR(100) NOT NULL, Amount DECIMAL(10,2) NOT NULL, ); GO INSERT INTO dbo.GroupingSetsDemo ( CustomerID, PurchaseDate, PurchaseItem, Amount ) VALUES (1, '20171106 12:34', 'Rugby Ball', 14.99), (1, '20180307 09:54', 'Frying Pan', 21.50), (1, '20180307 10:02', 'Vacuum Cleaner', 259.990), (2, '20171106 15:26', 'Orchid', 13.62), (2, '20180228 18:47', 'Book', 7.99), (2, '20180301 08:02', 'Snakes and Ladders', 8.00), (2, '20180306 14:32', 'Pillows', 20.00), (2, '20180306 14:32', 'Washing Machine', 359.99), (2, '20180307 23:15', 'Jeans', 59.99), (3, '20180301 09:27', 'Rucksack', 13.99), (3, '20180301 09:27', 'Scarf', 12.50); GO SELECT * FROM dbo.GroupingSetsDemo; GO
If we want to see the total value of the purchases for each customer we can just group by the CustomerID and sum up the Amount values as follows…
SELECT CustomerID, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY CustomerID
As well a grouping by columns we can group by what are called GROUPING SETS. This example just shows how we can use GROUPING SETS to mimic the previous query to get total sales by customer…
SELECT CustomerID, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS ((CustomerID));
So in this case we’ve just said to group by a single grouping set for CustomerID.
The really useful stuff starts to happen when we have more than one grouping set. One useful example is where you want to create a total value in the same data set. To do this we just need to add an additional grouping set telling SQL Server to also group over all columns to give us the total amount for all customers…
SELECT CustomerID, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS ((CustomerID), ());
Please note, that you still need to make sure all columns in the SELECT clause appear in at least one grouping set. So we can’t just include the total group set from the above query. If we try, we get the following error as expected…
SELECT CustomerID, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS (());
Msg 8120, Level 16, State 1, Line 56 Column 'dbo.GroupingSetsDemo.CustomerID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
More interesting things start to happen when we want to group over multiple columns. Let’s bring the purchase year into our query as follows..
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS ((CustomerID, YEAR(WFD.PurchaseDate)));
We can now add more grouping sets to this query, for example, to give total amount for each customer…
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS ( (CustomerID, YEAR(WFD.PurchaseDate)), (CustomerID) );
… or for each year…
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS ( (CustomerID, YEAR(WFD.PurchaseDate)), (YEAR(WFD.PurchaseDate)) );
We can also combine everything to give us all possible total values
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS ( (CustomerID, YEAR(WFD.PurchaseDate)), (CustomerID), (YEAR(WFD.PurchaseDate)), () );
As we can see in the previous example, whenever we are adding a grouping set and getting a total row appear, one or more of the values returned can be NULL. However, what if one of the values being grouped on is NULL. How do we differential between the original NULL and the NULL created as part of the grouping set.
In the following example one of the PurchaseItem values is NULL….
DROP TABLE IF EXISTS dbo.GroupingSetsDemoNULL CREATE TABLE dbo.GroupingSetsDemoNULL ( Id INT NOT NULL IDENTITY(1,1), CustomerID INT NOT NULL, PurchaseItem VARCHAR(100) NULL, Amount DECIMAL(10,2) NOT NULL, ); GO INSERT INTO dbo.GroupingSetsDemoNULL ( CustomerID, PurchaseItem, Amount ) VALUES (1, 'Rugby Ball', 10), (1, 'Frying Pan', 20), (1, NULL, 30); GO SELECT * FROM dbo.GroupingSetsDemoNULL;
Let’s say we want to give a total per customer and per item, and a grand total. We need to create three grouping sets as follows…
SELECT CustomerID, PurchaseItem, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemoNULL WFD GROUP BY GROUPING SETS ( (CustomerID), (PurchaseItem), () );
As we can see, we’ve got two rows where the CustomerID and PurchaseItems are NULL. How do we know which one of these is the aggregation of the NULL PurchaseItem acorss all customer and which one is the grand total? In this example looking at the amount we can see the that row 4 with the value of 60 is the grand total but we can also use the GROUPING_ID to show this explicitly.
The GROUPING_ID function is a scalar function that returns a BIT. We pass in a particular column name and this returns a flag saying if each row is part of the aggregation for that column or not.
In this example we’ve got two columns we are aggregating over so let’s use the GROUPING_ID function on both of these as follows…
SELECT CustomerID, PurchaseItem, SUM(Amount) AS TotalAmount, GROUPING_ID(CustomerID) AS CustomerIDGrouped, GROUPING_ID(PurchaseItem) AS PurchaseItemGrouped FROM dbo.GroupingSetsDemoNULL WFD GROUP BY GROUPING SETS ( (CustomerID), (PurchaseItem), () );
As we can see row 1 is grouped on the CustomerID only and is the aggregation of the NULL PurchaseItem for all customers and row 4 is the grand total aggregated across all customers and items.
As mentioned above, the only requirement is that each non-aggregated column appears in at least one grouping set so we can return just the total for each customer and for each year as follows…
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS ( (CustomerID), (YEAR(WFD.PurchaseDate)) );
So what about ROLLUP and CUBE (but please remember these are deprecated)?
CUBE just means that the query is grouped by every possible grouping set, i.e. every combination of non-aggregated columns are used as the grouping sets. The syntax for using CUBE is as follows…
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY YEAR(WFD.PurchaseDate), WFD.CustomerID WITH CUBE ORDER BY WFD.CustomerID, PurchaseYear
This is the same as using the following grouping sets…
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS ( (YEAR(WFD.PurchaseDate), WFD.CustomerID), (WFD.CustomerID), (YEAR(WFD.PurchaseDate)), () ) ORDER BY WFD.CustomerID, PurchaseYear
ROLLUP is slightly different and it depends on the order of the columns in the GROUP BY clause. It works by creating a grouping set across all columns in the GROUP BY clause, then creating additional sets across all columns except the last and each time dropping the last column off the end until it’s just left with the empty group. So if we’re grouping over three columns it will create a set for (1,2,3) then for (1,2) then for (1) and finally for (). Here’s an example…
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY WFD.CustomerID, YEAR(WFD.PurchaseDate) WITH ROLLUP
So this is grouping over both customer and year, then by just customer and finally over all columns. This is the same as creating the following grouping sets…
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY GROUPING SETS ( (CustomerID, YEAR(WFD.PurchaseDate)), (CustomerID), () )
As mentioned, the order of the columns in the GROUP BY clause makes a difference and switching them from WFD.CustomerID, YEAR(WFD.PurchaseDate) to YEAR(WFD.PurchaseDate), WFD.CustomerID gives different results…
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount FROM dbo.GroupingSetsDemo WFD GROUP BY YEAR(WFD.PurchaseDate), WFD.CustomerID WITH ROLLUP
This time we’re grouping by year and customer, then by year, and finally by all columns.