SQL Server provides GROUP BY CUBE()
functionality that we can use in our SQL queries, which creates groups for all possible combinations of columns.
It’s similar to the GROUP BY ROLLUP()
functionality, except that it can provide us with more information, due to the fact that it groups all possible combinations.
Example
Suppose we have the following table:
SELECT * FROM Dogs;
Result:
DogName Activity Score ------- ----------- ----- Bruno Fetch Stick 43 Cooper Fetch Stick 43 Max Fetch Stick 91 Bruno Wag Tail 65 Cooper Wag Tail 51 Max Wag Tail 87 Bruno Keep Quiet 3 Cooper Keep Quiet null Max Keep Quiet 1
This is a table of dogs and their scores at performing various activities.
Here’s an example of a query that uses GROUP BY CUBE()
against that table:
SELECT
Activity,
DogName,
SUM(Score) AS "Score"
FROM Dogs
GROUP BY CUBE (Activity, DogName);
Result:
Activity DogName Score ----------- ------- ----- Fetch Stick Bruno 43 Keep Quiet Bruno 3 Wag Tail Bruno 65 null Bruno 111 Fetch Stick Cooper 43 Keep Quiet Cooper null Wag Tail Cooper 51 null Cooper 94 Fetch Stick Max 91 Keep Quiet Max 1 Wag Tail Max 87 null Max 179 null null 384 Fetch Stick null 177 Keep Quiet null 4 Wag Tail null 203
In this example I used the SUM()
function to calculate the total scores for each of the dogs, for each activity, and also a grand total for all activities.
The GROUP BY CUBE()
operation enabled us to get all combinations of totals and sub-totals.
A more simplistic query (without the CUBE
option) might look like this:
SELECT
Activity,
SUM(Score) AS "Score"
FROM Dogs
GROUP BY Activity;
Result:
Activity Score ----------- ----- Fetch Stick 177 Keep Quiet 4 Wag Tail 203
In this case I simply got the total scores for each activity.
If we wanted the total scores for each dog, we could do this:
SELECT
DogName,
SUM(Score) AS "Score"
FROM Dogs
GROUP BY DogName;
Result:
DogName Score ------- ----- Bruno 111 Cooper 94 Max 179
These individual queries work fine if that’s all we need. But the CUBE
option provided us with the output of both of these queries, as well as the underlying data, all within a single query.
Changing the NULL
Values
We can see that the CUBE
operation results in NULL
being returned in certain columns for that row.
The NULL
returned by GROUP BY CUBE
is a special use of NULL
. Microsoft’s documentation states:
The
NULL
returned as the result of aROLLUP
,CUBE
orGROUPING SETS
operation is a special use ofNULL
. This acts as a column placeholder in the result set and means all.
So in our case, whenever we see null
in either the Activity
column or the DogName
column, we know that it means “all”.
That being said, it’s also entirely possible that a column could contain NULL values that aren’t related to the CUBE
operation. In that case, it could be easy to get confused over which one was produced by the CUBE
and which one is simply a NULL value in the underlying data.
Fortunately, we have some tools at our disposal that can help us distinguish between these different NULL
s.
Here’s an example of changing the NULL values to something more meaningful:
SELECT
IIF(GROUPING(Activity) = 1, 'All Activities', Activity) AS "Activity",
IIF(GROUPING(DogName) = 1, 'All Dogs', DogName) AS "DogName",
SUM(Score) AS "Score"
FROM Dogs
GROUP BY CUBE (Activity, DogName);
Result:
Activity DogName Score -------------- -------- ----- Fetch Stick Bruno 43 Keep Quiet Bruno 3 Wag Tail Bruno 65 All Activities Bruno 111 Fetch Stick Cooper 43 Keep Quiet Cooper null Wag Tail Cooper 51 All Activities Cooper 94 Fetch Stick Max 91 Keep Quiet Max 1 Wag Tail Max 87 All Activities Max 179 All Activities All Dogs 384 Fetch Stick All Dogs 177 Keep Quiet All Dogs 4 Wag Tail All Dogs 203
Here I used the GROUPING()
function in conjunction with the IIF()
function to replace any aggregated NULL
s with a string.
The GROUPING()
function returns 1
whenever the column expression is aggregated. I therefore passed it to the IIF()
function to present a different value, depending on whether it’s aggregated or not.
In this case, the Activity
column returns the string All Activities
if it’s aggregated, and just the column value if it’s not aggregated. The DogName
column returns All Dogs
if it’s aggregated, and just the column value if not.
Outdated Syntax
It’s also possible to use a syntax that goes something like GROUP BY ... WITH CUBE
(where ...
is the column list), but this syntax is not recommended, and it’s only provided for backward compatibility.
Here’s an example:
SELECT
Activity,
DogName,
SUM(Score) AS "Score"
FROM Dogs
GROUP BY Activity, DogName WITH CUBE;
Result:
Activity DogName Score ----------- ------- ----- Fetch Stick Bruno 43 Keep Quiet Bruno 3 Wag Tail Bruno 65 null Bruno 111 Fetch Stick Cooper 43 Keep Quiet Cooper null Wag Tail Cooper 51 null Cooper 94 Fetch Stick Max 91 Keep Quiet Max 1 Wag Tail Max 87 null Max 179 null null 384 Fetch Stick null 177 Keep Quiet null 4 Wag Tail null 203
It produces the same results, but Microsoft recommends that we no longer use this syntax.