How GROUP BY CUBE() Works in SQL Server

SQL Server provides GROUP BY CUBE() functionality that we can use in our SQL queries, which creates groups for all possible combinations of columns.

It’s similar to the GROUP BY ROLLUP() functionality, except that it can provide us with more information, due to the fact that it groups all possible combinations.

Example

Suppose we have the following table:

SELECT * FROM Dogs;

Result:

DogName  Activity     Score
-------  -----------  -----
Bruno    Fetch Stick  43   
Cooper   Fetch Stick  43   
Max      Fetch Stick  91   
Bruno    Wag Tail     65   
Cooper   Wag Tail     51   
Max      Wag Tail     87   
Bruno    Keep Quiet   3    
Cooper   Keep Quiet   null 
Max      Keep Quiet   1    

This is a table of dogs and their scores at performing various activities.

Here’s an example of a query that uses GROUP BY CUBE() against that table:

SELECT 
    Activity, 
    DogName,
    SUM(Score) AS "Score"
FROM Dogs  
GROUP BY CUBE (Activity, DogName);

Result:

Activity     DogName  Score
-----------  -------  -----
Fetch Stick  Bruno    43   
Keep Quiet   Bruno    3    
Wag Tail     Bruno    65   
null         Bruno    111  
Fetch Stick  Cooper   43   
Keep Quiet   Cooper   null 
Wag Tail     Cooper   51   
null         Cooper   94   
Fetch Stick  Max      91   
Keep Quiet   Max      1    
Wag Tail     Max      87   
null         Max      179  
null         null     384  
Fetch Stick  null     177  
Keep Quiet   null     4    
Wag Tail     null     203  

In this example I used the SUM() function to calculate the total scores for each of the dogs, for each activity, and also a grand total for all activities.

The GROUP BY CUBE() operation enabled us to get all combinations of totals and sub-totals.

A more simplistic query (without the CUBE option) might look like this:

SELECT 
    Activity,
    SUM(Score) AS "Score"
FROM Dogs  
GROUP BY Activity;

Result:

Activity     Score
-----------  -----
Fetch Stick  177  
Keep Quiet   4    
Wag Tail     203 

In this case I simply got the total scores for each activity.

If we wanted the total scores for each dog, we could do this:

SELECT 
    DogName,
    SUM(Score) AS "Score"
FROM Dogs  
GROUP BY DogName;

Result:

DogName  Score
-------  -----
Bruno    111  
Cooper   94   
Max      179  

These individual queries work fine if that’s all we need. But the CUBE option provided us with the output of both of these queries, as well as the underlying data, all within a single query.

Changing the NULL Values

We can see that the CUBE operation results in NULL being returned in certain columns for that row.

The NULL returned by GROUP BY CUBE is a special use of NULL. Microsoft’s documentation states:

The NULL returned as the result of a ROLLUP, CUBE or GROUPING SETS operation is a special use of NULL. This acts as a column placeholder in the result set and means all.

So in our case, whenever we see null in either the Activity column or the DogName column, we know that it means “all”.

That being said, it’s also entirely possible that a column could contain NULL values that aren’t related to the CUBE operation. In that case, it could be easy to get confused over which one was produced by the CUBE and which one is simply a NULL value in the underlying data.

Fortunately, we have some tools at our disposal that can help us distinguish between these different NULLs.

Here’s an example of changing the NULL values to something more meaningful:

SELECT 
    IIF(GROUPING(Activity) = 1, 'All Activities', Activity) AS "Activity", 
    IIF(GROUPING(DogName) = 1, 'All Dogs', DogName) AS "DogName",
    SUM(Score) AS "Score"
FROM Dogs  
GROUP BY CUBE (Activity, DogName);

Result:

Activity        DogName   Score
--------------  --------  -----
Fetch Stick     Bruno     43   
Keep Quiet      Bruno     3    
Wag Tail        Bruno     65   
All Activities  Bruno     111  
Fetch Stick     Cooper    43   
Keep Quiet      Cooper    null 
Wag Tail        Cooper    51   
All Activities  Cooper    94   
Fetch Stick     Max       91   
Keep Quiet      Max       1    
Wag Tail        Max       87   
All Activities  Max       179  
All Activities  All Dogs  384  
Fetch Stick     All Dogs  177  
Keep Quiet      All Dogs  4    
Wag Tail        All Dogs  203  

Here I used the GROUPING() function in conjunction with the IIF() function to replace any aggregated NULLs with a string.

The GROUPING() function returns 1 whenever the column expression is aggregated. I therefore passed it to the IIF() function to present a different value, depending on whether it’s aggregated or not.

In this case, the Activity column returns the string All Activities if it’s aggregated, and just the column value if it’s not aggregated. The DogName column returns All Dogs if it’s aggregated, and just the column value if not.

Outdated Syntax

It’s also possible to use a syntax that goes something like GROUP BY ... WITH CUBE (where ... is the column list), but this syntax is not recommended, and it’s only provided for backward compatibility.

Here’s an example:

SELECT 
    Activity, 
    DogName,
    SUM(Score) AS "Score"
FROM Dogs  
GROUP BY Activity, DogName WITH CUBE;

Result:

Activity     DogName  Score
-----------  -------  -----
Fetch Stick  Bruno    43   
Keep Quiet   Bruno    3    
Wag Tail     Bruno    65   
null         Bruno    111  
Fetch Stick  Cooper   43   
Keep Quiet   Cooper   null 
Wag Tail     Cooper   51   
null         Cooper   94   
Fetch Stick  Max      91   
Keep Quiet   Max      1    
Wag Tail     Max      87   
null         Max      179  
null         null     384  
Fetch Stick  null     177  
Keep Quiet   null     4    
Wag Tail     null     203  

It produces the same results, but Microsoft recommends that we no longer use this syntax.