Understanding GROUP BY ROLLUP() in SQL Server

When we use the GROUP BY clause in SQL Server, we have several options for specifying how the GROUP BY operation is applied. One such option is the ROLLUP modifier. We can use this modifier to create subtotals and grand totals.

Example

Suppose we have the following table:

SELECT * FROM Dogs;

Result:

DogName  Activity     Score
-------  -----------  -----
Bruno    Fetch Stick  43   
Cooper   Fetch Stick  43   
Max      Fetch Stick  91   
Bruno    Wag Tail     65   
Cooper   Wag Tail     51   
Max      Wag Tail     87   
Bruno    Keep Quiet   3    
Cooper   Keep Quiet   null 
Max      Keep Quiet   1    

This is a table of dogs and their scores at performing various activities.

Here’s an example of using ROLLUP against that data:

SELECT 
    Activity, 
    SUM(Score) AS "Score"  
FROM Dogs  
GROUP BY ROLLUP (Activity);

Result:

Activity     Score
-----------  -----
Fetch Stick  177  
Keep Quiet   4    
Wag Tail     203  
null         384  

In this example I used the SUM() function to calculate a sum of the scores for each activity. I also used the GROUP BY ROLLUP to provide a grand total of all activities. The Score column is the total score from each activity, but the ROLLUP modifier adds a fourth row, which provides the grand total of those scores.

When we use GROUP BY ROLLUP we enclose the GROUP BY columns in parentheses, as seen in the above example.

We can add more columns by separating them with a comma:

SELECT 
    Activity, 
    DogName,
    SUM(Score) AS "Score"
FROM Dogs  
GROUP BY ROLLUP (Activity, DogName);

Result:

Activity     DogName  Score
-----------  -------  -----
Fetch Stick  Bruno    43   
Fetch Stick  Cooper   43   
Fetch Stick  Max      91   
Fetch Stick  null     177  
Keep Quiet   Bruno    3    
Keep Quiet   Cooper   null 
Keep Quiet   Max      1    
Keep Quiet   null     4    
Wag Tail     Bruno    65   
Wag Tail     Cooper   51   
Wag Tail     Max      87   
Wag Tail     null     203  
null         null     384  

This time we’ve got subtotals for each activity, as well as a grand total for all activities.

Changing the NULL Values

We can see that the ROLLUP operation results in NULL being returned in certain columns for that row.

The NULL returned by ROLLUP is a special use of NULL. Microsoft’s documentation states:

The NULL returned as the result of a ROLLUP, CUBE or GROUPING SETS operation is a special use of NULL. This acts as a column placeholder in the result set and means all.

So in our case, whenever we see null in either the Activity column or the DogName column, we know that it means “all”.

That being said, it’s also entirely possible that a column could contain NULL values that aren’t related to the ROLLUP operation. In that case, it could be easy to get confused over which one was produced by the ROLLUP and which one is simply a NULL value in the underlying data.

Fortunately, we have some tools at our disposal that can help us distinguish between these different NULLs.

Here’s an example of changing the NULL values to something more meaningful:

SELECT 
    IIF(GROUPING(Activity) = 1, 'All Activities', Activity) AS "Activity", 
    IIF(GROUPING(DogName) = 1, 'All Dogs', DogName) AS "DogName",
    SUM(Score) AS "Score"
FROM Dogs  
GROUP BY ROLLUP (Activity, DogName);

Result:

Activity        DogName   Score
--------------  --------  -----
Fetch Stick     Bruno     43   
Fetch Stick     Cooper    43   
Fetch Stick     Max       91   
Fetch Stick     All Dogs  177  
Keep Quiet      Bruno     3    
Keep Quiet      Cooper    null 
Keep Quiet      Max       1    
Keep Quiet      All Dogs  4    
Wag Tail        Bruno     65   
Wag Tail        Cooper    51   
Wag Tail        Max       87   
Wag Tail        All Dogs  203  
All Activities  All Dogs  384  

Here I used the GROUPING() function and the IIF() function to replace any aggregated NULLs with a string.

The GROUPING() function returns 1 whenever the column expression is aggregated. I therefore passed it to the IIF() function to present a different value, depending on whether it’s aggregated or not.

In this case, the Activity column returns the string All Activities if it’s aggregated, and just the column value if it’s not aggregated. The DogName column returns All Dogs if it’s aggregated, and just the column value if not.

Old Syntax

You may have seen ROLLUP code that uses a syntax that goes something like GROUP BY ... WITH ROLLUP.

You may have seen that syntax in earlier versions of SQL Server, or you may have seen it in other RDBMSs. As far as SQL Server goes, this syntax is only provided for backward compatibility.

Here’s an example:

SELECT 
    Activity, 
    DogName,
    SUM(Score) AS "Score"
FROM Dogs  
GROUP BY Activity, DogName WITH ROLLUP;

Result:

Activity     DogName  Score
-----------  -------  -----
Fetch Stick  Bruno    43   
Fetch Stick  Cooper   43   
Fetch Stick  Max      91   
Fetch Stick  null     177  
Keep Quiet   Bruno    3    
Keep Quiet   Cooper   null 
Keep Quiet   Max      1    
Keep Quiet   null     4    
Wag Tail     Bruno    65   
Wag Tail     Cooper   51   
Wag Tail     Max      87   
Wag Tail     null     203  
null         null     384  

It produces the same results, but Microsoft recommends that we no longer use this syntax.