Understanding the CHECKSUM_AGG() Function in SQL Server

SQL Server has a CHECKSUM_AGG() function that allows us to get the checksum of the values in a group. This can be useful for detecting changes in the values in a column.

Syntax

The syntax goes like this:

CHECKSUM_AGG ( [ ALL | DISTINCT ] expression )

So expression is a required argument, but we also have the option of specifying ALL or DISTINCT.

expression must be an int.

By default, ALL is used, which means that the function is applied to all values, including duplicates.

If we specify DISTINCT, then the function only returns the checksum of unique values.

We can also use an OVER clause with the CHECKSUM_AGG() function.

Example

Suppose we have the following table:

SELECT * FROM Dogs;

Result:

DogName  Activity     Score
-------  -----------  -----
Bruno    Fetch Stick  43   
Cooper   Fetch Stick  43   
Max      Fetch Stick  91   
Bruno    Wag Tail     65   
Cooper   Wag Tail     51   
Max      Wag Tail     87   
Bruno    Keep Quiet   2    
Cooper   Keep Quiet   null 
Max      Keep Quiet   1    

We can use the CHECKSUM_AGG() function against that table like this:

SELECT CHECKSUM_AGG( Score )
FROM Dogs;

Result:

125

NULL values are ignored when using CHECKSUM_AGG(). Our table contains a NULL value, and so it was ignored.

Updates to Data

If we update the data, we should see a change in the checksum returned by CHECKSUM_AGG().

Let’s update one of the values in the Score column:

UPDATE Dogs SET Score = 3 
WHERE DogName = 'Bruno' 
AND Activity = 'Keep Quiet';

Now let’s run CHECKSUM_AGG() again:

SELECT CHECKSUM_AGG( Score )
FROM Dogs;

Result:

124

The output value has changed.

The DISTINCT Keyword

In SQL, the DISTINCT clause can be used in various places to apply operations to just distinct values. Redundant duplicates are removed, and only unique values are used.

The CHECKSUM_AGG() function accepts a DISTINCT keyword in its syntax. This allows us to get the checksum based on distinct values:

SELECT CHECKSUM_AGG( DISTINCT Score )
FROM Dogs;

Result:

87

So we can see that the output is different to the output in the previous example, even though we haven’t changed any data. All we did was use the DISTINCT keyword so that the CHECKSUM_AGG() function performs its calculation using only distinct values.

The reason we got a different result is because we do actually have duplicate values in our column. The first two rows have 43 in the Score column, and so they’re duplicates. Using the DISTINCT option means that these two values are treated as one.

The ALL Keyword

We can use the ALL keyword to explicitly state that all values are included – including duplicates:

SELECT CHECKSUM_AGG( ALL Score )
FROM Dogs;

Result:

124

Our result reflects that duplicates were included.

ALL is the default setting when using the CHECKSUM_AGG() function, so we get the same result that we got when we omitted the ALL keyword.

Non-Int Expressions

The expression that CHECKSUM_AGG() operates on must be an int expression. If it’s not an int, then an error occurs.

To demonstrate this, here’s an example of applying CHECKSUM_AGG() to a non-int value:

SELECT 
  CHECKSUM_AGG( Score )
FROM Cats;

Result:

Msg 8117, Level 16, State 1, Server 84489bd0709b, Line 14
Operand data type smallint is invalid for checksum_agg operator.

In this case the column that I passed is of type smallint. So even other integer types such as smallint, tinyint, and bigint are not accepted. It must be int.

We can get around this by converting the expression to an int:

SELECT
    CHECKSUM_AGG( CAST(Score AS int) )
FROM Cats;

Result:

113

This time the code ran without error.