SQL Server has a CHECKSUM_AGG() function that allows us to get the checksum of the values in a group. This can be useful for detecting changes in the values in a column.
Syntax
The syntax goes like this:
CHECKSUM_AGG ( [ ALL | DISTINCT ] expression )
So expression is a required argument, but we also have the option of specifying ALL or DISTINCT.
expression must be an int.
By default, ALL is used, which means that the function is applied to all values, including duplicates.
If we specify DISTINCT, then the function only returns the checksum of unique values.
We can also use an OVER clause with the CHECKSUM_AGG() function.
Example
Suppose we have the following table:
SELECT * FROM Dogs;
Result:
DogName Activity Score ------- ----------- ----- Bruno Fetch Stick 43 Cooper Fetch Stick 43 Max Fetch Stick 91 Bruno Wag Tail 65 Cooper Wag Tail 51 Max Wag Tail 87 Bruno Keep Quiet 2 Cooper Keep Quiet null Max Keep Quiet 1
We can use the CHECKSUM_AGG() function against that table like this:
SELECT CHECKSUM_AGG( Score )
FROM Dogs;
Result:
125
NULL values are ignored when using CHECKSUM_AGG(). Our table contains a NULL value, and so it was ignored.
Updates to Data
If we update the data, we should see a change in the checksum returned by CHECKSUM_AGG().
Let’s update one of the values in the Score column:
UPDATE Dogs SET Score = 3
WHERE DogName = 'Bruno'
AND Activity = 'Keep Quiet';
Now let’s run CHECKSUM_AGG() again:
SELECT CHECKSUM_AGG( Score )
FROM Dogs;
Result:
124
The output value has changed.
The DISTINCT Keyword
In SQL, the DISTINCT clause can be used in various places to apply operations to just distinct values. Redundant duplicates are removed, and only unique values are used.
The CHECKSUM_AGG() function accepts a DISTINCT keyword in its syntax. This allows us to get the checksum based on distinct values:
SELECT CHECKSUM_AGG( DISTINCT Score )
FROM Dogs;
Result:
87
So we can see that the output is different to the output in the previous example, even though we haven’t changed any data. All we did was use the DISTINCT keyword so that the CHECKSUM_AGG() function performs its calculation using only distinct values.
The reason we got a different result is because we do actually have duplicate values in our column. The first two rows have 43 in the Score column, and so they’re duplicates. Using the DISTINCT option means that these two values are treated as one.
The ALL Keyword
We can use the ALL keyword to explicitly state that all values are included – including duplicates:
SELECT CHECKSUM_AGG( ALL Score )
FROM Dogs;
Result:
124
Our result reflects that duplicates were included.
ALL is the default setting when using the CHECKSUM_AGG() function, so we get the same result that we got when we omitted the ALL keyword.
Non-Int Expressions
The expression that CHECKSUM_AGG() operates on must be an int expression. If it’s not an int, then an error occurs.
To demonstrate this, here’s an example of applying CHECKSUM_AGG() to a non-int value:
SELECT
CHECKSUM_AGG( Score )
FROM Cats;
Result:
Msg 8117, Level 16, State 1, Server 84489bd0709b, Line 14 Operand data type smallint is invalid for checksum_agg operator.
In this case the column that I passed is of type smallint. So even other integer types such as smallint, tinyint, and bigint are not accepted. It must be int.
We can get around this by converting the expression to an int:
SELECT
CHECKSUM_AGG( CAST(Score AS int) )
FROM Cats;
Result:
113
This time the code ran without error.