SQL Server has a CHECKSUM_AGG()
function that allows us to get the checksum of the values in a group. This can be useful for detecting changes in the values in a column.
Syntax
The syntax goes like this:
CHECKSUM_AGG ( [ ALL | DISTINCT ] expression )
So expression
is a required argument, but we also have the option of specifying ALL
or DISTINCT
.
expression
must be an int.
By default, ALL
is used, which means that the function is applied to all values, including duplicates.
If we specify DISTINCT
, then the function only returns the checksum of unique values.
We can also use an OVER
clause with the CHECKSUM_AGG()
function.
Example
Suppose we have the following table:
SELECT * FROM Dogs;
Result:
DogName Activity Score ------- ----------- ----- Bruno Fetch Stick 43 Cooper Fetch Stick 43 Max Fetch Stick 91 Bruno Wag Tail 65 Cooper Wag Tail 51 Max Wag Tail 87 Bruno Keep Quiet 2 Cooper Keep Quiet null Max Keep Quiet 1
We can use the CHECKSUM_AGG()
function against that table like this:
SELECT CHECKSUM_AGG( Score )
FROM Dogs;
Result:
125
NULL
values are ignored when using CHECKSUM_AGG()
. Our table contains a NULL
value, and so it was ignored.
Updates to Data
If we update the data, we should see a change in the checksum returned by CHECKSUM_AGG()
.
Let’s update one of the values in the Score
column:
UPDATE Dogs SET Score = 3
WHERE DogName = 'Bruno'
AND Activity = 'Keep Quiet';
Now let’s run CHECKSUM_AGG()
again:
SELECT CHECKSUM_AGG( Score )
FROM Dogs;
Result:
124
The output value has changed.
The DISTINCT
Keyword
In SQL, the DISTINCT
clause can be used in various places to apply operations to just distinct values. Redundant duplicates are removed, and only unique values are used.
The CHECKSUM_AGG()
function accepts a DISTINCT
keyword in its syntax. This allows us to get the checksum based on distinct values:
SELECT CHECKSUM_AGG( DISTINCT Score )
FROM Dogs;
Result:
87
So we can see that the output is different to the output in the previous example, even though we haven’t changed any data. All we did was use the DISTINCT
keyword so that the CHECKSUM_AGG()
function performs its calculation using only distinct values.
The reason we got a different result is because we do actually have duplicate values in our column. The first two rows have 43
in the Score
column, and so they’re duplicates. Using the DISTINCT
option means that these two values are treated as one.
The ALL
Keyword
We can use the ALL
keyword to explicitly state that all values are included – including duplicates:
SELECT CHECKSUM_AGG( ALL Score )
FROM Dogs;
Result:
124
Our result reflects that duplicates were included.
ALL
is the default setting when using the CHECKSUM_AGG()
function, so we get the same result that we got when we omitted the ALL
keyword.
Non-Int Expressions
The expression that CHECKSUM_AGG()
operates on must be an int expression. If it’s not an int, then an error occurs.
To demonstrate this, here’s an example of applying CHECKSUM_AGG()
to a non-int value:
SELECT
CHECKSUM_AGG( Score )
FROM Cats;
Result:
Msg 8117, Level 16, State 1, Server 84489bd0709b, Line 14 Operand data type smallint is invalid for checksum_agg operator.
In this case the column that I passed is of type smallint. So even other integer types such as smallint, tinyint, and bigint are not accepted. It must be int.
We can get around this by converting the expression to an int:
SELECT
CHECKSUM_AGG( CAST(Score AS int) )
FROM Cats;
Result:
113
This time the code ran without error.