DuckDB has a fsum()
function that can be used instead of the regular sum()
function in order to get more accurate results. fsum()
calculates the sum using a floating point summation method known as Kahan summation (or compensated summation).
This method helps reduce the accumulation of rounding errors that can occur when summing many floating point numbers when using the regular
function.sum
()
What is Kahan Summation?
Kahan summation, developed by William Kahan, is an algorithm that reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers. The algorithm compensates for rounding errors by keeping a separate “running compensation” term.
Syntax
The basic syntax goes like this:
fsum(arg)
Where arg
is typically a column for which we want to calculate the sum of.
Example 1: Basic Example
Suppose we create and populate the following table:
CREATE TABLE scientific_readings (
id INT,
measurement DOUBLE
);
INSERT INTO scientific_readings VALUES
(1, 0.000000001),
(2, 0.000000002),
(3, 1000000.0),
(4, 0.000000001),
(5, 0.000000003);
We can use the fsum()
function to get the total sum of all numbers in the measurement
column:
SELECT
fsum(measurement)
FROM scientific_readings;
Result:
+-------------------+
| fsum(measurement) |
+-------------------+
| 1000000.000000007 |
+-------------------+
Example 2: Comparing fsum()
and sum()
Let’s compare the results of fsum()
to those of the sum()
function:
SELECT
sum(measurement) AS standard_sum,
fsum(measurement) AS kahan_sum,
fsum(measurement) - sum(measurement) AS difference
FROM scientific_readings;
Result:
+--------------------+-------------------+-------------------------+
| standard_sum | kahan_sum | difference |
+--------------------+-------------------+-------------------------+
| 1000000.0000000071 | 1000000.000000007 | -1.1641532182693481e-10 |
+--------------------+-------------------+-------------------------+
We can see that we get a slightly different result, depending on which function we use. The difference might be small but can be significant in scientific computations or financial calculations.
Example 3: Using a Larger Dataset
Let’s create another table, but this time we’ll insert thousands of rows:
-- Create a table
CREATE TABLE measurements (
id INT,
value DOUBLE
);
-- Insert reciprocal values (1/1, 1/2, 1/3, etc.)
INSERT INTO measurements
SELECT
generate_series AS id,
1.0/generate_series AS value
FROM generate_series(1, 10000);
-- Check the first 15 rows
SELECT * FROM measurements LIMIT 15;
Output:
+----+---------------------+
| id | value |
+----+---------------------+
| 1 | 1.0 |
| 2 | 0.5 |
| 3 | 0.3333333333333333 |
| 4 | 0.25 |
| 5 | 0.2 |
| 6 | 0.16666666666666666 |
| 7 | 0.14285714285714285 |
| 8 | 0.125 |
| 9 | 0.1111111111111111 |
| 10 | 0.1 |
| 11 | 0.09090909090909091 |
| 12 | 0.08333333333333333 |
| 13 | 0.07692307692307693 |
| 14 | 0.07142857142857142 |
| 15 | 0.06666666666666667 |
+----+---------------------+
The table contains 10,000 rows of data, but that’s just the top fifteen. We can see a lot of decimal places in some of those numbers.
Now let’s run a query that calculates the sum of all the values in the above table:
SELECT
sum(value) AS standard_sum,
fsum(value) AS kahan_sum,
fsum(value) - sum(value) AS difference
FROM measurements;
Result:
+-------------------+-------------------+-----------------------+
| standard_sum | kahan_sum | difference |
+-------------------+-------------------+-----------------------+
| 9.787606036044348 | 9.787606036044382 | 3.375077994860476e-14 |
+-------------------+-------------------+-----------------------+
Again fsum()
and sum()
returned slightly different results.
Example 4: Using fsum()
with Window Functions
Here’s an example that uses a window function:
SELECT
id,
value,
sum(value) OVER (ORDER BY id) AS running_total,
fsum(value) OVER (ORDER BY id) AS accurate_running_total
FROM measurements
WHERE id <= 15
ORDER BY id;
Result:
+----+---------------------+--------------------+------------------------+
| id | value | running_total | accurate_running_total |
+----+---------------------+--------------------+------------------------+
| 1 | 1.0 | 1.0 | 1.0 |
| 2 | 0.5 | 1.5 | 1.5 |
| 3 | 0.3333333333333333 | 1.8333333333333333 | 1.8333333333333333 |
| 4 | 0.25 | 2.083333333333333 | 2.0833333333333335 |
| 5 | 0.2 | 2.283333333333333 | 2.283333333333333 |
| 6 | 0.16666666666666666 | 2.4499999999999997 | 2.45 |
| 7 | 0.14285714285714285 | 2.5928571428571425 | 2.592857142857143 |
| 8 | 0.125 | 2.7178571428571425 | 2.717857142857143 |
| 9 | 0.1111111111111111 | 2.8289682539682537 | 2.828968253968254 |
| 10 | 0.1 | 2.9289682539682538 | 2.9289682539682538 |
| 11 | 0.09090909090909091 | 3.0198773448773446 | 3.019877344877345 |
| 12 | 0.08333333333333333 | 3.103210678210678 | 3.103210678210678 |
| 13 | 0.07692307692307693 | 3.180133755133755 | 3.180133755133755 |
| 14 | 0.07142857142857142 | 3.251562326562327 | 3.2515623265623264 |
| 15 | 0.06666666666666667 | 3.3182289932289937 | 3.3182289932289932 |
+----+---------------------+--------------------+------------------------+
Here we’re comparing sum()
with fsum()
again. Some results are identical and some aren’t.
When to Use fsum()
You should consider using fsum()
instead of sum()
when:
- Working with datasets containing many floating point numbers
- Dealing with scientific computations requiring high precision
- Processing financial data where small rounding errors could accumulate
- Computing totals of values with widely different magnitudes
Conclusion
While sum()
is suitable for many use cases, fsum()
provides higher precision when dealing with floating point numbers. The difference becomes particularly important when:
- Getting the total sum of many numbers
- Working with values of significantly different magnitudes
- Dealing with applications requiring high numerical precision
So if you need high precision when calculating totals in DuckDB, try using fsum()
instead of sum()
.