Using DuckDB’s FSUM() Function for More Accurate Results

DuckDB has a fsum() function that can be used instead of the regular sum() function in order to get more accurate results. fsum() calculates the sum using a floating point summation method known as Kahan summation (or compensated summation).

This method helps reduce the accumulation of rounding errors that can occur when summing many floating point numbers when using the regular sum() function.

What is Kahan Summation?

Kahan summation, developed by William Kahan, is an algorithm that reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers. The algorithm compensates for rounding errors by keeping a separate “running compensation” term.

Syntax

The basic syntax goes like this:

fsum(arg)

Where arg is typically a column for which we want to calculate the sum of.

Example 1: Basic Example

Suppose we create and populate the following table:

CREATE TABLE scientific_readings (
    id INT,
    measurement DOUBLE
);

INSERT INTO scientific_readings VALUES
    (1, 0.000000001),
    (2, 0.000000002),
    (3, 1000000.0),
    (4, 0.000000001),
    (5, 0.000000003);

We can use the fsum() function to get the total sum of all numbers in the measurement column:

SELECT
    fsum(measurement)
FROM scientific_readings;

Result:

+-------------------+
| fsum(measurement) |
+-------------------+
| 1000000.000000007 |
+-------------------+

Example 2: Comparing `fsum()` and `sum()`

Let’s compare the results of fsum() to those of the sum() function:

SELECT
    sum(measurement) AS standard_sum,
    fsum(measurement) AS kahan_sum,
    fsum(measurement) - sum(measurement) AS difference
FROM scientific_readings;

Result:

+--------------------+-------------------+-------------------------+
|    standard_sum    |     kahan_sum     |       difference        |
+--------------------+-------------------+-------------------------+
| 1000000.0000000071 | 1000000.000000007 | -1.1641532182693481e-10 |
+--------------------+-------------------+-------------------------+

We can see that we get a slightly different result, depending on which function we use. The difference might be small but can be significant in scientific computations or financial calculations.

Example 3: Using a Larger Dataset

Let’s create another table, but this time we’ll insert thousands of rows:

-- Create a table
CREATE TABLE measurements (
    id INT,
    value DOUBLE
);

-- Insert reciprocal values (1/1, 1/2, 1/3, etc.)
INSERT INTO measurements 
SELECT 
    generate_series AS id,
    1.0/generate_series AS value
FROM generate_series(1, 10000);

-- Check the first 15 rows
SELECT * FROM measurements LIMIT 15;

Output:

+----+---------------------+
| id |        value        |
+----+---------------------+
| 1  | 1.0                 |
| 2  | 0.5                 |
| 3  | 0.3333333333333333  |
| 4  | 0.25                |
| 5  | 0.2                 |
| 6  | 0.16666666666666666 |
| 7  | 0.14285714285714285 |
| 8  | 0.125               |
| 9  | 0.1111111111111111  |
| 10 | 0.1                 |
| 11 | 0.09090909090909091 |
| 12 | 0.08333333333333333 |
| 13 | 0.07692307692307693 |
| 14 | 0.07142857142857142 |
| 15 | 0.06666666666666667 |
+----+---------------------+

The table contains 10,000 rows of data, but that’s just the top fifteen. We can see a lot of decimal places in some of those numbers.

Now let’s run a query that calculates the sum of all the values in the above table:

SELECT 
    sum(value) AS standard_sum,
    fsum(value) AS kahan_sum,
    fsum(value) - sum(value) AS difference
FROM measurements;

Result:

+-------------------+-------------------+-----------------------+
|   standard_sum    |     kahan_sum     |      difference       |
+-------------------+-------------------+-----------------------+
| 9.787606036044348 | 9.787606036044382 | 3.375077994860476e-14 |
+-------------------+-------------------+-----------------------+

Again fsum() and sum() returned slightly different results.

Example 4: Using `fsum()` with Window Functions

Here’s an example that uses a window function:

SELECT 
    id,
    value,
    sum(value) OVER (ORDER BY id) AS running_total,
    fsum(value) OVER (ORDER BY id) AS accurate_running_total
FROM measurements
WHERE id <= 15
ORDER BY id;

Result:

+----+---------------------+--------------------+------------------------+
| id |        value        |   running_total    | accurate_running_total |
+----+---------------------+--------------------+------------------------+
| 1  | 1.0                 | 1.0                | 1.0                    |
| 2  | 0.5                 | 1.5                | 1.5                    |
| 3  | 0.3333333333333333  | 1.8333333333333333 | 1.8333333333333333     |
| 4  | 0.25                | 2.083333333333333  | 2.0833333333333335     |
| 5  | 0.2                 | 2.283333333333333  | 2.283333333333333      |
| 6  | 0.16666666666666666 | 2.4499999999999997 | 2.45                   |
| 7  | 0.14285714285714285 | 2.5928571428571425 | 2.592857142857143      |
| 8  | 0.125               | 2.7178571428571425 | 2.717857142857143      |
| 9  | 0.1111111111111111  | 2.8289682539682537 | 2.828968253968254      |
| 10 | 0.1                 | 2.9289682539682538 | 2.9289682539682538     |
| 11 | 0.09090909090909091 | 3.0198773448773446 | 3.019877344877345      |
| 12 | 0.08333333333333333 | 3.103210678210678  | 3.103210678210678      |
| 13 | 0.07692307692307693 | 3.180133755133755  | 3.180133755133755      |
| 14 | 0.07142857142857142 | 3.251562326562327  | 3.2515623265623264     |
| 15 | 0.06666666666666667 | 3.3182289932289937 | 3.3182289932289932     |
+----+---------------------+--------------------+------------------------+

Here we’re comparing sum() with fsum() again. Some results are identical and some aren’t.

When to Use `fsum()`

You should consider using fsum() instead of sum() when:

Working with datasets containing many floating point numbers
Dealing with scientific computations requiring high precision
Processing financial data where small rounding errors could accumulate
Computing totals of values with widely different magnitudes

Conclusion

While sum() is suitable for many use cases, fsum() provides higher precision when dealing with floating point numbers. The difference becomes particularly important when:

Getting the total sum of many numbers
Working with values of significantly different magnitudes
Dealing with applications requiring high numerical precision

So if you need high precision when calculating totals in DuckDB, try using fsum() instead of sum().

What is Kahan Summation?

Syntax

Example 1: Basic Example

Example 2: Comparing fsum() and sum()

Example 3: Using a Larger Dataset

Example 4: Using fsum() with Window Functions

When to Use fsum()

Conclusion

Example 2: Comparing `fsum()` and `sum()`

Example 4: Using `fsum()` with Window Functions

When to Use `fsum()`