A Quick Look at the SUM() Function in DuckDB

One of the more commonly used aggregate functions in SQL databases is the sum() function, which provides an easy way to calculate the total sum of a numeric column. DuckDB is no exception.

In this article, we’ll explore how the sum() function works in DuckDB, along with some straightforward examples to demonstrate.

What is the sum() Function?

The sum() function is an aggregate function that computes the total sum of a set of numeric values. It is often used in SQL queries to summarize data, such as calculating the total sales, revenue, or any other numeric metric across a dataset.

Syntax

The basic syntax of the sum() function is as follows:

sum(arg)

Where arg is the name of the column for which you want to calculate the sum.

Sample Data

Before diving into examples, let’s create a sample table and populate it with data. We’ll use a table named sales that stores information about sales transactions:

CREATE TABLE sales (
    id INTEGER,
    product_name VARCHAR,
    quantity INTEGER,
    price DECIMAL(10, 2)
);

INSERT INTO sales (id, product_name, quantity, price)
VALUES
    (1, 'Laptop', 2, 1200.00),
    (2, 'Smartphone', 5, 800.00),
    (3, 'Tablet', 3, 500.00),
    (4, 'Headphones', 10, 100.00),
    (5, 'Monitor', 4, 300.00);

Example 1: Basic Usage of sum()

This examples calculates the total number of products sold:

SELECT 
    sum(quantity) AS total_products_sold
FROM sales;

Result:

+---------------------+
| total_products_sold |
+---------------------+
| 24 |
+---------------------+

We simply passed the quantity column to the sum() function, and the function calculated the total for us.

The AS total_products_sold part is optional. It provides a meaningful name for the output column.

Example 2: Calculate Total Revenue

Let’s calculate the total revenue. To do this, we’ll multiply the quantity by the price for each row and then sum the results:

SELECT 
    sum(quantity * price) AS total_revenue
FROM sales;

Result:

+---------------+
| total_revenue |
+---------------+
| 10100.00 |
+---------------+

This query calculates the total revenue by multiplying the quantity and price for each product and then summing up the results.

It’s basically like doing this:

SELECT (((((2 * 1200.00) + (5 * 800.00)) + (3 * 500.00)) + (10 * 100.00)) + (4 * 300.00));

Result:

+------------------------------------------------------------------------------------+
| (((((2 * 1200.00) + (5 * 800.00)) + (3 * 500.00)) + (10 * 100.00)) + (4 * 300.00)) |
+------------------------------------------------------------------------------------+
| 10100.00 |
+------------------------------------------------------------------------------------+

Example 3: Using sum() with GROUP BY

The sum() function is often used in conjunction with the GROUP BY clause to calculate sums for different groups of data. For instance, let’s calculate the total revenue generated by each product:

SELECT 
    product_name, 
    sum(quantity * price) AS product_revenue
FROM sales
GROUP BY product_name;

Result:

+--------------+-----------------+
| product_name | product_revenue |
+--------------+-----------------+
| Headphones | 1000.00 |
| Laptop | 2400.00 |
| Monitor | 1200.00 |
| Tablet | 1500.00 |
| Smartphone | 4000.00 |
+--------------+-----------------+

This query groups the data by product_name and calculates the total revenue for each product.

Example 4: Using sum() with a WHERE Clause

You can also use the sum() function with a WHERE clause to filter the data before calculating the sum. For example, let’s calculate the total revenue for products with a price greater than $500:

SELECT 
    sum(quantity * price) AS high_value_revenue
FROM sales
WHERE price > 500;

Result:

+--------------------+
| high_value_revenue |
+--------------------+
| 6400.00 |
+--------------------+

This query filters the rows where the price is greater than $500 and then calculates the total revenue for those products.

Example 5: Handling NULL Values

The sum() function automatically ignores NULL values. To demonstrate this, let’s insert a row with a NULL price into the sales table.

INSERT INTO sales (id, product_name, quantity, price)
VALUES (6, 'Keyboard', 7, NULL);

Now, let’s calculate the total revenue again.

SELECT 
    sum(quantity * price) AS total_revenue
FROM sales;

Result:

+---------------+
| total_revenue |
+---------------+
| 10100.00 |
+---------------+

The result remains the same as in our earlier example. The sum() function simply ignored the row with the NULL price.

Example 6: Using sum() with DISTINCT

The sum() function can also be used with the DISTINCT clause to calculate the sum of unique values. For example, let’s calculate the sum of distinct prices in the sales table.

SELECT 
    sum(DISTINCT price) AS sum_distinct_prices
FROM sales;

Result:

+---------------------+
| sum_distinct_prices |
+---------------------+
| 2900.00 |
+---------------------+

Passing the Wrong Argument Type

If we pass the wrong argument type we’ll get an error. For example:

SELECT 
    sum(product_name)
FROM sales;

Result:

Binder Error: No function matches the given name and argument types 'sum(VARCHAR)'. You might need to add explicit type casts.
Candidate functions:
sum(DECIMAL) -> DECIMAL
sum(SMALLINT) -> HUGEINT
sum(INTEGER) -> HUGEINT
sum(BIGINT) -> HUGEINT
sum(HUGEINT) -> HUGEINT
sum(DOUBLE) -> DOUBLE

LINE 2: sum(product_name)
^

In this case I passed the product_name column, which is defined as a VARCHAR column. The sum() function doesn’t accept this data type. The error message provides us with guidance as to which data types can be used with the sum() function (basically, numeric types).

Using fsum() for More Accurate Results

DuckDB also provides an fsum() function, which calculates the sum using a floating point summation method known as Kahan summation (or compensated summation).

This method helps reduce the accumulation of rounding errors that can occur when summing many floating point numbers when using the regular sum() function.

The fsum() function uses the same syntax as sum() (i.e. fsum(arg)), so to use fsum(), simply replace sum() with fsum(). Everything else can remain the same.

Here’s one of our earlier examples, refactored to use fsum():

SELECT 
    fsum(quantity * price) AS total_revenue
FROM sales;

Result:

+---------------+
| total_revenue |
+---------------+
| 10100.0 |
+---------------+

In this case we get the same result, but the results can differ, depending on the values being added.