One of the more commonly used aggregate functions in SQL databases is the sum()
function, which provides an easy way to calculate the total sum of a numeric column. DuckDB is no exception.
In this article, we’ll explore how the sum()
function works in DuckDB, along with some straightforward examples to demonstrate.
What is the sum()
Function?
The sum()
function is an aggregate function that computes the total sum of a set of numeric values. It is often used in SQL queries to summarize data, such as calculating the total sales, revenue, or any other numeric metric across a dataset.
Syntax
The basic syntax of the sum()
function is as follows:
sum(arg)
Where arg
is the name of the column for which you want to calculate the sum.
Sample Data
Before diving into examples, let’s create a sample table and populate it with data. We’ll use a table named sales
that stores information about sales transactions:
CREATE TABLE sales (
id INTEGER,
product_name VARCHAR,
quantity INTEGER,
price DECIMAL(10, 2)
);
INSERT INTO sales (id, product_name, quantity, price)
VALUES
(1, 'Laptop', 2, 1200.00),
(2, 'Smartphone', 5, 800.00),
(3, 'Tablet', 3, 500.00),
(4, 'Headphones', 10, 100.00),
(5, 'Monitor', 4, 300.00);
Example 1: Basic Usage of sum()
This examples calculates the total number of products sold:
SELECT
sum(quantity) AS total_products_sold
FROM sales;
Result:
+---------------------+
| total_products_sold |
+---------------------+
| 24 |
+---------------------+
We simply passed the quantity
column to the sum()
function, and the function calculated the total for us.
The AS total_products_sold
part is optional. It provides a meaningful name for the output column.
Example 2: Calculate Total Revenue
Let’s calculate the total revenue. To do this, we’ll multiply the quantity
by the price
for each row and then sum the results:
SELECT
sum(quantity * price) AS total_revenue
FROM sales;
Result:
+---------------+
| total_revenue |
+---------------+
| 10100.00 |
+---------------+
This query calculates the total revenue by multiplying the quantity
and price
for each product and then summing up the results.
It’s basically like doing this:
SELECT (((((2 * 1200.00) + (5 * 800.00)) + (3 * 500.00)) + (10 * 100.00)) + (4 * 300.00));
Result:
+------------------------------------------------------------------------------------+
| (((((2 * 1200.00) + (5 * 800.00)) + (3 * 500.00)) + (10 * 100.00)) + (4 * 300.00)) |
+------------------------------------------------------------------------------------+
| 10100.00 |
+------------------------------------------------------------------------------------+
Example 3: Using sum()
with GROUP BY
The sum()
function is often used in conjunction with the GROUP BY
clause to calculate sums for different groups of data. For instance, let’s calculate the total revenue generated by each product:
SELECT
product_name,
sum(quantity * price) AS product_revenue
FROM sales
GROUP BY product_name;
Result:
+--------------+-----------------+
| product_name | product_revenue |
+--------------+-----------------+
| Headphones | 1000.00 |
| Laptop | 2400.00 |
| Monitor | 1200.00 |
| Tablet | 1500.00 |
| Smartphone | 4000.00 |
+--------------+-----------------+
This query groups the data by product_name
and calculates the total revenue for each product.
Example 4: Using sum()
with a WHERE
Clause
You can also use the sum()
function with a WHERE
clause to filter the data before calculating the sum. For example, let’s calculate the total revenue for products with a price greater than $500:
SELECT
sum(quantity * price) AS high_value_revenue
FROM sales
WHERE price > 500;
Result:
+--------------------+
| high_value_revenue |
+--------------------+
| 6400.00 |
+--------------------+
This query filters the rows where the price
is greater than $500 and then calculates the total revenue for those products.
Example 5: Handling NULL
Values
The sum()
function automatically ignores NULL
values. To demonstrate this, let’s insert a row with a NULL
price into the sales
table.
INSERT INTO sales (id, product_name, quantity, price)
VALUES (6, 'Keyboard', 7, NULL);
Now, let’s calculate the total revenue again.
SELECT
sum(quantity * price) AS total_revenue
FROM sales;
Result:
+---------------+
| total_revenue |
+---------------+
| 10100.00 |
+---------------+
The result remains the same as in our earlier example. The sum()
function simply ignored the row with the NULL
price.
Example 6: Using sum()
with DISTINCT
The sum()
function can also be used with the DISTINCT
clause to calculate the sum of unique values. For example, let’s calculate the sum of distinct prices in the sales
table.
SELECT
sum(DISTINCT price) AS sum_distinct_prices
FROM sales;
Result:
+---------------------+
| sum_distinct_prices |
+---------------------+
| 2900.00 |
+---------------------+
Passing the Wrong Argument Type
If we pass the wrong argument type we’ll get an error. For example:
SELECT
sum(product_name)
FROM sales;
Result:
Binder Error: No function matches the given name and argument types 'sum(VARCHAR)'. You might need to add explicit type casts.
Candidate functions:
sum(DECIMAL) -> DECIMAL
sum(SMALLINT) -> HUGEINT
sum(INTEGER) -> HUGEINT
sum(BIGINT) -> HUGEINT
sum(HUGEINT) -> HUGEINT
sum(DOUBLE) -> DOUBLE
LINE 2: sum(product_name)
^
In this case I passed the product_name
column, which is defined as a VARCHAR
column. The sum()
function doesn’t accept this data type. The error message provides us with guidance as to which data types can be used with the sum()
function (basically, numeric types).
Using fsum()
for More Accurate Results
DuckDB also provides an fsum()
function, which calculates the sum using a floating point summation method known as Kahan summation (or compensated summation).
This method helps reduce the accumulation of rounding errors that can occur when summing many floating point numbers when using the regular
function.sum
()
The fsum()
function uses the same syntax as sum()
(i.e. fsum(arg)
), so to use fsum()
, simply replace sum()
with fsum()
. Everything else can remain the same.
Here’s one of our earlier examples, refactored to use fsum()
:
SELECT
fsum(quantity * price) AS total_revenue
FROM sales;
Result:
+---------------+
| total_revenue |
+---------------+
| 10100.0 |
+---------------+
In this case we get the same result, but the results can differ, depending on the values being added.