3 Ways to Get the Weighted Average in DuckDB

Weighted averages are common calculations in data analysis, allowing us to assign different levels of importance to individual values in our dataset. Unlike simple averages, where each value has equal impact, weighted averages let us incorporate the relative significance of each observation. This is particularly valuable for scenarios like calculating GPA (where courses have different credit weights), investment portfolio returns (where assets have varying allocations), or quality ratings (where reviewers have different expertise levels).

In this article, we’ll explore three ways of calculating weighted averages in DuckDB.

What is a Weighted Average?

Before diving into the different approaches, let’s quickly review what a weighted average is.

A weighted average is calculated by multiplying each value by its corresponding weight, summing these products, and dividing by the sum of all weights:

Weighted Average = (w₁×v₁ + w₂×v₂ + … + wₙ×vₙ) / (w₁ + w₂ + … + wₙ)

Where:

  • v₁, v₂, …, vₙ are the values
  • w₁, w₂, …, wₙ are the corresponding weights

Sample Data

We’ll use the following table/data for the examples in this article:

CREATE TABLE product_ratings (
    product_id INTEGER,
    rating DECIMAL(3,1),
    reviewer_weight INTEGER
);

INSERT INTO product_ratings VALUES
    (1, 4.5, 10),
    (1, 3.0, 5),
    (1, 5.0, 2),
    (2, 4.0, 7),
    (2, 2.5, 8);

This table is called product_ratings with the following columns:

  • product_id: Unique identifier for each product
  • rating: Customer rating (1-5 stars)
  • reviewer_weight: Weight reflecting reviewer expertise or importance (higher is more important)

Option 1: Using SUM() and Division

The approach uses DuckDB’s SUM() function to implement the weighted average formula directly:

SELECT SUM(value * weight) / SUM(weight) AS weighted_avg
FROM your_table;

This approach is versatile and works in most SQL databases (most SQL databases have a SUM() function), not just DuckDB. Let’s see it in action with a concrete example.

Now let’s calculate the weighted average rating per product:

SELECT 
    product_id,
    SUM(rating * reviewer_weight) / SUM(reviewer_weight) AS weighted_avg_rating
FROM product_ratings
GROUP BY product_id
ORDER BY product_id;

Result:

+------------+---------------------+
| product_id | weighted_avg_rating |
+------------+---------------------+
| 1 | 4.117647058823529 |
| 2 | 3.2 |
+------------+---------------------+

In this example, the rating of 4.5 with weight 10 has more influence than the rating of 5.0 with weight 2, properly reflecting the relative importance of each reviewer.

Option 2: Using the WEIGHTED_AVG() Function

For more convenience, DuckDB provides a dedicated WEIGHTED_AVG() aggregate function:

SELECT WEIGHTED_AVG(value, weight) AS weighted_avg
FROM your_table;

This built-in function handles all the calculation details internally, making our queries cleaner and more readable. Here’s how to calculate the weighted average using the WEIGHTED_AVG() function:

SELECT 
    product_id,
    WEIGHTED_AVG(rating, reviewer_weight) AS weighted_avg_rating
FROM product_ratings
GROUP BY product_id
ORDER BY product_id;

Result:

+------------+---------------------+
| product_id | weighted_avg_rating |
+------------+---------------------+
| 1 | 4.117647058823529 |
| 2 | 3.2 |
+------------+---------------------+

Both approaches produce the same result, but the WEIGHTED_AVG() function offers a more expressive and concise syntax.

Option 3: Using the WAVG() Function

DuckDB also provides the WAVG() function, which is an alias for WEIGHTED_AVG():

SELECT WAVG(value, weight) AS weighted_avg
FROM your_table;

Therefore, using the previous example we can replace WEIGHTED_AVG() with WAVG() to get the same result:

SELECT 
    product_id,
    WAVG(rating, reviewer_weight) AS weighted_avg_rating
FROM product_ratings
GROUP BY product_id
ORDER BY product_id;

Result:

+------------+---------------------+
| product_id | weighted_avg_rating |
+------------+---------------------+
| 1 | 4.117647058823529 |
| 2 | 3.2 |
+------------+---------------------+

Same result as WEIGHTED_AVG(), but with a slightly more concise syntax.

Summary

DuckDB offers several approaches for calculating weighted averages. Three of these are:

  1. The expression-based approach using SUM(value * weight) / SUM(weight)
  2. The built-in WEIGHTED_AVG(value, weight) function
  3. The built-in WAVG(value, weight) function, which is an alias of the WEIGHTED_AVG(value, weight) function

All three methods are effective and provide the same results, but the built-in functions offer cleaner syntax.