A Quick Look at DuckDB’s WEIGHTED_AVG() Function

In analytical SQL workloads, expressing weighted averages can sometimes involve verbose expressions such as combining the sum() function with other operators. DuckDB streamlines this with its native weighted_avg() aggregate function, allowing us to compute weighted averages directly and efficiently. The weighted_avg() function enhances both clarity and speed when dealing with data where values contribute unequally — such as population-adjusted metrics or revenue-weighted scores.

This article explores the weighted_avg() in DuckDB, along with examples to demonstrate its usage.

Syntax

The syntax goes like this:

weighted_avg(value, weight)

Where value is the value for which we want to get the weighted average, and weight is the weight to apply.

Typically, both arguments are columns, one containing the value and the other containing the weight to apply.

There’s also a wavg() function, which is an alias for weighted_avg(). This means we can use the following syntax to get the same result:

wavg(value, weight)

Example

Here’s a quick example to demonstrate:

SELECT WEIGHTED_AVG(7, 5);

Output:

7.0

This example isn’t really meaningful, because there’s only one value and weighting. To demonstrate the function properly, we need more than one. Here’s an example that uses multiple values:

SELECT WEIGHTED_AVG(val, weight) 
FROM VALUES 
    (10, 10),
    (20, 2.5)
AS t(val, weight);

Result:

12.0

In this case, the first row got a higher rating, and so the weighted average reflected that. Let’s see what happens when we change the weighting of the second row:

SELECT WEIGHTED_AVG(val, weight) 
FROM VALUES 
    (10, 10),
    (20, 5)
AS t(val, weight);

Result:

13.333333333333334

I changed the 2.5 weighting to 5 and the weighted average increased.

A Database Example

Let’s create and populate a database table:

CREATE TABLE product_ratings (
    product_id INTEGER,
    rating DECIMAL(3,1),
    reviewer_weight INTEGER
);

INSERT INTO product_ratings VALUES
    (1, 4.5, 10),
    (1, 3.0, 5),
    (1, 5.0, 2),
    (2, 4.0, 7),
    (2, 2.5, 8);

This table is called product_ratings with the following columns:

  • product_id: Unique identifier for each product
  • rating: Customer rating (1-5 stars)
  • reviewer_weight: Weight reflecting reviewer expertise or importance (higher is more important)

The following query gets the weighted average rating for each product, grouped by product:

SELECT 
product_id,
WEIGHTED_AVG(rating, reviewer_weight) AS weighted_avg_rating
FROM product_ratings
GROUP BY product_id
ORDER BY product_id;

Result:

+------------+---------------------+
| product_id | weighted_avg_rating |
+------------+---------------------+
| 1 | 4.117647058823529 |
| 2 | 3.2 |
+------------+---------------------+

So got the weighted average of the rating column, based on the weighting provided by the reviewer_weight column.

NULL Values

Any NULL values are skipped. To test this, let’s insert two new rows for product 2. One has a NULL rating and the other has a NULL weighting:

INSERT INTO product_ratings VALUES 
    (2, NULL, 5), 
    (2, 4.0, NULL);

SELECT * FROM product_ratings;

Output:

+------------+--------+-----------------+
| product_id | rating | reviewer_weight |
+------------+--------+-----------------+
| 1 | 4.5 | 10 |
| 1 | 3.0 | 5 |
| 1 | 5.0 | 2 |
| 2 | 4.0 | 7 |
| 2 | 2.5 | 8 |
| 2 | NULL | 5 |
| 2 | 4.0 | NULL |
+------------+--------+-----------------+

Now if we run the query again:

SELECT 
product_id,
WEIGHTED_AVG(rating, reviewer_weight) AS weighted_avg_rating
FROM product_ratings
GROUP BY product_id
ORDER BY product_id;

Result:

+------------+---------------------+
| product_id | weighted_avg_rating |
+------------+---------------------+
| 1 | 4.117647058823529 |
| 2 | 3.2 |
+------------+---------------------+

We get the same result as before. Neither of the NULL values changed the result.

The WAVG() Function

DuckDB also provides a wavg() function, which is an alias for weighted_avg(). Therefore, we can use either one to get the same result:

SELECT 
    product_id,
    WAVG(rating, reviewer_weight) AS wavg_rating
FROM product_ratings
GROUP BY product_id
ORDER BY product_id;

Result:

+------------+-------------------+
| product_id | wavg_rating |
+------------+-------------------+
| 1 | 4.117647058823529 |
| 2 | 3.2 |
+------------+-------------------+