A Quick Look at the AVG() Function in DuckDB

In DuckDB, avg() is an aggregate function that returns the average of all non-NULL values in its argument (usually a column).

In this article we’ll take a quick look at this function, along with some basic examples.

Syntax

The syntax for avg() goes like this:

avg(arg)

Where arg is the column name for which we want to get the average.

Sample Data

Let’s create some sample data for the examples in this article.

-- Create an employees table
CREATE TABLE employees (
    emp_id INT,
    emp_name VARCHAR,
    department VARCHAR,
    salary DECIMAL(10,2)
);

-- Insert sample data
INSERT INTO employees VALUES
    (1, 'Beavis Blakefield', 'Sales', 75000.00),
    (2, 'Mahavishnu Mars', 'Sales', 82000.00),
    (3, 'Veralda Vett', 'Engineering', NULL),
    (4, 'Hank Heckler', 'Engineering', 70000.00),
    (5, 'Slash Slater', 'Engineering', 98000.00),
    (6, 'Mutt Manger', 'Engineering', 87000.00),
    (7, 'Bon Butler', 'Marketing', 70000.00),
    (8, 'Dave Danker', 'Marketing', 72000.00),
    (9, 'Winnie Wringle', 'Marketing', 91000.00);

-- Select all data from the table
SELECT * FROM employees;

Output:

+--------+-------------------+-------------+----------+
| emp_id | emp_name | department | salary |
+--------+-------------------+-------------+----------+
| 1 | Beavis Blakefield | Sales | 75000.00 |
| 2 | Mahavishnu Mars | Sales | 82000.00 |
| 3 | Veralda Vett | Engineering | null |
| 4 | Hank Heckler | Engineering | 70000.00 |
| 5 | Slash Slater | Engineering | 98000.00 |
| 6 | Mutt Manger | Engineering | 87000.00 |
| 7 | Bon Butler | Marketing | 70000.00 |
| 8 | Dave Danker | Marketing | 72000.00 |
| 9 | Winnie Wringle | Marketing | 91000.00 |
+--------+-------------------+-------------+----------+

The following examples will query the above table.

Example – Basic Usage

Let’s return the average salary for the whole table:

SELECT avg(salary) FROM employees;

Result:

+-------------+
| avg(salary) |
+-------------+
| 80625.0 |
+-------------+

Example With GROUP BY Clause

We can use the GROUP BY clause to get the average salary for each department:

SELECT 
    department, 
    avg(salary) 
FROM employees 
GROUP BY department;

Result:

+-------------+-------------------+
| department | avg(salary) |
+-------------+-------------------+
| Engineering | 85000.0 |
| Marketing | 77666.66666666667 |
| Sales | 78500.0 |
+-------------+-------------------+

Example of Window Function Usage

We can use the avg() function in a windowing context.

Example:

SELECT 
    department,
    salary, 
    avg(salary) OVER (PARTITION BY department) AS department_average
FROM employees;

Result:

+-------------+----------+--------------------+
| department | salary | department_average |
+-------------+----------+--------------------+
| Sales | 75000.00 | 78500.0 |
| Sales | 82000.00 | 78500.0 |
| Engineering | null | 85000.0 |
| Engineering | 70000.00 | 85000.0 |
| Engineering | 98000.00 | 85000.0 |
| Engineering | 87000.00 | 85000.0 |
| Marketing | 70000.00 | 77666.66666666667 |
| Marketing | 72000.00 | 77666.66666666667 |
| Marketing | 91000.00 | 77666.66666666667 |
+-------------+----------+--------------------+

NULL Handling

The avg() function deals with NULL values as follows:

  • NULL values are completely ignored in calculations
  • If all values in a group are NULL, the result is NULL
  • Mixed NULL and non-NULL values only consider the non-NULL values

To demonstrate this, let’s set all salaries for the Marketing department to NULL, then re-run the query from the previous example:

-- Set all Marketing salaries to NULL
UPDATE employees
SET salary = NULL 
WHERE department = 'Marketing';

-- Get the average salaries for each department
SELECT 
    department,
    salary, 
    avg(salary) OVER (PARTITION BY department) AS department_average
FROM employees;

Result:

+-------------+----------+--------------------+
| department | salary | department_average |
+-------------+----------+--------------------+
| Sales | 75000.00 | 78500.0 |
| Sales | 82000.00 | 78500.0 |
| Engineering | null | 85000.0 |
| Engineering | 70000.00 | 85000.0 |
| Engineering | 98000.00 | 85000.0 |
| Engineering | 87000.00 | 85000.0 |
| Marketing | null | null |
| Marketing | null | null |
| Marketing | null | null |
+-------------+----------+--------------------+

We can see that all Marketing salaries are NULL and so the average salary is NULL.

We can also see that the Engineering department already had a NULL value in one of its rows. That NULL value was ignored when calculating the average salary for the department.

Type Handling and Precision

When calculating averages, integer inputs are promoted to DOUBLE:

-- Create table
CREATE TABLE t1 (
    c1 INTEGER,
);

-- Insert data
INSERT INTO t1 (c1) VALUES 
    ( 1 ),
    ( 2 ),
    ( 3 );

-- Calculate the average and also get the type returned by the avg() function
SELECT 
    avg(c1),
    typeof(avg(c1))
FROM t1;

Result:

+---------+-----------------+
| avg(c1) | typeof(avg(c1)) |
+---------+-----------------+
| 2.0 | DOUBLE |
+---------+-----------------+

We can see that although we specified INTEGER for the c1 column, when we ran avg() against it, the function returned a DOUBLE. This is true even when the value could easily be represented as an integer (as it could in this example).

Alias: The mean() Function

In DuckDB, the avg() function has an alias called mean(). Therefore, we can use either function to get the same result.

Example:

SELECT 
    department,
    salary, 
    avg(salary) OVER (PARTITION BY department) AS department_average,
    mean(salary) OVER (PARTITION BY department) AS department_mean
FROM employees;

Result:

+-------------+----------+--------------------+-----------------+
| department | salary | department_average | department_mean |
+-------------+----------+--------------------+-----------------+
| Sales | 75000.00 | 78500.0 | 78500.0 |
| Sales | 82000.00 | 78500.0 | 78500.0 |
| Engineering | null | 85000.0 | 85000.0 |
| Engineering | 70000.00 | 85000.0 | 85000.0 |
| Engineering | 98000.00 | 85000.0 | 85000.0 |
| Engineering | 87000.00 | 85000.0 | 85000.0 |
| Marketing | null | null | null |
| Marketing | null | null | null |
| Marketing | null | null | null |
+-------------+----------+--------------------+-----------------+

Increased Accuracy: The favg() Function

DuckDB also has a favg() function that does the same thing as avg(), except with greater accuracy, especially many floating point numbers are involved. To use this function, simply use favg() instead of avg().