In DuckDB, avg()
is an aggregate function that returns the average of all non-NULL values in its argument (usually a column).
In this article we’ll take a quick look at this function, along with some basic examples.
Syntax
The syntax for avg()
goes like this:
avg(arg)
Where arg
is the column name for which we want to get the average.
Sample Data
Let’s create some sample data for the examples in this article.
-- Create an employees table
CREATE TABLE employees (
emp_id INT,
emp_name VARCHAR,
department VARCHAR,
salary DECIMAL(10,2)
);
-- Insert sample data
INSERT INTO employees VALUES
(1, 'Beavis Blakefield', 'Sales', 75000.00),
(2, 'Mahavishnu Mars', 'Sales', 82000.00),
(3, 'Veralda Vett', 'Engineering', NULL),
(4, 'Hank Heckler', 'Engineering', 70000.00),
(5, 'Slash Slater', 'Engineering', 98000.00),
(6, 'Mutt Manger', 'Engineering', 87000.00),
(7, 'Bon Butler', 'Marketing', 70000.00),
(8, 'Dave Danker', 'Marketing', 72000.00),
(9, 'Winnie Wringle', 'Marketing', 91000.00);
-- Select all data from the table
SELECT * FROM employees;
Output:
+--------+-------------------+-------------+----------+
| emp_id | emp_name | department | salary |
+--------+-------------------+-------------+----------+
| 1 | Beavis Blakefield | Sales | 75000.00 |
| 2 | Mahavishnu Mars | Sales | 82000.00 |
| 3 | Veralda Vett | Engineering | null |
| 4 | Hank Heckler | Engineering | 70000.00 |
| 5 | Slash Slater | Engineering | 98000.00 |
| 6 | Mutt Manger | Engineering | 87000.00 |
| 7 | Bon Butler | Marketing | 70000.00 |
| 8 | Dave Danker | Marketing | 72000.00 |
| 9 | Winnie Wringle | Marketing | 91000.00 |
+--------+-------------------+-------------+----------+
The following examples will query the above table.
Example – Basic Usage
Let’s return the average salary for the whole table:
SELECT avg(salary) FROM employees;
Result:
+-------------+
| avg(salary) |
+-------------+
| 80625.0 |
+-------------+
Example With GROUP BY
Clause
We can use the GROUP BY
clause to get the average salary for each department:
SELECT
department,
avg(salary)
FROM employees
GROUP BY department;
Result:
+-------------+-------------------+
| department | avg(salary) |
+-------------+-------------------+
| Engineering | 85000.0 |
| Marketing | 77666.66666666667 |
| Sales | 78500.0 |
+-------------+-------------------+
Example of Window Function Usage
We can use the avg()
function in a windowing context.
Example:
SELECT
department,
salary,
avg(salary) OVER (PARTITION BY department) AS department_average
FROM employees;
Result:
+-------------+----------+--------------------+
| department | salary | department_average |
+-------------+----------+--------------------+
| Sales | 75000.00 | 78500.0 |
| Sales | 82000.00 | 78500.0 |
| Engineering | null | 85000.0 |
| Engineering | 70000.00 | 85000.0 |
| Engineering | 98000.00 | 85000.0 |
| Engineering | 87000.00 | 85000.0 |
| Marketing | 70000.00 | 77666.66666666667 |
| Marketing | 72000.00 | 77666.66666666667 |
| Marketing | 91000.00 | 77666.66666666667 |
+-------------+----------+--------------------+
NULL Handling
The avg()
function deals with NULL values as follows:
- NULL values are completely ignored in calculations
- If all values in a group are NULL, the result is NULL
- Mixed NULL and non-NULL values only consider the non-NULL values
To demonstrate this, let’s set all salaries for the Marketing department to NULL, then re-run the query from the previous example:
-- Set all Marketing salaries to NULL
UPDATE employees
SET salary = NULL
WHERE department = 'Marketing';
-- Get the average salaries for each department
SELECT
department,
salary,
avg(salary) OVER (PARTITION BY department) AS department_average
FROM employees;
Result:
+-------------+----------+--------------------+
| department | salary | department_average |
+-------------+----------+--------------------+
| Sales | 75000.00 | 78500.0 |
| Sales | 82000.00 | 78500.0 |
| Engineering | null | 85000.0 |
| Engineering | 70000.00 | 85000.0 |
| Engineering | 98000.00 | 85000.0 |
| Engineering | 87000.00 | 85000.0 |
| Marketing | null | null |
| Marketing | null | null |
| Marketing | null | null |
+-------------+----------+--------------------+
We can see that all Marketing salaries are NULL and so the average salary is NULL.
We can also see that the Engineering department already had a NULL value in one of its rows. That NULL value was ignored when calculating the average salary for the department.
Type Handling and Precision
When calculating averages, integer inputs are promoted to DOUBLE
:
-- Create table
CREATE TABLE t1 (
c1 INTEGER,
);
-- Insert data
INSERT INTO t1 (c1) VALUES
( 1 ),
( 2 ),
( 3 );
-- Calculate the average and also get the type returned by the avg() function
SELECT
avg(c1),
typeof(avg(c1))
FROM t1;
Result:
+---------+-----------------+
| avg(c1) | typeof(avg(c1)) |
+---------+-----------------+
| 2.0 | DOUBLE |
+---------+-----------------+
We can see that although we specified INTEGER
for the c1
column, when we ran avg()
against it, the function returned a DOUBLE
. This is true even when the value could easily be represented as an integer (as it could in this example).
Alias: The mean()
Function
In DuckDB, the avg()
function has an alias called mean()
. Therefore, we can use either function to get the same result.
Example:
SELECT
department,
salary,
avg(salary) OVER (PARTITION BY department) AS department_average,
mean(salary) OVER (PARTITION BY department) AS department_mean
FROM employees;
Result:
+-------------+----------+--------------------+-----------------+
| department | salary | department_average | department_mean |
+-------------+----------+--------------------+-----------------+
| Sales | 75000.00 | 78500.0 | 78500.0 |
| Sales | 82000.00 | 78500.0 | 78500.0 |
| Engineering | null | 85000.0 | 85000.0 |
| Engineering | 70000.00 | 85000.0 | 85000.0 |
| Engineering | 98000.00 | 85000.0 | 85000.0 |
| Engineering | 87000.00 | 85000.0 | 85000.0 |
| Marketing | null | null | null |
| Marketing | null | null | null |
| Marketing | null | null | null |
+-------------+----------+--------------------+-----------------+
Increased Accuracy: The favg()
Function
DuckDB also has a favg()
function that does the same thing as avg()
, except with greater accuracy, especially many floating point numbers are involved. To use this function, simply use favg()
instead of avg()
.