In PostgreSQL, the AVG()
function computes the average (arithmetic mean) of all the non-null input values, and returns the result.
Example
Here’s a quick example to demonstrate how it works:
SELECT AVG(amount)
FROM payment;
Result:
4.2006673312979002
In this case, amount
is a column in the payment
table.
To give this a bit more context, here’s a snapshot of the table (from the pagila
sample database):
+------------+-------------+----------+-----------+--------+----------------------------+ | payment_id | customer_id | staff_id | rental_id | amount | payment_date | +------------+-------------+----------+-----------+--------+----------------------------+ | 16050 | 269 | 2 | 7 | 1.99 | 2017-01-24 21:40:19.996577 | | 16051 | 269 | 1 | 98 | 0.99 | 2017-01-25 15:16:50.996577 | | 16052 | 269 | 2 | 678 | 6.99 | 2017-01-28 21:44:14.996577 | | 16053 | 269 | 2 | 703 | 0.99 | 2017-01-29 00:58:02.996577 | | 16054 | 269 | 1 | 750 | 4.99 | 2017-01-29 08:10:06.996577 | | 16055 | 269 | 2 | 1099 | 2.99 | 2017-01-31 12:23:14.996577 | | 16056 | 270 | 1 | 193 | 1.99 | 2017-01-26 05:10:14.996577 | | 16057 | 270 | 1 | 1040 | 4.99 | 2017-01-31 04:03:42.996577 | | 16058 | 271 | 1 | 1096 | 8.99 | 2017-01-31 11:59:15.996577 | ...
We can see the amount
column for which we calculated the average.
The table contains a lot more rows, but this shows you an example of the values.
Filtered Results
The AVG()
function operates on the rows returned by the query. So if you filter the results, the result of AVG()
will reflect that.
Lets filter the results:
SELECT AVG(amount)
FROM payment
WHERE customer_id = 271;
Result:
3.8233333333333333
So this time we got the average of all amounts paid by customer 271.
The DISTINCT
Keyword
You can use the DISTINCT
keyword with AVG()
to calculate only distinct values. That is, if there are any duplicate values, they are treated as one value.
Example:
SELECT
AVG(amount) AS "All",
AVG(DISTINCT amount) AS "Distinct"
FROM payment;
Result:
+--------------------+--------------------+ | All | Distinct | +--------------------+--------------------+ | 4.2006673312979002 | 6.1447368421052632 | +--------------------+--------------------+
In this case, there’s a very large difference between the distinct and non-distinct results. This suggests that there are a lot of duplicate values in that column.
To verify this, we can output the distinct values like this:
SELECT DISTINCT amount
FROM payment;
Result:
+--------+ | amount | +--------+ | 1.99 | | 3.98 | | 7.99 | | 5.99 | | 10.99 | | 2.99 | | 8.97 | | 8.99 | | 3.99 | | 9.98 | | 11.99 | | 7.98 | | 6.99 | | 0.00 | | 4.99 | | 5.98 | | 0.99 | | 1.98 | | 9.99 | +--------+ (19 rows)
So our example returned the average of those distinct values.