If you know about the avg()
function in SQLite, you’re probably aware that it returns the average of all non-NULL X within a group.
But did you know you can add the DISTINCT
keyword to this function?
If you add the DISTINCT
keyword, avg()
will calculate its results based on distinct values only. This is essentially the same as removing duplicate values and then calculating the average on the remaining values.
Syntax
To use the DISTINCT
keyword, you simply insert it as the first argument.
Like this:
avg(DISTINCT X)
Where X
is the column name for which you’re calculating the average.
Example
Take the following table called Products
:
ProductId ProductName Price ---------- ------------- ---------- 1 Widget Holder 139.5 2 Blue Widget 10.0 3 Red Widget 10.0 4 Green Widget 10.0 5 Widget Stick 89.75 6 Foo Cap 11.99
If I run a normal avg()
on the Price column:
SELECT avg(Price) FROM Products;
Here’s what I get:
45.2066666666667
But if I run a DISTINCT
query:
SELECT avg(DISTINCT Price) FROM Products;
I get this:
62.81
So in this case it changes the result quite considerably.
Just to be clear, here they are side by side:
SELECT
avg(Price) AS "Non-Distinct",
avg(DISTINCT Price) AS "Distinct"
FROM Products;
Result:
Non-Distinct Distinct ---------------- ---------- 45.2066666666667 62.81
As you can probably imagine, using the DISTINCT
keyword with avg()
could skew the results enormously, especially if there are a lot of duplicates at one end of the range, but not many duplicates at the other end.