SQL provides us with an easy way to automatically compute the moving average of a given column.
The moving average (also known as the rolling average, running average, moving mean (MM), or rolling mean) is a series of averages of different selections of the full data set. The moving average changes as we move through the data set. We can add the moving average to our SQL query result sets to see how it changes across the result set.
Example
Here’s an example that calculates a moving average:
SELECT
District,
Name,
Population,
AVG(Population)
OVER (
ORDER BY Name
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS "Moving Average"
FROM City
WHERE CountryCode = 'AUS';
Result:
+-----------------+---------------+------------+----------------+ | District | Name | Population | Moving Average | +-----------------+---------------+------------+----------------+ | South Australia | Adelaide | 978100 | 1134608.5000 | | Queensland | Brisbane | 1291117 | 787163.3333 | | Queensland | Cairns | 92273 | 568704.3333 | | Capital Region | Canberra | 322723 | 214217.6667 | | New South Wales | Central Coast | 227657 | 225254.0000 | | Victoria | Geelong | 125382 | 221657.0000 | | Queensland | Gold Coast | 311932 | 187810.6667 | | Tasmania | Hobart | 126118 | 1101126.3333 | | Victoria | Melbourne | 2865329 | 1087257.0000 | | New South Wales | Newcastle | 270324 | 1410827.3333 | | West Australia | Perth | 1096829 | 1547786.6667 | | New South Wales | Sydney | 3276207 | 1494316.6667 | | Queensland | Townsville | 109914 | 1201960.6667 | | New South Wales | Wollongong | 219761 | 164837.5000 | +-----------------+---------------+------------+----------------+ 14 rows in set (0.00 sec)
Here we use the SQL AVG()
function with an OVER
clause to create a window function, which allows us to get the moving average from the Population
column.
We use ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
so that the moving average is calculated based on three rows; the row before the current row, the current row, and the row after.
If there are no rows before the current row, then it’s based on two rows (the current row and the following row). If there are no rows after the current row, then it’s based on the current row and the row before. If there are no rows before or after, then it’s based on the current row only (which means we’ll simply get the same value as the current row). The following example demonstrates this.
Moving Average within a Partition
We can also compute the moving average within a partition.
When we create a window function, we have the option of using the PARTITION BY
clause to partition the result set by a specified column. When we do this, the moving averages will be limited to the current partition.
Example:
SELECT
District,
Name,
Population,
AVG(Population)
OVER (
PARTITION BY District
ORDER BY Name
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS "Moving Average"
FROM City
WHERE CountryCode = 'AUS'
ORDER BY District;
Result:
+-----------------+---------------+------------+----------------+ | District | Name | Population | Moving Average | +-----------------+---------------+------------+----------------+ | Capital Region | Canberra | 322723 | 322723.0000 | | New South Wales | Central Coast | 227657 | 248990.5000 | | New South Wales | Newcastle | 270324 | 1258062.6667 | | New South Wales | Sydney | 3276207 | 1255430.6667 | | New South Wales | Wollongong | 219761 | 1747984.0000 | | Queensland | Brisbane | 1291117 | 691695.0000 | | Queensland | Cairns | 92273 | 565107.3333 | | Queensland | Gold Coast | 311932 | 171373.0000 | | Queensland | Townsville | 109914 | 210923.0000 | | South Australia | Adelaide | 978100 | 978100.0000 | | Tasmania | Hobart | 126118 | 126118.0000 | | Victoria | Geelong | 125382 | 1495355.5000 | | Victoria | Melbourne | 2865329 | 1495355.5000 | | West Australia | Perth | 1096829 | 1096829.0000 | +-----------------+---------------+------------+----------------+ 14 rows in set (0.00 sec)
Here, we specified PARTITION BY District
, which partitioned the result set by the District
column. Now, the moving averages are limited to the cities within the current district.
We can easily verify this by looking at the first row. The moving average is exactly the same value as the one in the Population
column for that row. That’s because the current partition has only one row (there’s only one city in that district). Therefore, the moving average is calculated based on one row. The total amount (322723) is divided by the number of rows (1). So it goes like this: 322723 / 1 = 322723.
The first moving average for the New South Wales partition is based on two rows; the current row and the next row. There isn’t a previous row, so that’s why it’s only two rows. So that one is (227657 + 270324) / 2 = 248990.5.
The second moving average for the New South Wales partition is based on three rows; the row previous to the current row, the current row, and the row following the current row. Therefore, it’s (227657 + 270324 + 3276207) / 3 = 1258062.6667.
And so on…