In SQL, we can use the following method to compute the number of items within a given range when using a window function.
By this I mean we can use a frame clause in the window function to output how many items have a value that’s in a similar range to the value in the current row.
Example
Here’s an example to demonstrate:
SELECT
Name,
Population,
COUNT(*)
OVER (
ORDER BY Population
RANGE BETWEEN 20000 PRECEDING AND 20000 FOLLOWING
) AS "Similar Population"
FROM City
WHERE CountryCode = 'NZL';
Result:
+--------------+------------+--------------------+ | Name | Population | Similar Population | +--------------+------------+--------------------+ | Lower Hutt | 98100 | 2 | | Hamilton | 117100 | 3 | | Dunedin | 119600 | 2 | | Wellington | 166700 | 2 | | Waitakere | 170600 | 3 | | North Shore | 187700 | 2 | | Manukau | 281800 | 1 | | Christchurch | 324200 | 1 | | Auckland | 381800 | 1 | +--------------+------------+--------------------+ 9 rows in set (0.00 sec)
Here we use the SQL SUM()
function with an OVER
clause to create a window function that allows us to get the data we want from the Population
column.
In this case, the data we want is; how many cities have a population within a certain range of the current city’s population.
Note that this includes the current city. So if there are no other cities with a similar population, then 1
is returned.
To get the above result I provided a frame clause that goes like this: RANGE BETWEEN 20000 PRECEDING AND 20000 FOLLOWING
. The RANGE
clause means that the frame is defined by rows within a value range. Offsets are differences in row values from the current row value. So in our case, 20000 PRECEDING
refers to rows that have a value that’s no more than 20000 less than the value in the current row. Likewise, 20000 FOLLOWING
refers to rows that have a value that’s no more than 20000 more than the value in the current row.
We can change the values as desired.
Example:
SELECT
Name,
Population,
COUNT(*)
OVER (
ORDER BY Population
RANGE BETWEEN 20000 PRECEDING AND 10000 FOLLOWING
) AS "Similar Population"
FROM City
WHERE CountryCode = 'NZL';
Result:
+--------------+------------+--------------------+ | Name | Population | Similar Population | +--------------+------------+--------------------+ | Lower Hutt | 98100 | 1 | | Hamilton | 117100 | 3 | | Dunedin | 119600 | 2 | | Wellington | 166700 | 2 | | Waitakere | 170600 | 2 | | North Shore | 187700 | 2 | | Manukau | 281800 | 1 | | Christchurch | 324200 | 1 | | Auckland | 381800 | 1 | +--------------+------------+--------------------+ 9 rows in set (0.00 sec)
Here I changed 20000 FOLLOWING
to 10000 FOLLOWING
(but I left 20000 PRECEDING
alone).
The result has changed slightly. For example, the first row now returns 1
instead of 2
. That’s because there are no other cities that have a population within the new range.
Using Partitions
When we create a window function, we have the option of using the PARTITION BY
clause to partition the result set by a specified column. When we do this, the comparison is made to just those rows in the current partition.
Example:
SELECT
District,
Name,
Population,
COUNT(*)
OVER (
PARTITION BY District
ORDER BY Population
RANGE BETWEEN 300000 PRECEDING AND 300000 FOLLOWING
) AS "Similar Population"
FROM City
WHERE CountryCode = 'AUS';
Result:
+-----------------+---------------+------------+--------------------+ | District | Name | Population | Similar Population | +-----------------+---------------+------------+--------------------+ | Capital Region | Canberra | 322723 | 1 | | New South Wales | Wollongong | 219761 | 3 | | New South Wales | Central Coast | 227657 | 3 | | New South Wales | Newcastle | 270324 | 3 | | New South Wales | Sydney | 3276207 | 1 | | Queensland | Cairns | 92273 | 3 | | Queensland | Townsville | 109914 | 3 | | Queensland | Gold Coast | 311932 | 3 | | Queensland | Brisbane | 1291117 | 1 | | South Australia | Adelaide | 978100 | 1 | | Tasmania | Hobart | 126118 | 1 | | Victoria | Geelong | 125382 | 1 | | Victoria | Melbourne | 2865329 | 1 | | West Australia | Perth | 1096829 | 1 | +-----------------+---------------+------------+--------------------+ 14 rows in set (0.00 sec)
Here, we specified PARTITION BY District
, which partitioned the result set by the District
column. Now, the comparisons are made to just those cities within the current district.
We can verify this by comparing the first row with the next three rows. I increased the lower range to 300,000, so all of the cities in the next three rows would normally be a match. However, we’ve partitioned the results by district, and therefore there’s no match.