Compute the Number of Items within a Range When Using a Window Function in SQL

In SQL, we can use the following method to compute the number of items within a given range when using a window function.

By this I mean we can use a frame clause in the window function to output how many items have a value that’s in a similar range to the value in the current row.

Example

Here’s an example to demonstrate:

SELECT
    Name,
    Population,
    COUNT(*) 
        OVER (
            ORDER BY Population
            RANGE BETWEEN 20000 PRECEDING AND 20000 FOLLOWING
            ) AS "Similar Population"
FROM City
WHERE CountryCode = 'NZL';

Result:

+--------------+------------+--------------------+
| Name         | Population | Similar Population |
+--------------+------------+--------------------+
| Lower Hutt   |      98100 |                  2 |
| Hamilton     |     117100 |                  3 |
| Dunedin      |     119600 |                  2 |
| Wellington   |     166700 |                  2 |
| Waitakere    |     170600 |                  3 |
| North Shore  |     187700 |                  2 |
| Manukau      |     281800 |                  1 |
| Christchurch |     324200 |                  1 |
| Auckland     |     381800 |                  1 |
+--------------+------------+--------------------+
9 rows in set (0.00 sec)

Here we use the SQL SUM() function with an OVER clause to create a window function that allows us to get the data we want from the Population column.

In this case, the data we want is; how many cities have a population within a certain range of the current city’s population.

Note that this includes the current city. So if there are no other cities with a similar population, then 1 is returned.

To get the above result I provided a frame clause that goes like this: RANGE BETWEEN 20000 PRECEDING AND 20000 FOLLOWING. The RANGE clause means that the frame is defined by rows within a value range. Offsets are differences in row values from the current row value. So in our case, 20000 PRECEDING refers to rows that have a value that’s no more than 20000 less than the value in the current row. Likewise, 20000 FOLLOWING refers to rows that have a value that’s no more than 20000 more than the value in the current row.

We can change the values as desired.

Example:

SELECT
    Name,
    Population,
    COUNT(*) 
        OVER (
            ORDER BY Population
            RANGE BETWEEN 20000 PRECEDING AND 10000 FOLLOWING
            ) AS "Similar Population"
FROM City
WHERE CountryCode = 'NZL';

Result:

+--------------+------------+--------------------+
| Name         | Population | Similar Population |
+--------------+------------+--------------------+
| Lower Hutt   |      98100 |                  1 |
| Hamilton     |     117100 |                  3 |
| Dunedin      |     119600 |                  2 |
| Wellington   |     166700 |                  2 |
| Waitakere    |     170600 |                  2 |
| North Shore  |     187700 |                  2 |
| Manukau      |     281800 |                  1 |
| Christchurch |     324200 |                  1 |
| Auckland     |     381800 |                  1 |
+--------------+------------+--------------------+
9 rows in set (0.00 sec)

Here I changed 20000 FOLLOWING to 10000 FOLLOWING (but I left 20000 PRECEDING alone).

The result has changed slightly. For example, the first row now returns 1 instead of 2. That’s because there are no other cities that have a population within the new range.

Using Partitions

When we create a window function, we have the option of using the PARTITION BY clause to partition the result set by a specified column. When we do this, the comparison is made to just those rows in the current partition.

Example:

SELECT
    District,
    Name,
    Population,
    COUNT(*) 
        OVER (
            PARTITION BY District
            ORDER BY Population
            RANGE BETWEEN 300000 PRECEDING AND 300000 FOLLOWING
            ) AS "Similar Population"
FROM City
WHERE CountryCode = 'AUS';

Result:

+-----------------+---------------+------------+--------------------+
| District        | Name          | Population | Similar Population |
+-----------------+---------------+------------+--------------------+
| Capital Region  | Canberra      |     322723 |                  1 |
| New South Wales | Wollongong    |     219761 |                  3 |
| New South Wales | Central Coast |     227657 |                  3 |
| New South Wales | Newcastle     |     270324 |                  3 |
| New South Wales | Sydney        |    3276207 |                  1 |
| Queensland      | Cairns        |      92273 |                  3 |
| Queensland      | Townsville    |     109914 |                  3 |
| Queensland      | Gold Coast    |     311932 |                  3 |
| Queensland      | Brisbane      |    1291117 |                  1 |
| South Australia | Adelaide      |     978100 |                  1 |
| Tasmania        | Hobart        |     126118 |                  1 |
| Victoria        | Geelong       |     125382 |                  1 |
| Victoria        | Melbourne     |    2865329 |                  1 |
| West Australia  | Perth         |    1096829 |                  1 |
+-----------------+---------------+------------+--------------------+
14 rows in set (0.00 sec)

Here, we specified PARTITION BY District, which partitioned the result set by the District column. Now, the comparisons are made to just those cities within the current district.

We can verify this by comparing the first row with the next three rows. I increased the lower range to 300,000, so all of the cities in the next three rows would normally be a match. However, we’ve partitioned the results by district, and therefore there’s no match.