In PostgreSQL, width_bucket()
is a mathematical function that assigns values to buckets (individual segments) in an equiwidth histogram.
The return type is int.
Syntax
The function can be used with any of the three following syntaxes:
width_bucket(operand dp, b1 dp, b2 dp, count int)
width_bucket(operand numeric, b1 numeric, b2 numeric, count int)
width_bucket(operand anyelement, thresholds anyarray)
The first two are basically the same, except that they use different data types (double precision vs numeric).
These three syntaxes are explained below.
width_bucket(operand dp, b1 dp, b2 dp, count int)
- Returns the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range.
width_bucket(operand numeric, b1 numeric, b2 numeric, count int)
- Returns the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range.
width_bucket(operand anyelement, thresholds anyarray)
- Returns the bucket number to which operand would be assigned given an array listing the lower bounds of the buckets; returns 0 for an input less than the first lower bound; the thresholds array must be sorted, smallest first, or unexpected results will be obtained.
Example – First/Second Syntax
As mentioned, the first two syntaxes are basically the same, except that they outline the different data types (double precision vs numeric).
Here’s an example to demonstrate how the first two syntaxes work.
SELECT
width_bucket(3, 1, 12, 3),
width_bucket(5, 1, 12, 3),
width_bucket(9, 1, 12, 3);
Result:
width_bucket | width_bucket | width_bucket --------------+--------------+-------------- 1 | 2 | 3
Here’s an explanation. Let’s examine each argument, starting from the last one and working backwards to the first one.
- Fourth argument: I specify three buckets. I do this by using 3 as the fourth argument.
- Second and third arguments: I specify that the range is between 1 and 12. In this case, my second argument is 1 and third argument is 12.
- First argument: This value is compared against the second and third arguments, so as to know which of the three buckets it should be assigned to. In my example I call
width_bucket()
three times in order to illustrate the concept better. I do this so that I can provide three different values as the first argument, each of which is assigned to a different bucket.
The following table provides another way of visualising this:
Values | Bucket |
---|---|
1, 2, 3, 4 | Bucket 1 |
5, 6, 7, 8 | Bucket 2 |
9, 10, 11, 12 | Bucket 3 |
So we can see that the first bucket accepts values from between 1 and 4, the second bucket between 5 and 8, and the third bucket is for values between 9 and 12.
If I were to change it so that there were four buckets, my code might look something like this:
SELECT
width_bucket(3, 1, 12, 4),
width_bucket(5, 1, 12, 4),
width_bucket(9, 1, 12, 4);
And the table would look like this:
Values | Bucket |
---|---|
1, 2, 3 | Bucket 1 |
4, 5, 6 | Bucket 2 |
7, 8, 9 | Bucket 3 |
10, 11, 12 | Bucket 4 |
Out of Range
If the input is outside the range of the bucket, you’ll get either 0 or count+1, depending on whether the input is below the range or above it.
Example:
SELECT
width_bucket(-3, 1, 12, 3),
width_bucket(20, 1, 12, 3);
Result:
width_bucket | width_bucket --------------+-------------- 0 | 4
Example – Third Syntax
In order to demonstrate the third syntax, let’s take the first example above, and modify it to use the third syntax:
SELECT
width_bucket(3, array[1, 4, 8]),
width_bucket(5, array[1, 4, 8]),
width_bucket(9, array[1, 4, 8]);
Result:
width_bucket | width_bucket | width_bucket --------------+--------------+-------------- 1 | 2 | 3
Here I created 3 buckets and assigned explicit values to each one. In this case they are all equal-width buckets, but that’s not a requirement.
A major benefit of the the third syntax is that it allows you to create buckets of unequal width.
For example, I could modify the previous example to this:
SELECT
width_bucket(3, array[1, 3, 12]),
width_bucket(5, array[1, 3, 12]),
width_bucket(9, array[1, 3, 12]);
Result:
width_bucket | width_bucket | width_bucket --------------+--------------+-------------- 2 | 2 | 2
Doing this changes the buckets that each number is assigned to. Now, all of those numbers belong in the second bucket.
The third syntax could be useful for a variety of use cases. For example, you might have a range of age groups that aren’t equally distributed.
SELECT
width_bucket(15, array[10, 18, 30, 50, 65]) AS "Age Group (15)",
width_bucket(45, array[10, 18, 30, 50, 65]) AS "Age Group (45)",
width_bucket(50, array[10, 18, 30, 50, 65]) AS "Age Group (50)";
Result:
Age Group (15) | Age Group (45) | Age Group (50) ----------------+----------------+---------------- 1 | 3 | 4
Out of Range
The function returns 0 if the input is less than the first lower bound.
Example:
SELECT width_bucket(8, array[10, 40, 30]);
Result:
0
Width_Bucket() vs CASE
The examples on this page could also be done using a CASE
statement. The difference is that width_bucket()
does it in a more concise way.
Here’s how we could rewrite the previous example using a CASE
statement.
SELECT
CASE
WHEN 8 BETWEEN 0 AND 9 THEN 0
WHEN 8 BETWEEN 10 AND 39 THEN 1
WHEN 8 BETWEEN 40 AND 49 THEN 2
ELSE 3
END;
Result:
0
Bear in mind that the input in all these examples would normally be a variable or column name rather than a constant.