Using the NTILE() Function to Divide a Partition into Buckets in PostgreSQL

In PostgreSQL, the ntile() function is a window function that divides a partition into the specified number of groups (buckets), distributing the rows as equally as possible, and returns the bucket number of the current row within its partition.

Syntax

The syntax goes like this:

ntile ( num_buckets ) over_clause

Where num_buckets is an integer that represents the number of buckets to divide the partition into.

The function also requires an OVER clause, which determines how the rowset is partitioned and ordered before the window function is applied.

Example

Here’s a basic example to demonstrate how it works:

SELECT 
    vendorid,
    productname,
    productprice,
    ntile( 4 ) 
        OVER ( ORDER BY productprice ) 
    AS "ntile"
FROM products;

Result:

 vendorid |           productname           | productprice | ntile 
----------+---------------------------------+--------------+-------
     1004 | Bottomless Coffee Mugs (4 Pack) |         9.99 |     1
     1003 | Hammock                         |        10.00 |     1
     1001 | Long Weight (green)             |        11.99 |     1
     1004 | Tea Pot                         |        12.45 |     2
     1001 | Long Weight (blue)              |        14.75 |     2
     1001 | Left handed screwdriver         |        25.99 |     2
     1001 | Right handed screwdriver        |        25.99 |     3
     1002 | Sledge Hammer                   |        33.49 |     3
     1003 | Straw Dog Box                   |        55.99 |     4
     1003 | Chainsaw                        |       245.00 |     4
(10 rows)

In this example I specified four buckets. We haven’t applied any partitioning here, and so the whole rowset is divided into the four buckets.

PostgreSQL divides the partitions as equally as possible. In this case there are ten rows, and so we can’t get an equal amount of rows in each bucket. In this case PostgreSQL has decided to put three rows into the first two buckets, and two rows into the remaining two buckets.

Partitioning

This example uses a PARTITION BY clause to partition the results by activity:

SELECT 
    v.vendorname,
    p.productname,
    p.productprice,
    ntile( 2 ) OVER ( 
        PARTITION BY p.vendorid 
        ORDER BY p.productprice
        )
FROM products p INNER JOIN 
vendors v ON p.vendorid = v.vendorid;

Result:

  vendorname   |           productname           | productprice | ntile 
---------------+---------------------------------+--------------+-------
 Mars Supplies | Long Weight (green)             |        11.99 |     1
 Mars Supplies | Long Weight (blue)              |        14.75 |     1
 Mars Supplies | Right handed screwdriver        |        25.99 |     2
 Mars Supplies | Left handed screwdriver         |        25.99 |     2
 Randy Roofers | Sledge Hammer                   |        33.49 |     1
 Pedal Medals  | Hammock                         |        10.00 |     1
 Pedal Medals  | Straw Dog Box                   |        55.99 |     1
 Pedal Medals  | Chainsaw                        |       245.00 |     2
 Katty Kittens | Bottomless Coffee Mugs (4 Pack) |         9.99 |     1
 Katty Kittens | Tea Pot                         |        12.45 |     2
(10 rows)

This time I specified two buckets per partition. In this case, the bucket count resets with each partition.

Changing the Order

Here’s what happens when we change the order to descending order:

SELECT 
    v.vendorname,
    p.productname,
    p.productprice,
    ntile( 2 ) OVER ( 
        PARTITION BY p.vendorid 
        ORDER BY p.productprice DESC
        )
FROM products p INNER JOIN 
vendors v ON p.vendorid = v.vendorid;

Result:

  vendorname   |           productname           | productprice | ntile 
---------------+---------------------------------+--------------+-------
 Mars Supplies | Left handed screwdriver         |        25.99 |     1
 Mars Supplies | Right handed screwdriver        |        25.99 |     1
 Mars Supplies | Long Weight (blue)              |        14.75 |     2
 Mars Supplies | Long Weight (green)             |        11.99 |     2
 Randy Roofers | Sledge Hammer                   |        33.49 |     1
 Pedal Medals  | Chainsaw                        |       245.00 |     1
 Pedal Medals  | Straw Dog Box                   |        55.99 |     1
 Pedal Medals  | Hammock                         |        10.00 |     2
 Katty Kittens | Tea Pot                         |        12.45 |     1
 Katty Kittens | Bottomless Coffee Mugs (4 Pack) |         9.99 |     2
(10 rows)

The different ordering causes some rows to be assigned a different bucket number.

Let’s try it with again with both ascending and descending order in the same query:

SELECT 
    v.vendorname,
    p.productname,
    p.productprice,
    ntile( 2 ) OVER ( 
        PARTITION BY p.vendorid 
        ORDER BY p.productprice ASC
        ),
    ntile( 2 ) OVER ( 
        PARTITION BY p.vendorid 
        ORDER BY p.productprice DESC
        )
FROM products p INNER JOIN 
vendors v ON p.vendorid = v.vendorid;

Result:

  vendorname   |           productname           | productprice | ntile | ntile 
---------------+---------------------------------+--------------+-------+-------
 Mars Supplies | Long Weight (green)             |        11.99 |     1 |     2
 Mars Supplies | Long Weight (blue)              |        14.75 |     1 |     2
 Mars Supplies | Right handed screwdriver        |        25.99 |     2 |     1
 Mars Supplies | Left handed screwdriver         |        25.99 |     2 |     1
 Randy Roofers | Sledge Hammer                   |        33.49 |     1 |     1
 Pedal Medals  | Hammock                         |        10.00 |     1 |     2
 Pedal Medals  | Straw Dog Box                   |        55.99 |     1 |     1
 Pedal Medals  | Chainsaw                        |       245.00 |     2 |     1
 Katty Kittens | Bottomless Coffee Mugs (4 Pack) |         9.99 |     1 |     2
 Katty Kittens | Tea Pot                         |        12.45 |     2 |     1
(10 rows)

Adding a Named Window

We can use a named window to make our code more concise:

SELECT 
    v.vendorname,
    p.productname,
    p.productprice,
    ntile( 2 ) OVER ( 
        win
        ORDER BY p.productprice ASC
        ),
    ntile( 2 ) OVER ( 
        win
        ORDER BY p.productprice DESC
        )
FROM products p INNER JOIN 
vendors v ON p.vendorid = v.vendorid
WINDOW win AS (PARTITION BY p.vendorid);

Result:

  vendorname   |           productname           | productprice | ntile | ntile 
---------------+---------------------------------+--------------+-------+-------
 Mars Supplies | Long Weight (green)             |        11.99 |     1 |     2
 Mars Supplies | Long Weight (blue)              |        14.75 |     1 |     2
 Mars Supplies | Right handed screwdriver        |        25.99 |     2 |     1
 Mars Supplies | Left handed screwdriver         |        25.99 |     2 |     1
 Randy Roofers | Sledge Hammer                   |        33.49 |     1 |     1
 Pedal Medals  | Hammock                         |        10.00 |     1 |     2
 Pedal Medals  | Straw Dog Box                   |        55.99 |     1 |     1
 Pedal Medals  | Chainsaw                        |       245.00 |     2 |     1
 Katty Kittens | Bottomless Coffee Mugs (4 Pack) |         9.99 |     1 |     2
 Katty Kittens | Tea Pot                         |        12.45 |     2 |     1
(10 rows)

I used the WINDOW clause to create the named window. In this case I put the PARTITION BY clause in the named window, but I kept both ORDER BY clauses in the SELECT list, due to the fact that each column is sorted in a different order.

Omitting the Bucket Count

Omitting the bucket count results in an error:

SELECT 
    vendorid,
    productname,
    productprice,
    ntile( ) 
        OVER ( ORDER BY productprice ) 
    AS "ntile"
FROM products;

Result:

ERROR:  function ntile() does not exist
LINE 5:     ntile( ) 
            ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

Although the initial error tells us that the ntile() function doesn’t exist, the hint tells specifies that “No function matches the given name and argument types”. So we’re getting an error due to the argument being the wrong type, although we haven’t actually supplied an argument.

Regardless, we just need to make sure we provide an argument that specifies the number of buckets we require.