Create a Partial Index in PostgreSQL

In PostgreSQL, a partial index is an index that’s built on a subset of the rows in a table. It includes only the rows that satisfy a given condition specified with the WHERE clause.

Partial indexes can help us achieve faster queries, reduced storage, and lower overhead for write-heavy workloads.

Below is a quick example that demonstrates how to create a partial index in PostgreSQL.

Basic Syntax

The basic syntax for creating a partial index goes like this:

CREATE INDEX index_name ON table_name (column_name)
WHERE condition;
  • index_name: The name of the index you are creating.
  • table_name: The name of the table on which the index is being created.
  • column_name: The column(s) you are indexing.
  • condition: A condition to limit the rows included in the index.

Example

Suppose we have a tasks table that tracks the status of tasks. We use a status column that indicates the current status of each task. And because the database contains a lot of rows, built up over many years of use, most tasks have already been completed, leaving the number of open tasks a relatively small percentage of the overall number of tasks recorded in the database.

And as it turns out, most users are only interested in the open tasks. After all, closed tasks have already been dealt with, so there’s not much need to keep looking them up. Users are constantly running queries to check the open tasks. The open tasks are where it’s at.

This is an ideal situation for creating a partial index; the users are frequently searching through a small subset of the overall data. By creating a partial index on that subset, we’ll help speed up queries, while keeping disk-usage to a minimum.

So given this scenario, we’ll create a partial index to optimize queries for tasks with a status of ‘Open’.

1. Create the Table

First we’ll create the table:

CREATE TABLE tasks (
    task_id SERIAL PRIMARY KEY,
    task_name TEXT NOT NULL,
    status VARCHAR(20) NOT NULL,
    due_date DATE
);

2. Create a Partial Index

Now we’ll create the partial index to improve query performance for ‘Open’ tasks:

CREATE INDEX idx_open_tasks ON tasks (task_name, due_date)
WHERE status = 'Open';

We included task_name and due_date in the column list, as those are the columns returned in the popular query.

That’s all that’s required. We’ve just created a partial index.

Verify the Results

Now that we’ve created a partial index, let’s see how to check whether it’s being used when we run a query.

Insert Sample Data

First, let’s insert a bunch of data to give PostgreSQL something to index and to provide us with meaningful results. To efficiently insert thousands of rows, we can use the generate_series() function along with INSERT INTO ... SELECT to generate the data.

Here’s a script that inserts 10,000 rows with varying statuses:

INSERT INTO tasks (task_name, status, due_date)
SELECT 
    'Task ' || gs, 
    CASE
        WHEN gs % 3 = 0 THEN 'Closed'
        WHEN gs % 3 = 1 THEN 'Open'
        ELSE 'Ignored'
    END,
    CURRENT_DATE + (gs % 30)
FROM generate_series(1, 10000) AS gs;

Check the Execution Plan

We can use the EXPLAIN command to check if the partial index is being utilized when we run a relevant query:

EXPLAIN ANALYZE
SELECT 
    task_name, 
    due_date 
FROM tasks
WHERE status = 'Open';

Output:

                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using idx_open_tasks on tasks (cost=0.28..110.29 rows=3334 width=13) (actual time=0.088..0.552 rows=3334 loops=1)
Heap Fetches: 0
Planning Time: 0.944 ms
Execution Time: 0.752 ms

The EXPLAIN command shows the execution plan of a statement. The execution plan shows how the table(s) referenced by the statement will be scanned (e.g. by plain sequential scan, index scan, etc), as well as what join algorithms will be used to bring together the required rows from each input table in the event that multiple tables are referenced.

In our simple case, we can see that it used an Index Only Scan, and we can see that it names the index that we created. It scanned 3,334 rows, which is how many tasks are ‘Open’. If this table contained millions of rows, but there were only a few thousand open tasks, then this could represent a significant performance boost. Our partial index would save PostgreSQL the effort of having to search through millions of rows, just to find a few thousand open tasks.

Check the Disk Usage

We can use the \di+ command to list our indexes, along with other details, such as disk space usage.

Create full index:

Let’s first create a full index, so that we can compare the size difference between the full index and the partial index.

CREATE INDEX idx_all_tasks ON tasks (task_name, due_date);

This indexes the same columns as the partial index. The only difference is that it doesn’t filter out any rows – it indexes all rows.

Check the disk usage:

Now we can use the \di+ command to check the disk space of both indexes:

\di+ *tasks

Output:

                                           List of relations
Schema | Name | Type | Owner | Table | Persistence | Access method | Size | Description
--------+----------------+-------+--------+-------+-------------+---------------+--------+-------------
public | idx_all_tasks | index | barney | tasks | permanent | btree | 328 kB |
public | idx_open_tasks | index | barney | tasks | permanent | btree | 120 kB |

We can see how much bigger the full index is compared to the partial index. Imagine if there were millions of rows and open tasks were just a tiny fraction – imagine the space savings we’d be able to achieve.

While we’re at it, let’s check to see how much space the table is using:

\dt+ tasks

Output:

                                  List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------+-------+--------+-------------+---------------+--------+-------------
public | tasks | table | barney | permanent | heap | 600 kB |

It’s even larger than the full index.

So to summarize, creating a partial index can be a great option for improving the efficiency of our databases – both the queries and the storage.