In PostgreSQL, a partial index is an index that’s built on a subset of the rows in a table. It includes only the rows that satisfy a given condition specified with the WHERE
clause.
Partial indexes can help us achieve faster queries, reduced storage, and lower overhead for write-heavy workloads.
Below is a quick example that demonstrates how to create a partial index in PostgreSQL.
Basic Syntax
The basic syntax for creating a partial index goes like this:
CREATE INDEX index_name ON table_name (column_name)
WHERE condition;
index_name
: The name of the index you are creating.table_name
: The name of the table on which the index is being created.column_name
: The column(s) you are indexing.condition
: A condition to limit the rows included in the index.
Example
Suppose we have a tasks
table that tracks the status of tasks. We use a status
column that indicates the current status of each task. And because the database contains a lot of rows, built up over many years of use, most tasks have already been completed, leaving the number of open tasks a relatively small percentage of the overall number of tasks recorded in the database.
And as it turns out, most users are only interested in the open tasks. After all, closed tasks have already been dealt with, so there’s not much need to keep looking them up. Users are constantly running queries to check the open tasks. The open tasks are where it’s at.
This is an ideal situation for creating a partial index; the users are frequently searching through a small subset of the overall data. By creating a partial index on that subset, we’ll help speed up queries, while keeping disk-usage to a minimum.
So given this scenario, we’ll create a partial index to optimize queries for tasks with a status
of ‘Open’.
1. Create the Table
First we’ll create the table:
CREATE TABLE tasks (
task_id SERIAL PRIMARY KEY,
task_name TEXT NOT NULL,
status VARCHAR(20) NOT NULL,
due_date DATE
);
2. Create a Partial Index
Now we’ll create the partial index to improve query performance for ‘Open’ tasks:
CREATE INDEX idx_open_tasks ON tasks (task_name, due_date)
WHERE status = 'Open';
We included task_name
and due_date
in the column list, as those are the columns returned in the popular query.
That’s all that’s required. We’ve just created a partial index.
Verify the Results
Now that we’ve created a partial index, let’s see how to check whether it’s being used when we run a query.
Insert Sample Data
First, let’s insert a bunch of data to give PostgreSQL something to index and to provide us with meaningful results. To efficiently insert thousands of rows, we can use the generate_series()
function along with INSERT INTO ... SELECT
to generate the data.
Here’s a script that inserts 10,000 rows with varying statuses:
INSERT INTO tasks (task_name, status, due_date)
SELECT
'Task ' || gs,
CASE
WHEN gs % 3 = 0 THEN 'Closed'
WHEN gs % 3 = 1 THEN 'Open'
ELSE 'Ignored'
END,
CURRENT_DATE + (gs % 30)
FROM generate_series(1, 10000) AS gs;
Check the Execution Plan
We can use the EXPLAIN
command to check if the partial index is being utilized when we run a relevant query:
EXPLAIN ANALYZE
SELECT
task_name,
due_date
FROM tasks
WHERE status = 'Open';
Output:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using idx_open_tasks on tasks (cost=0.28..110.29 rows=3334 width=13) (actual time=0.088..0.552 rows=3334 loops=1)
Heap Fetches: 0
Planning Time: 0.944 ms
Execution Time: 0.752 ms
The EXPLAIN
command shows the execution plan of a statement. The execution plan shows how the table(s) referenced by the statement will be scanned (e.g. by plain sequential scan, index scan, etc), as well as what join algorithms will be used to bring together the required rows from each input table in the event that multiple tables are referenced.
In our simple case, we can see that it used an Index Only Scan, and we can see that it names the index that we created. It scanned 3,334 rows, which is how many tasks are ‘Open’. If this table contained millions of rows, but there were only a few thousand open tasks, then this could represent a significant performance boost. Our partial index would save PostgreSQL the effort of having to search through millions of rows, just to find a few thousand open tasks.
Check the Disk Usage
We can use the \di+
command to list our indexes, along with other details, such as disk space usage.
Create full index:
Let’s first create a full index, so that we can compare the size difference between the full index and the partial index.
CREATE INDEX idx_all_tasks ON tasks (task_name, due_date);
This indexes the same columns as the partial index. The only difference is that it doesn’t filter out any rows – it indexes all rows.
Check the disk usage:
Now we can use the \di+
command to check the disk space of both indexes:
\di+ *tasks
Output:
List of relations
Schema | Name | Type | Owner | Table | Persistence | Access method | Size | Description
--------+----------------+-------+--------+-------+-------------+---------------+--------+-------------
public | idx_all_tasks | index | barney | tasks | permanent | btree | 328 kB |
public | idx_open_tasks | index | barney | tasks | permanent | btree | 120 kB |
We can see how much bigger the full index is compared to the partial index. Imagine if there were millions of rows and open tasks were just a tiny fraction – imagine the space savings we’d be able to achieve.
While we’re at it, let’s check to see how much space the table is using:
\dt+ tasks
Output:
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------+-------+--------+-------------+---------------+--------+-------------
public | tasks | table | barney | permanent | heap | 600 kB |
It’s even larger than the full index.
So to summarize, creating a partial index can be a great option for improving the efficiency of our databases – both the queries and the storage.