Understanding Partial Indexes in PostgreSQL

Adding indexes to a table is a well-established practice for speeding up common queries in relational databases such as PostgreSQL. While they can do wonders for query performance, indexes can also take up a lot of disk space.

Today we’re going to look at a particular type of index that can reduce disk usage while still helping query performance: the partial index.

What is a Partial Index in PostgreSQL?

A partial index in PostgreSQL is an index built on a subset of the rows in a table. The index includes only the rows that satisfy a given condition (called the predicate of the partial index), which is specified using a WHERE clause at the time the index is created. In contrast to a partial index, a full index covers all rows in the table, without filtering based on a condition.

Partial indexes can make queries more efficient because the index is smaller and covers only the relevant portion of the data. A smaller index also uses less disk space, sometimes significantly less.

Partial indexes can be particularly beneficial when:

  • Only a subset of the data is frequently queried.
  • The condition used in the partial index aligns with common query patterns.

“Partial index” is the term used in PostgreSQL and SQLite. Other DBMSs, such as SQL Server, call them filtered indexes. In each case, the indexed subset is defined by a conditional expression.

There’s also the functional index (or expression-based index), which is a related but distinct feature. Rather than limiting which rows are indexed, it indexes the result of a function or scalar expression computed from one or more columns of the table.
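To make the distinction concrete, here is a minimal sketch of an expression-based index. The customers table, its columns, and the index name are assumptions made up for illustration.

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    email       VARCHAR(255)
);

-- Expression index: every row is indexed, but the key is lower(email)
-- rather than the raw column value
CREATE INDEX idx_customers_lower_email
ON customers (lower(email));

-- A query needs to use the same expression for this index to be considered
SELECT customer_id
FROM customers
WHERE lower(email) = 'alice@example.com';
```

Note that there is no WHERE clause on the index itself, which is what separates it from a partial index.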

Example of a Partial Index

Consider a table orders that tracks various orders, where each order can be in different states like pending, shipped, or delivered.

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    order_status VARCHAR(20),
    order_date DATE
);

Suppose you often query for orders that are still pending. A regular index on order_status would include all rows, regardless of their status. Instead, you can create a partial index like this:

CREATE INDEX idx_pending_orders
ON orders (order_date)
WHERE order_status = 'pending';

In this example, the index will only include rows where order_status is 'pending', meaning queries that look for pending orders (e.g., sorting or filtering by order_date) will benefit from this smaller, more focused index.

This partial index would be particularly beneficial if the number of pending orders represents just a small fraction of the table’s contents. That’s because PostgreSQL doesn’t need to look through all the irrelevant (i.e. non-pending) orders just to find the pending ones. The index contains only pending orders.
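As a rough sketch, a query of the following shape is the kind that idx_pending_orders is designed for. The date literal is arbitrary, and whether the planner actually picks the index depends on the table’s size and statistics, so EXPLAIN is used here to inspect the plan rather than assume it.

```sql
EXPLAIN
SELECT order_id, customer_id, order_date
FROM orders
WHERE order_status = 'pending'          -- matches the index predicate
  AND order_date >= DATE '2024-01-01'   -- range condition on the indexed column
ORDER BY order_date;
```

On a small or empty table, PostgreSQL may well prefer a sequential scan, so it’s worth trying this against a realistic amount of data.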

Full Index Comparison

As mentioned, a full index covers all rows in the table, without filtering based on a condition.

So a full index on the order_date column of the orders table might look like this:

CREATE INDEX idx_full_order_date
ON orders (order_date);

It’s essentially the same as our partial index example, except that it doesn’t have the WHERE clause (so we’re not filtering out any rows).

This index includes every row in the table, regardless of the value of order_status or any other column. Therefore, this index contains all orders, pending or not. If there are a lot of non-pending orders, then this index would most likely take up a lot more disk space than the partial index example. It would probably take longer to search through than the partial index too, and it has to be maintained for every row in the table rather than just the pending ones.
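One way to check the size difference on your own system is to load some test data and compare the two indexes. The sketch below assumes both indexes from the examples above already exist. The row count and the share of pending orders are made-up figures.

```sql
-- Load 100,000 sample orders, of which 1 in 20 is pending
INSERT INTO orders (customer_id, order_status, order_date)
SELECT (random() * 1000)::int,
       CASE WHEN i % 20 = 0 THEN 'pending' ELSE 'delivered' END,
       DATE '2024-01-01' + (i % 365)
FROM generate_series(1, 100000) AS s(i);

-- Compare the on-disk size of the partial and full indexes
SELECT pg_size_pretty(pg_relation_size('idx_pending_orders'))  AS partial_index_size,
       pg_size_pretty(pg_relation_size('idx_full_order_date')) AS full_index_size;
```

The exact sizes will vary by PostgreSQL version and settings, so treat the result as a comparison rather than a benchmark.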

Benefits of Partial Indexes

Partial indexes offer several benefits, particularly when applied in the right scenarios. The main benefits include faster queries, reduced storage, and lower overhead for write-heavy workloads.

Let’s explore each of these advantages, along with a few others, in more detail.

Improved Query Performance

Partial indexes can significantly improve query performance, especially when your queries only focus on a specific subset of the data. By indexing only the rows that meet a particular condition, the index is smaller and more efficient to search through.

Reduced Storage Requirements

Since partial indexes only include a subset of the table’s rows, they require less storage space compared to full indexes. This can be particularly advantageous for large tables where indexing every row would consume a significant amount of disk space.

Benefit: Reducing the size of indexes can lead to more efficient use of system resources, especially on systems with limited storage or where disk I/O is a bottleneck.

Lower Maintenance Overhead

Partial indexes reduce the long-term effort required to maintain an index, as they only cover a subset of rows. Over time, as the table grows, the system performs less index-related maintenance (such as vacuuming or rebuilding) because fewer rows are included in the index. This can lead to lower resource usage and improved performance for background tasks that keep the index healthy.

Long-Term Impact: Index-related maintenance tasks become less frequent or less intensive, so the database spends fewer resources on keeping the index optimized over time.

Faster Writes (Inserts, Updates, Deletes)

During insert, update, or delete operations, partial indexes improve performance by limiting the rows that trigger index updates. When fewer rows are eligible for the index, PostgreSQL spends less time maintaining the index for each write operation, leading to faster execution of these data modifications.

Immediate Benefit: Write operations are faster because the system doesn’t need to update the index for rows that don’t match the partial index condition.

Example: If a table logs different types of events and only errors are indexed for frequent querying, inserting other event types (e.g., warnings or informational messages) won’t trigger an index update:

   CREATE INDEX idx_error_logs
   ON logs (event_time)
   WHERE event_type = 'error';
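To see the effect, here is a sketch of a logs table that idx_error_logs could be defined on. Only event_type and event_time come from the index definition above; the other columns and the sample rows are assumptions.

```sql
-- Hypothetical table; it has to exist before idx_error_logs is created
CREATE TABLE logs (
    log_id     BIGSERIAL PRIMARY KEY,
    event_type VARCHAR(20) NOT NULL,
    event_time TIMESTAMP NOT NULL DEFAULT now(),
    message    TEXT
);

-- These rows do not satisfy event_type = 'error',
-- so no entry is added to idx_error_logs
INSERT INTO logs (event_type, message) VALUES
    ('info',    'Service started'),
    ('warning', 'Disk usage above 80%');

-- This row satisfies the predicate, so it is added to idx_error_logs
INSERT INTO logs (event_type, message) VALUES
    ('error', 'Connection refused');
```

The primary key index is still updated for every row. It’s only the partial index that is skipped for the non-error events.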

Optimized Query Planner Decisions

PostgreSQL’s query planner will decide whether or not to use an index based on various factors, including index size and selectivity. A smaller partial index can sometimes be more attractive to the planner compared to a full index, leading to more efficient query execution.

If the query planner estimates that a full index spans a large number of rows that aren’t relevant to a specific query, it may choose not to use the index. A partial index focused on the relevant data subset is more likely to be used efficiently, provided the query’s WHERE clause implies the index’s predicate.
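Rather than guessing what the planner prefers, you can ask PostgreSQL how often each index has actually been used. This sketch reads the pg_stat_user_indexes statistics view for the orders table from the earlier example.

```sql
SELECT indexrelname AS index_name,
       idx_scan     AS times_used,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan DESC;
```

The counters are cumulative since the statistics were last reset, so an index with a low idx_scan value on a busy system is a candidate for a closer look.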

Targeted Optimization for Specific Query Patterns

Partial indexes allow you to tailor indexing strategies to very specific query patterns. This enables fine-tuned optimization for cases where only a subset of rows or a particular condition is frequently queried.

Use-Cases for Partial Indexes

Here are some possible use-cases that could benefit from partial indexes:

  • Frequent Queries on a Subset of Data: When you query a specific subset of data frequently, partial indexes can improve performance. For example, if you often query for active users, you can index only the rows where status = 'active'.
  • Sparse Data: If a column has many NULL values and you are only interested in indexing non-NULL values, a partial index can help. For example:
   CREATE INDEX idx_non_null_email
   ON users (email)
   WHERE email IS NOT NULL;
  • Selective Data Changes: In scenarios where only part of the data changes frequently (e.g., new data being inserted for a certain category), a partial index can reduce the overhead of maintaining a full index.
  • Multi-Tenant Applications: If you’re managing data for multiple tenants and only one tenant’s data is frequently queried, you can create partial indexes per tenant (some_column is a placeholder for the column you want to index):
   CREATE INDEX idx_tenant_a_data
   ON data (some_column)
   WHERE tenant_id = 'tenant_a';
  • Data Archiving: When dealing with historical data, partial indexes can target recent or “active” data, leaving archived records unindexed:
   CREATE INDEX idx_recent_data
   ON data_table (date_column)
   WHERE date_column > '2023-01-01';
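The first use-case in the list has no example of its own, so here is a sketch. The users table layout, the last_login column, and the index name are assumptions; only the status = 'active' condition comes from the text.

```sql
-- Hypothetical table for illustration
CREATE TABLE users (
    user_id    SERIAL PRIMARY KEY,
    email      VARCHAR(255),
    status     VARCHAR(20) NOT NULL,
    last_login TIMESTAMP
);

-- Index only the active users
CREATE INDEX idx_active_users
ON users (last_login)
WHERE status = 'active';

-- A query that repeats the index predicate and filters on the indexed column
SELECT user_id, email
FROM users
WHERE status = 'active'
  AND last_login >= TIMESTAMP '2024-01-01 00:00:00';
```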

When Not to Use a Partial Index

As good as partial indexes can be, there are plenty of scenarios where they shouldn’t be used. Here are some cases where you probably wouldn’t want to create a partial index:

  • Queries Don’t Match the Index Condition: If your queries don’t filter data based on the condition specified in the WHERE clause of the partial index, the index won’t be used. For example, if your queries need orders that are either 'pending' or 'shipped', the index on 'pending' orders won’t help.
  • Low Selectivity: If a large proportion of rows meet the condition in the WHERE clause, a partial index won’t offer much benefit. In such cases, a full index might be more appropriate.
  • Data Patterns Change Over Time: If your data patterns are volatile, such that the condition in the WHERE clause is no longer representative of your query patterns, you may need to frequently recreate or adjust the index.
  • Complex Queries: If your queries use complex joins or aggregate functions that don’t align well with the condition in the partial index, the index may not be helpful.
  • Maintenance Costs: Partial indexes still require maintenance. While they are generally smaller, they need to be updated as data changes, meaning in tables with heavy inserts, deletes, or updates, there could still be maintenance overhead. This is true whether the index is partial or full. In some situations, particularly in write-heavy workloads, it might be better to avoid having any indexes at all (partial or not), as the cost of maintaining the index can outweigh its benefits.
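The first and third points can be illustrated together. In this sketch, a query that asks for two statuses is not one that idx_pending_orders alone can answer, so the index is replaced with a wider one. The name idx_open_orders and the date literal are assumptions.

```sql
-- The WHERE clause does not imply order_status = 'pending',
-- so idx_pending_orders alone cannot answer this query
EXPLAIN
SELECT order_id, order_date
FROM orders
WHERE order_status IN ('pending', 'shipped')
  AND order_date >= DATE '2024-01-01';

-- Build a replacement whose predicate matches the new query pattern.
-- CONCURRENTLY avoids blocking writes, but it cannot run inside a
-- transaction block.
CREATE INDEX CONCURRENTLY idx_open_orders
ON orders (order_date)
WHERE order_status IN ('pending', 'shipped');

-- Drop the old index once it is no longer needed
DROP INDEX CONCURRENTLY idx_pending_orders;
```

If you find yourself doing this kind of swap regularly, that’s a sign the predicate is too narrow for the workload and a full index may be the simpler choice.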

Conclusion

Partial indexes in PostgreSQL can be particularly beneficial when applied in the right situations. They work best when queries target a specific subset of data and can significantly reduce storage and maintenance overhead compared to full indexes. However, they should be avoided in scenarios where data patterns don’t fit a narrow condition or where the condition is too broad to offer meaningful performance improvements.