Indexes on Expressions in PostgreSQL: A Quick Overview

In PostgreSQL, indexes on expressions (also known as “functional indexes”) allow you to create an index not just on a column but on the result of an expression or function applied to a column (or multiple columns). This can significantly improve query performance when the same expression is used frequently in queries, especially for complex calculations or transformations.

How Indexes on Expressions Work

An expression index stores the results of a function or calculation rather than the raw column values. Whenever you run a query with the same expression, PostgreSQL can use the index to quickly find the precomputed results instead of recalculating them on the fly.

For example, if you often search based on a function applied to a column, such as lower(name), an index on that expression can make the queries much faster. Such an index might look something like this:

CREATE INDEX idx_users_name ON users (lower(name));

So the index definition consists solely of the expression.

Example of an Index on an Expression

Let’s create an example where we index on an expression that concatenates two columns. This approach will show how an expression index can optimize queries that use string concatenation.

Scenario

Suppose we have a customers table with the following columns:

  • first_name: The customer’s first name
  • last_name: The customer’s last name

And suppose users often query the table for the full name (which is a concatenation of first_name and last_name), like this:

SELECT * 
FROM customers
WHERE (first_name || ' ' || last_name) = 'Hector Plash';

We can create an expression-based index on the concatenation of these columns to optimize such queries.

Create the Table and Data

First, let’s create the table and populate it with dummy data:

-- Step 1: Create the customers table
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100)
);

-- Step 2: Insert 10,000 rows of dummy data
DO
$$
DECLARE
    first_names TEXT[] := ARRAY['Homer', 'Peter', 'Marge', 'Bart', 'Hector', 'Eve', 'Beth', 'Steve', 'Hesham', 'Trudy'];
    last_names TEXT[] := ARRAY['Fleck', 'Bundger', 'Bradley', 'Opshin', 'Claskin', 'Earnslaw', 'Flabbergasted', 'Plash', 'Woohoo', 'Yayaee'];
    i INTEGER;
BEGIN
    FOR i IN 1..10000 LOOP
        INSERT INTO customers (first_name, last_name, email)
        VALUES (
            first_names[FLOOR(RANDOM() * 10 + 1)], -- Random first name
            last_names[FLOOR(RANDOM() * 10 + 1)],  -- Random last name
            CONCAT(first_names[FLOOR(RANDOM() * 10 + 1)], '.', last_names[FLOOR(RANDOM() * 10 + 1)], '@example.com') -- Random email
        );
    END LOOP;
END;
$$;

Query Plan “Before”

Before we create the index, let’s check to see what PostgreSQL does when we run the query:

EXPLAIN ANALYZE
SELECT * 
FROM customers
WHERE (first_name || ' ' || last_name) = 'Hector Plash';

Output:

                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
Seq Scan on customers (cost=0.00..269.00 rows=50 width=42) (actual time=0.019..2.565 rows=109 loops=1)
Filter: ((((first_name)::text || ' '::text) || (last_name)::text) = 'Hector Plash'::text)
Rows Removed by Filter: 9891
Planning Time: 0.361 ms
Execution Time: 2.598 ms

We can see that it did a sequential scan on the customers table, and this took 2.598 milliseconds to complete. It had to remove 9891 rows (i.e. rows that don’t match the criteria), and we were left with 109 matching rows.

Create the Expression-Based Index

Now let’s create the expression-based index:

CREATE INDEX idx_customers_full_name 
ON customers ((first_name || ' ' || last_name));

Our expression is the same expression that users use when they query the table.

Query Plan “After”

Now we can check if PostgreSQL uses the index by running the EXPLAIN ANALYZE command:

EXPLAIN ANALYZE
SELECT * 
FROM customers
WHERE (first_name || ' ' || last_name) = 'Hector Plash';

Here’s what I get:

                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on customers (cost=4.67..87.27 rows=50 width=42) (actual time=0.075..0.200 rows=109 loops=1)
Recheck Cond: ((((first_name)::text || ' '::text) || (last_name)::text) = 'Hector Plash'::text)
Heap Blocks: exact=63
-> Bitmap Index Scan on idx_customers_full_name (cost=0.00..4.66 rows=50 width=0) (actual time=0.060..0.061 rows=109 loops=1)
Index Cond: ((((first_name)::text || ' '::text) || (last_name)::text) = 'Hector Plash'::text)
Planning Time: 0.611 ms
Execution Time: 0.232 ms

So it used the index for the condition and then looked up the table for the remaining data requested. We can see that it took much less time to complete (0.232 vs 2.598 milliseconds without the index).

How Indexes on Expressions Differ from Partial Indexes

You may be thinking that these indexes on expressions are just another form of partial index. While it’s true that they’re similar, there is a subtle difference:

  • Expression Indexes:
    • Store the result of a function or expression applied to column data.
    • Speed up queries that repeatedly compute the same expression.
  • Partial Indexes:
    • Only index rows that meet a specific condition (i.e., a subset of the data).
    • Speed up queries that only access specific subsets of the table’s data.

For example, a partial index on the sales table could index only the rows where the amount is greater than 100:

CREATE INDEX idx_sales_high_value 
ON sales (amount)
WHERE amount > 100;

This index would improve performance for queries that filter on amount > 100, but it would not help for rows where the amount is less than or equal to 100.

Comparison with Partial Indexes

The following table summarizes the differences between expression-based and partial indexes:

FeatureExpression IndexPartial Index
FocusIndexes the result of an expression/functionIndexes a subset of the rows (filtered by a condition)
Primary UseSpeeding up queries using expressionsSpeeding up queries on a subset of the data
StorageRequires extra storage for computed valuesRequires extra storage, but only for a subset of rows
MaintenanceIndex must be updated when base column changesIndex must be updated, but only for filtered rows
Typical Use CasesOptimizing queries with functions or calculationsOptimizing queries with selective filters

Conclusion

Indexes on expressions in PostgreSQL are a handy option for optimizing queries that repeatedly perform the same calculations or transformations. They differ from partial indexes in that they focus on expressions rather than filtering data, offering a broader optimization for function-based queries.

However, they come with the cost of increased storage and maintenance overhead. Understanding the nature of your queries will help determine when and where to use expression-based indexes most effectively.