In PostgreSQL, indexes on expressions (also known as “functional indexes”) allow you to create an index not just on a column but on the result of an expression or function applied to a column (or multiple columns). This can significantly improve query performance when the same expression is used frequently in queries, especially for complex calculations or transformations.
How Indexes on Expressions Work
An expression index stores the results of a function or calculation rather than the raw column values. Whenever you run a query with the same expression, PostgreSQL can use the index to quickly find the precomputed results instead of recalculating them on the fly.
For example, if you often search based on a function applied to a column, such as lower(name)
, an index on that expression can make the queries much faster. Such an index might look something like this:
CREATE INDEX idx_users_name ON users (lower(name));
So the index definition consists solely of the expression.
Example of an Index on an Expression
Let’s create an example where we index on an expression that concatenates two columns. This approach will show how an expression index can optimize queries that use string concatenation.
Scenario
Suppose we have a customers
table with the following columns:
first_name
: The customer’s first namelast_name
: The customer’s last name
And suppose users often query the table for the full name (which is a concatenation of first_name
and last_name
), like this:
SELECT *
FROM customers
WHERE (first_name || ' ' || last_name) = 'Hector Plash';
We can create an expression-based index on the concatenation of these columns to optimize such queries.
Create the Table and Data
First, let’s create the table and populate it with dummy data:
-- Step 1: Create the customers table
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100)
);
-- Step 2: Insert 10,000 rows of dummy data
DO
$$
DECLARE
first_names TEXT[] := ARRAY['Homer', 'Peter', 'Marge', 'Bart', 'Hector', 'Eve', 'Beth', 'Steve', 'Hesham', 'Trudy'];
last_names TEXT[] := ARRAY['Fleck', 'Bundger', 'Bradley', 'Opshin', 'Claskin', 'Earnslaw', 'Flabbergasted', 'Plash', 'Woohoo', 'Yayaee'];
i INTEGER;
BEGIN
FOR i IN 1..10000 LOOP
INSERT INTO customers (first_name, last_name, email)
VALUES (
first_names[FLOOR(RANDOM() * 10 + 1)], -- Random first name
last_names[FLOOR(RANDOM() * 10 + 1)], -- Random last name
CONCAT(first_names[FLOOR(RANDOM() * 10 + 1)], '.', last_names[FLOOR(RANDOM() * 10 + 1)], '@example.com') -- Random email
);
END LOOP;
END;
$$;
Query Plan “Before”
Before we create the index, let’s check to see what PostgreSQL does when we run the query:
EXPLAIN ANALYZE
SELECT *
FROM customers
WHERE (first_name || ' ' || last_name) = 'Hector Plash';
Output:
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on customers (cost=0.00..269.00 rows=50 width=42) (actual time=0.019..2.565 rows=109 loops=1)
Filter: ((((first_name)::text || ' '::text) || (last_name)::text) = 'Hector Plash'::text)
Rows Removed by Filter: 9891
Planning Time: 0.361 ms
Execution Time: 2.598 ms
We can see that it did a sequential scan on the customers table, and this took 2.598 milliseconds to complete. It had to remove 9891 rows (i.e. rows that don’t match the criteria), and we were left with 109 matching rows.
Create the Expression-Based Index
Now let’s create the expression-based index:
CREATE INDEX idx_customers_full_name
ON customers ((first_name || ' ' || last_name));
Our expression is the same expression that users use when they query the table.
Query Plan “After”
Now we can check if PostgreSQL uses the index by running the EXPLAIN ANALYZE
command:
EXPLAIN ANALYZE
SELECT *
FROM customers
WHERE (first_name || ' ' || last_name) = 'Hector Plash';
Here’s what I get:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on customers (cost=4.67..87.27 rows=50 width=42) (actual time=0.075..0.200 rows=109 loops=1)
Recheck Cond: ((((first_name)::text || ' '::text) || (last_name)::text) = 'Hector Plash'::text)
Heap Blocks: exact=63
-> Bitmap Index Scan on idx_customers_full_name (cost=0.00..4.66 rows=50 width=0) (actual time=0.060..0.061 rows=109 loops=1)
Index Cond: ((((first_name)::text || ' '::text) || (last_name)::text) = 'Hector Plash'::text)
Planning Time: 0.611 ms
Execution Time: 0.232 ms
So it used the index for the condition and then looked up the table for the remaining data requested. We can see that it took much less time to complete (0.232 vs 2.598 milliseconds without the index).
How Indexes on Expressions Differ from Partial Indexes
You may be thinking that these indexes on expressions are just another form of partial index. While it’s true that they’re similar, there is a subtle difference:
- Expression Indexes:
- Store the result of a function or expression applied to column data.
- Speed up queries that repeatedly compute the same expression.
- Partial Indexes:
- Only index rows that meet a specific condition (i.e., a subset of the data).
- Speed up queries that only access specific subsets of the table’s data.
For example, a partial index on the sales
table could index only the rows where the amount
is greater than 100:
CREATE INDEX idx_sales_high_value
ON sales (amount)
WHERE amount > 100;
This index would improve performance for queries that filter on amount > 100
, but it would not help for rows where the amount is less than or equal to 100.
Comparison with Partial Indexes
The following table summarizes the differences between expression-based and partial indexes:
Feature | Expression Index | Partial Index |
---|---|---|
Focus | Indexes the result of an expression/function | Indexes a subset of the rows (filtered by a condition) |
Primary Use | Speeding up queries using expressions | Speeding up queries on a subset of the data |
Storage | Requires extra storage for computed values | Requires extra storage, but only for a subset of rows |
Maintenance | Index must be updated when base column changes | Index must be updated, but only for filtered rows |
Typical Use Cases | Optimizing queries with functions or calculations | Optimizing queries with selective filters |
Conclusion
Indexes on expressions in PostgreSQL are a handy option for optimizing queries that repeatedly perform the same calculations or transformations. They differ from partial indexes in that they focus on expressions rather than filtering data, offering a broader optimization for function-based queries.
However, they come with the cost of increased storage and maintenance overhead. Understanding the nature of your queries will help determine when and where to use expression-based indexes most effectively.