Why it’s a Good Idea to Create Indexes on Foreign Keys in PostgreSQL

When a foreign key is defined in a table, it ensures that the values in the column(s) correspond to values in a primary key or unique key in another table. While PostgreSQL automatically creates an index for primary keys (because these need to be fast for lookups and enforcing uniqueness), it does not automatically create an index for foreign keys.

Why Create Indexes on Foreign Keys in PostgreSQL?

Here’s why it’s usually recommended to create indexes on foreign keys:

  • Optimizing Joins:
    Foreign keys are frequently used in JOIN operations when querying related data. Without an index on the foreign key column, PostgreSQL must perform a sequential scan on the table, leading to slower joins as the table size grows.
  • Improving Referential Integrity Checks:
    When rows in the referenced table (i.e., the table with the primary key) are updated or deleted, PostgreSQL must ensure that referential integrity is maintained by checking the foreign key constraints. Without an index, PostgreSQL must perform a full scan on the table containing the foreign key, which can be slow for large tables.
  • Performance on Updates/Deletes:
    If you perform ON DELETE or ON UPDATE operations on the primary key, PostgreSQL will need to verify that no dependent records in the foreign key table are violating the constraint. Indexes speed up these checks.

Why Aren’t Indexes Automatically Created on Foreign Keys?

It might be tempting to think that it’s ridiculous that indexes aren’t automatically created on foreign keys. But here are some reasons why you might not want indexes to be automatically created on foreign keys:

  • Flexibility:
    Automatically indexing all foreign keys might lead to unnecessary overhead if certain foreign keys are rarely used in queries, resulting in additional maintenance costs during INSERT, UPDATE, and DELETE operations.
  • Developer Choice:
    PostgreSQL allows developers to choose which foreign key relationships require indexing (and which type of index to use) based on the workload. This way, developers can create indexes where they know performance will benefit from them.
  • Indexing Overhead:
    Maintaining indexes has a cost in terms of storage and write performance. By not creating indexes automatically on foreign keys, PostgreSQL avoids introducing unnecessary overhead on tables where foreign key lookups are infrequent or irrelevant.

Here’s what the PostgreSQL documentation has to say about this:

Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns too. Because this is not always needed, and there are many choices available on how to index, the declaration of a foreign key constraint does not automatically create an index on the referencing columns.

Example: Creating an Explicit Index on a Foreign Key

Let’s walk through an example where we explicitly create an index for a foreign key but leave the primary key without an explicit index (relying on PostgreSQL to automatically index the primary key).

Step 1: Create Tables and Define Foreign Key

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    customer_name VARCHAR(100)
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date DATE
);

At this point, PostgreSQL has automatically created an index on the customer_id column in the customers table because it is the primary key. However, no index has been created for customer_id in the orders table, which is a foreign key.

Step 2: Check Indexes

We can check which columns have indexes by querying the pg_indexes view. So let’s check that now (before we create an index for the foreign key column):

SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename IN ('customers', 'orders');

Output:

   indexname    |                                     indexdef                                     
----------------+----------------------------------------------------------------------------------
customers_pkey | CREATE UNIQUE INDEX customers_pkey ON public.customers USING btree (customer_id)
orders_pkey | CREATE UNIQUE INDEX orders_pkey ON public.orders USING btree (order_id)

We can see that there are only two indexes; both of which are for the primary keys of their respective tables.

No indexes have been created for the foreign key.

Step 3: Create Index on the Foreign Key

Let’s now explicitly create an index on the foreign key (customer_id) column in the orders table:

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

Step 4: Re-Check Indexes

Let’s re-run the pg_indexes view:

SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename IN ('customers', 'orders');

Output:

       indexname        |                                     indexdef                                     
------------------------+----------------------------------------------------------------------------------
customers_pkey | CREATE UNIQUE INDEX customers_pkey ON public.customers USING btree (customer_id)
orders_pkey | CREATE UNIQUE INDEX orders_pkey ON public.orders USING btree (order_id)
idx_orders_customer_id | CREATE INDEX idx_orders_customer_id ON public.orders USING btree (customer_id)

We can see that we now have an index on the foreign key column.

Summary

Creating an index on a foreign key in PostgreSQL is recommended for improving performance in queries that involve foreign key relationships, such as JOINs, and for maintaining referential integrity efficiently. However, foreign key indexes are not created automatically because not all foreign keys may need indexing, and the overhead of maintaining unnecessary indexes can be costly.

By explicitly creating indexes where needed, we ensure that our database performs well while still maintaining flexibility in how indexes are applied.