In SQL, indexing can be a great way to improve query performance across our database. We can create indexes to support the most commonly used queries, and these indexes can help our queries run much faster.
But it’s not always a matter of just “adding an index” and leaving it at that. If we really want the best bang for our buck, we usually need to look at how best to implement the index. There are many decisions that go into designing the best index for the job. And one such decision might be, should this be a filtered index?
But what exactly is a filtered index, and how can it improve our database efficiency?
Let’s find out.
What is a Filtered Index?
In database management systems (DBMSs), a filtered index is a special type of index that only includes a subset of rows from a table (or view), based on a specified filter condition.
Unlike a standard index that covers all rows, a filtered index is more selective, focusing only on the data that meets certain criteria. This criteria would match the criteria of some of the most commonly run queries on the table, thereby helping to improve the performance of those queries.
The criteria for a filtered index can be specified with a WHERE
clause, just like when filtering query results from a SELECT
statement.
Filtered indexes are also known as “partial indexes”, and the terminology you use will probably depend on your DBMS. For example, the SQL Server documentation uses the term “filtered index” while other DBMSs such as SQLite and PostgreSQL use “partial index”.
Some DBMSs offer the “expression-based index“, “functional index”, or “function-based index”, and some provide “indexing on NULL values”, which allows partial indexing based on whether a column is NULL
or NOT NULL
. While these may provide similar functionality to filtered indexes, they’re not quite the same as the filtered index as outlined in this article.
Advantages of Filtered Indexes
Why would we want to use a filtered index? There are several compelling reasons:
- Improved query performance: By indexing only the relevant subset of data, our queries that match the filter condition can run much faster.
- Reduced index size: Since we’re only indexing a portion of the table, the index takes up less storage space.
- Better statistics: The database can maintain more accurate statistics on the indexed subset, potentially leading to better query execution plans.
- Lower maintenance overhead: Smaller indexes are quicker to update and require less maintenance.
- Allowing data exceptions when implementing a new constraint: For example, suppose we want to add a
UNIQUE
constraint to a table that already has duplicate data (i.e. violates the constraint that we’re trying to implement). This would prevent the constraint from being added. In this case we could create a filteredUNIQUE
index that filters out the violating data (i.e. pretend it’s not there), and implement the index for all new data. This may or may not be desirable, depending on your business rules, but at least it’s an option.
Examples of Filtered Indexes
Below are some examples of the CREATE INDEX
statement used to create filtered indexes.
To create a filtered index, we typically use a SQL statement that includes a WHERE
clause to define the filter condition.
For example:
CREATE INDEX idx_current_tasks ON Tasks (TaskId, StartDate)
WHERE EndDate IS NULL;
This index could be useful for the following query:
SELECT TaskName, TaskID, StartDate
FROM Tasks
WHERE EndDate IS NULL
AND TasktID = 102
AND StartDate > '01/01/2024';
Here’s another example of creating a filtered index:
CREATE INDEX idx_active_customers ON Customers (LastName, FirstName)
WHERE Status = 'Active';
This index would only include rows where the Status
column is Active
, potentially speeding up queries that frequently search for active customers.
Here’s another use-case:
CREATE INDEX idx_customers_with_email ON Customers (FirstName, Email)
WHERE Email IS NOT NULL;
This could help for queries that generate an email list of customers, as it doesn’t index those customers that don’t have an email.
Here’s an example of applying a UNIQUE
index to new records only:
CREATE UNIQUE INDEX UX_Customers_Email
ON Customers(Email)
WHERE CustomerID > 1000;
This filters out all rows that have an CustomerId
of 1000 or lower. This demonstrates a scenario where we’re trying to implement a UNIQUE
constraint on a table that already contains a thousand rows, but some of those rows contain duplicate email addresses. This is kind of a wide-sweeping solution that omits those first thousand rows from the index, and only enforces uniqueness for new customers.
If we wanted to get more specific, we could change the filter criteria to omit only the specific rows that are duplicates. Either way, this approach allows us to bypass any existing data that violates our rule, and implement it for all new rows going forward.
Limitations of Filtered Indexes
It’s important to note that filtered indexes do have some limitations. They’re not supported in all database systems, and they’re only used when our query matches or is more restrictive than the index’s filter condition. So we would need to be sure that our filtered index matches the way in which users are actually searching through the data.
Also, depending on the DMBS, there could be a whole range of technical/syntax limitations when it comes to creating a filtered index. For example, in SQL Server, we can only use simple comparison operators when creating a filtered index, we can’t use the LIKE
operator, and we can’t create a filtered index on a computed column, a UDT column, a spatial data type column, or a hierarchyID data type column.
As always, it’s a good idea to read up on filtered indexes for your DBMS to understand the capabilities and limitations of its implementation of filtered indexes.
More About Indexes
See my Introduction to Indexing in SQL for an overview of what indexes are in SQL and why they can be useful for speeding up slow-running queries.