In SQL, adding an index to a table can provide us with a significant performance boost, especially if it’s a covering index for commonly run queries.
But what exactly is a covering index?
Let’s find out.
What is a Covering Index?
A covering index is an index that includes all the columns a query needs, whether it’s in the WHERE
clause, SELECT
list, JOIN
clause, or ORDER BY
clause.
When a query is covered by an index, the database engine can retrieve all the required data directly from the index without having to access the table itself. This is highly efficient because reading from an index, which is generally smaller and more optimized than a full table scan, reduces I/O operations, making query execution faster.
So, an index is not said to be covering in and of itself. It can only be considered a covering index when we factor in the query that it supposedly covers. Therefore, the same index could be a covering index for some queries, but not a covering index for other queries.
And with that in mind, if a query is covered by an index, then we can say it’s a covered query.
Why Covering Indexes Are Good
Covering indexes are beneficial because they:
- Improve Query Performance: Since all the necessary data is in the index, the database engine doesn’t need to perform a costly lookup to the actual table (also known as a “bookmark lookup”, “key lookup”, or “RID lookup” if the table is a heap).
- Reduce I/O Operations: Fewer data pages are read because the index itself provides all the needed data, reducing disk I/O.
- Minimize Locking and Blocking: Because fewer resources are touched, there’s a reduced likelihood of locking and blocking other transactions.
What Happens When a Query Isn’t Covered?
When a query is not covered by an index, the database engine must access the table data, which can be significantly slower. The impact of this varies between RDBMS. For example, in SQL Server, it depends on whether the table has a clustered index or is a heap (a table without a clustered index).
- On Tables with a Clustered Index:
- If the index on a clustered table does not cover the query, the engine performs a key lookup. The database engine first retrieves the row locator from the non-clustered index and then uses that locator to fetch the full row from the clustered index. This operation can be slow, especially if it needs to be performed multiple times for many rows.
- On Heaps (Tables without a Clustered Index):
- In the case of a heap, if the index doesn’t cover the query, the database engine performs a RID lookup. It retrieves the Row ID (RID) from the non-clustered index and then accesses the heap to retrieve the complete row. This can be even slower than a key lookup on a clustered index, as heaps are generally less efficient to search.
Note that the above points explain how it works in SQL Server. Other RDBMSs have slightly different implementations and terminology around indexing on tables, and around un-covered queries (for example, “table access by rowid” in MySQL and Oracle). But the basic premise remains; when a query isn’t covered, the database engine needs to look up the data in someplace other than the index.
Covered Query vs Non-Covered Query Example
Here are two simple queries. The first is covered, but the second is not.
- Covered Query Example:
SELECT CustomerID, OrderDate
FROM Orders
WHERE CustomerID = 12345;
Suppose there is a non-clustered index on the Orders
table that covers both CustomerID
and OrderDate
. In this case, the query is covered.
- Non-Covered Query Example:
SELECT CustomerID, OrderDate, TotalAmount
FROM Orders
WHERE CustomerID = 12345;
If the index only covers CustomerID
and OrderDate
, and TotalAmount
is not included in the index, the query is not covered. The database will need to perform a key lookup or RID lookup to retrieve TotalAmount
.
Detecting Whether a Query is Covered
You can detect whether a query is covered by examining the execution plan. A covered query will typically show only index seek or scan operations on the covering index without any additional table access (such as a key lookup or RID lookup).
Suppose we create an index on an Employees
table like this:
CREATE NONCLUSTERED INDEX IX_LastName ON Employees(LastName);
And now we want to run the following query:
SELECT FirstName, LastName, Salary
FROM Employees
WHERE LastName = 'LastName70';
We can check the query plan when running that query, or we can check the “estimated query plan”.
Here’s what I get when I run this query through the estimated query plan when using SQL Server:
We can see that it had to do a key lookup. This tells us that the query wasn’t covered.
The query wasn’t covered because the index only contained one column; the LastName
column. Sure, the query contained that column, but it also contained other columns that weren’t in the index. Therefore it wasn’t a covered query. Or put another way, the index wasn’t a covering index.
Now let’s modify the index so that it covers the query:
CREATE NONCLUSTERED INDEX IX_LastName ON Employees(LastName)
INCLUDE (FirstName, Salary) WITH DROP_EXISTING;
We can’t add columns by modifying an index – we have to drop/recreate it. In this case we’ve used WITH DROP_EXISTING
, which is an option available in SQL Server that allows us to drop/recreate/rebuild the index without having to use the DROP INDEX
statement before recreating it. If using another RDBMS that doesn’t have a similar feature, you can use DROP INDEX
before running the CREATE INDEX
statement.
Here, we’ve added the FirstName
and Salary
columns to the index. We’ve added them using the INCLUDE
argument, which allows us to add non-key columns to index.
Now let’s run the query again and see how it looks in the query plan:
And just like that, it’s a covered query!
We no longer see the key lookup. All we get is the index seek, and so it didn’t need to go looking for the extra data in the clustered index. That data was already in the existing nonclustered index, and so the index seek was all it needed.
While we’re at it, here’s what the above non-covered query looks like when we run it against a heap:
This is the same table, but without the clustered index.
The result is very similar to the previous un-covered query, but this time we get a RID lookup. As mentioned, a RID lookup applies to tables that don’t have a clustered index and so it needs to go and look up the heap table itself for the remaining data.
What to Do When Queries Aren’t Covered
Here are some things we can think about doing if we find a frequently run query that isn’t covered by an index:
- Add Columns to the Existing Index: If a query is frequently executed and not covered, consider adding the missing columns to the existing index to make it a covering index. This can be done by including the columns in the index’s definition, either as key columns, or as non-key columns by using the
INCLUDE
keyword. To add columns to an index, we need to drop it and create it with the new definition (although in SQL Server, we can use theDROP_EXISTING
clause instead of dropping it first). Here’s how to do it by dropping it first:
DROP INDEX IDX_CustomerID ON Orders;
CREATE INDEX IDX_CustomerID ON Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount);
- Create a New Covering Index: If adding columns to the existing index is not optimal (e.g., it could make the index too large), consider creating a new index that covers the specific query.
- Optimize the Query: In some cases, the query itself can be optimized to use an existing index more effectively, reducing the need for a covering index.
- Use Indexed Views: If multiple queries need similar data, consider creating an indexed view that precomputes and indexes the result set.
- Evaluate the Cost: Be mindful of the trade-offs. While covering indexes can greatly improve query performance, they also consume additional disk space and can slow down insert, update, and delete operations. Ensure that the performance gains justify these costs.
Summary
A covering index can significantly speed up queries by reducing the need to access the full table. When a query is not covered by an index, the performance difference can be substantial, especially if the table is a heap. By examining execution plans and optimizing indexes, we can ensure that critical queries are covered, improving the overall efficiency of our database operations.