Understanding the “Key Lookup” in SQL Server

When looking through the execution plan in SQL Server, you may occasionally see “key lookup” glaring back at you.

If you find this happening a lot, and on frequently run queries, then you might want to do something about it.

What is a Key Lookup?

In SQL Server, a key lookup (also known as a bookmark lookup) occurs when a query uses a nonclustered index to find a row but needs to retrieve additional columns not included in the index. To get these additional columns, SQL Server performs a lookup against the clustered index to retrieve the complete row.

If the table is a heap (i.e., it has no clustered index), then it will perform a RID lookup instead of a key lookup.

Both of these operations can be costly in terms of performance, especially if they occur frequently.

Why Key Lookups Are Not So Good

Key lookups can significantly impact performance because:

  • Additional I/O Operations: Each key lookup results in additional I/O operations, as the database engine needs to access the clustered index for each row that matches the initial query criteria.
  • Increased Query Execution Time: The extra work involved in retrieving data from the clustered index can increase the overall execution time, especially if the lookup happens for a large number of rows.
  • Resource Consumption: More CPU and memory resources are consumed due to the need to perform extra lookups.

Generally it’s a good idea to avoid having key lookups on frequently run queries, and queries that involve large tables. When it comes to performance, the less I/O and resource consumption, the better.

Example of a Key Lookup

Consider a table Employees with the following columns:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    DepartmentID INT,
    Salary DECIMAL(18, 2)
);

Suppose we create an index on the LastName column like this:

CREATE NONCLUSTERED INDEX IX_LastName ON Employees(LastName);

And now we want to run the following query:

SELECT FirstName, LastName, Salary
FROM Employees
WHERE LastName = 'Apinya';

We can check the query plan when running that query, or we can check the “estimated query plan”.

Here’s what I get when I run this query through the estimated query plan when using SQL Server:

It had to do a key lookup to get the data that wasn’t in the index.

The only data in the index was the LastName column, but the query needed more than just that column. It also needed the FirstName and the Salary columns, hence why the database engine had to do a key lookup.

The key lookup involved looking up the clustered index for the remaining data, using the relevant key.

Had this table been a heap, it would’ve had to use a RID lookup to look up the heap itself (seeing as there would be no clustered index in this case).

Dealing with Key Lookups

Now that we know the query uses a key lookup, we can think about how to deal with it. If we believe this query is causing performance issues in our database, we should think about making it a covering index. By “covering index” I mean one that covers all columns required by the query.

As it currently stands, the index doesn’t cover all columns that the query needs, and that’s why it needs to do a key lookup. If we were to include the other columns in the index, then that should eliminate the need for SQL Server to look up the remaining data in the clustered index – the required data will already be in the nonclustered index.

So, with that said, let’s get straight into it.

Here’s one way to do it:

CREATE NONCLUSTERED INDEX IX_LastName ON Employees(LastName)
INCLUDE (FirstName, Salary) 
WITH DROP_EXISTING;

Here, we used the INCLUDE option to add the other two columns to the index as non-key columns.

Another way to do it is to add the columns as key columns:

CREATE NONCLUSTERED INDEX IX_LastName ON Employees(LastName,FirstName,Salary)
WITH DROP_EXISTING;

Mind you, in this case we might be better off creating a new index with a name that reflects the extra columns.

In any case, after adding the columns to the index, here’s what the query plan shows when we run the query again:

This time it used an index seek for the whole thing. There was no need to do a key lookup to get extra data from the clustered index, because all required data is in the nonclustered index.

Key Lookups vs. RID Lookups

As I touched on earlier, there’s a difference between key lookups and RID lookups:

  • Key Lookups are specific to tables with clustered indexes. The lookup uses the clustered index key to find the full row in the clustered index.
  • RID Lookups (Row Identifier Lookups) occur when a table doesn’t have a clustered index (i.e., it’s a heap). The lookup uses the RID, a unique identifier for the row in the heap, to retrieve the full row.

Both lookups are similar in that they represent additional steps in query processing to fetch full rows, but they differ in how they retrieve the rows:

  • Key Lookup uses the clustered index key.
  • RID Lookup uses the row identifier in a heap.

In the previous example where we saw the key lookup in the query plan, if the table was a heap, then we’d see RID lookup instead.

In any case, understanding and minimizing such lookups can lead to significant performance improvements in our SQL queries.

Summary

Key lookups happen when a query uses a nonclustered index that doesn’t contain all columns required by the query. In this case, SQL Server has to look up the clustered index to get the extra data. This is less efficient than getting all data from the nonclustered index, and so key lookups should generally be avoided.

Adding the missing columns to our index can resolve this issue, and eliminate the requirement for key lookups on that query. Doing this to frequently run queries, or ones that involve large tables can significantly improve the performance of these queries (and our database operations in general).