Understanding the RID Lookup in SQL Server: How It Affects Performance

If you’ve spent any time working with heaps in SQL Server, you may have seen the RID lookup in the query plan. And if you haven’t, chances are it’s because you either didn’t look at the query plan, or the queries used full table scans.

In any case, it pays to understand what a RID lookup is and how it affects query performance. If we ever see RID lookup in the query plan, it could be a good opportunity to see if we can make changes that will improve performance.

What is a RID?

When a table is stored as a heap (i.e., a table that does not have a clustered index), individual rows are identified by their RID.

RID stands for Row Identifier. The RID is 8 bytes, and it consists of of the file number, data page number, and slot on the page (FileID:PageID:SlotID). It provides an efficient way for SQL Server to identify each row in a heap.

When we create a nonclustered index on a heap, SQL Server includes the RID in the index. This can be used if a query needs to do a RID lookup.

If we create a nonclustered index on a non-heap, then SQL Server uses the index key from the clustered index instead of a RID.

What is a RID Lookup?

A RID lookup is a process that occurs when the database engine needs to fetch the actual data row from the table after identifying it using a nonclustered index. RID lookups are specific to heaps.

When a query is executed that involves a nonclustered index, SQL Server uses that nonclustered index to quickly identify the row or rows that match the query criteria. However, if the index does not have all of the columns the query needs to produce its results, then the database engine needs to do either a key lookup or a RID lookup to retrieve the remaining data. Which one it chooses depends on whether there’s a clustered index:

If the table has a clustered index, then it does a key lookup (because the reference to the data is the clustered index’s key instead of the RID).
If the table does not have a clustered index (i.e., it’s a heap), then it does a RID lookup (because the reference to the data is the RID).

So, the RID lookup is basically the heap’s equivalent of a key lookup.

You’ll probably encounter more key lookups than RID lookups, given most tables have primary keys (which by default, have clustered indexes). That said, if you use heaps in your database, you may very well encounter the RID lookup in many queries that are executed against those heaps.

Implications of the RID Lookup

While RID lookups may be necessary for retrieving data in certain scenarios, it’s usually a good idea to avoid them. They often result in additional I/O operations, which can lead to performance degradation, especially in large tables or complex queries. The more RID lookups your query requires, the more time it will take to execute, potentially slowing down your application.

How Does a RID Lookup Work?

Let’s break down the RID lookup process step by step:

Query Execution: When you run a query that involves a nonclustered index, SQL Server first uses the index to identify the potential rows that match the query criteria.
Index Search: SQL Server scans the nonclustered index, which is usually much smaller and faster to search than the full table.
Retrieve RID: If the nonclustered index does not cover all the columns needed, for each matching row, SQL Server retrieves the RID from the nonclustered index.
Lookup Actual Data: Using the RID, SQL Server goes back to the heap table to fetch the actual data row.
Return Result: The data is then returned as the result of the query.

This process is efficient for a small number of rows, but if the query matches a large number of rows, the number of RID lookups can quickly add up, leading to significant performance overhead.

Example of a RID Lookup in SQL

Let’s go through an example to illustrate how a RID lookup works.

Step 1: Create Heap

First we’ll create a heap (a table without a clustered index). We’ll also populate it with a bunch of data.

-- Create the heap
CREATE TABLE Employees (
    EmployeeID INT,
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    DepartmentID INT,
    Salary INT
);

-- Insert 1,000 rows of sample data
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
    INSERT INTO Employees (EmployeeID, FirstName, LastName, DepartmentID, Salary)
    VALUES (@i, 'FirstName' + CAST(@i AS NVARCHAR(50)), 'LastName' + CAST(@i AS NVARCHAR(50)), @i % 10 + 1, 50000 + (@i % 10) * 1000);
    SET @i = @i + 1;
END;

Step 2: Create a Nonclustered Index

Now let’s create a nonclustered index on our heap:

CREATE NONCLUSTERED INDEX IX_LastName ON Employees(LastName);

Recall that a heap is a table that doesn’t have a clustered index. So this nonclustered index will be the only index on the table.

Step 3: Execute a Query that Causes a RID Lookup

Here’s an example of a query that will cause a RID lookup:

SELECT FirstName, LastName, Salary
FROM Employees
WHERE LastName = 'LastName70';

Explanation:

In the example above, SQL Server uses the nonclustered index IX_LastName to quickly find rows where the LastName is LastName70.
However, the index does not contain the FirstName and Salary columns (which we’ve included in our SELECT list). Therefore, SQL Server uses the RID to perform lookups in the Employees table (which is a heap) to retrieve those columns.

Step 4: View the Execution Plan

To see the RID lookup in action, we can use SHOWPLAN_TEXT to view the execution plan for the query:

SET SHOWPLAN_TEXT ON;
GO
SELECT FirstName, LastName, Salary
FROM Employees
WHERE LastName = 'LastName70';
GO
SET SHOWPLAN_TEXT OFF;

Output:

StmtText                                                                                                                                       
-----------------------------------------------------------------------------------------------------------------------------------------------
  |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000]))                                                                                    
       |--Index Seek(OBJECT:([test].[dbo].[Employees].[IX_LastName]), SEEK:([test].[dbo].[Employees].[LastName]=N'LastName70') ORDERED FORWARD)
       |--RID Lookup(OBJECT:([test].[dbo].[Employees]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

In the execution plan, we can see the RID lookup operation, which indicates that SQL Server had to fetch data directly from the heap after using the index.

We can alternatively use SHOWPLAN_ALL to see more information.

If you’re using a GUI, you can check this information by selecting something that reads Display Estimated Execution Plan or Estimated Query Plan to see how SQL Server estimates what the plan might look like when you run the query. You could alternatively use Include Actual Execution Plan or Enable Actual Plan and then run the query to run the query and see its actual plan.

Here’s what I get in Azure Data Studio when I do this:

We can see that the query used the index to look up the LastName, but needed to use a RID lookup to get the rest of the data.

How to Avoid RID Lookups

While RID lookups are sometimes unavoidable, there are strategies to reduce or eliminate them to improve query performance:

Use a Clustered Index: Creating a clustered index on a table converts it from a heap to a B-tree structure, which allows SQL Server to retrieve the necessary data without needing RID lookups.
Include Columns in Nonclustered Indexes: If you frequently query certain columns, consider creating a nonclustered index that includes those columns to avoid the need for additional lookups.
Avoid Unnecessary Indexes: Having too many indexes can lead to more RID lookups (and you might not even be aware of them). Ensure that your indexes are optimized for the queries you run most often.
Optimize Query Design: Rewrite queries to minimize the need for lookups by carefully selecting the columns in your SELECT statement.

Conclusion

The RID lookup is a concept that applies to tables without clustered indexes (heaps). They’re used when the heap has a clustered index that doesn’t cover all the columns needed for the query.

It’s generally best to avoid RID lookups wherever possible, either by implementing appropriate nonclustered indexes for the queries that are run, or by modifying the queries. Another option is to consider whether or not the table should really be a heap. Sometimes adding a clustered index could be a welcome addition, depending on the scenario.

Database.Guide