You may have seen the term “RID lookup” when reading about heap tables, or perhaps when viewing the query plan for a query against a heap in SQL Server.
But what exactly is a RID lookup? And what exactly is a RID?
Let’s find out.
What is a RID?
RID stands for Row Identifier in SQL Server. It is a unique identifier used by the database engine to access rows in a heap table (a table without a clustered index).
Whenever we create a heap, SQL Server automatically creates the RIDs for each row. SQL Server also creates RIDs whenever we drop a clustered index from a table (effectively converting it to a heap). This is because RIDs only apply to heaps. If a table has a clustered index, it doesn’t have a RID. Instead, it has an index key, which determines how the table is ordered.
The RID is 8 bytes, and it consists of of the file number, data page number, and slot on the page (FileID:PageID:SlotID
). It provides an efficient way for SQL Server to identify each row in a heap.
When Do RIDs Apply?
RIDs specifically apply to heap tables—tables that do not have a clustered index. In a heap, rows are stored in no particular order, and the RID is used internally by SQL Server to locate a row. Unlike tables with a clustered index, where rows are organized by the clustered index key, heap tables use the RID as the only means to uniquely identify and retrieve rows.
What are RID Lookups?
A RID lookup occurs when SQL Server uses the RID to retrieve data from a heap table after finding a matching entry in a nonclustered index. When a nonclustered index is used in a query, it might not include all the columns needed by the query. In this case, SQL Server retrieves the RID from the nonclustered index, then performs an additional lookup in the heap to get the necessary columns.
So to be clear, not all queries that use the index will result in a RID lookup. It will only be those queries where the index doesn’t cover all the columns required by the query. This is when SQL Server needs to look up the table for the extra data. If the index covers the query (i.e. if it’s a covering index), then it won’t need to get anything from the table (because it’s all in the index), and therefore it won’t need to do a RID lookup.
Difference Between RIDs and Clustered Index Keys
- RID: A RID is a pointer used internally by SQL Server to locate a row in a heap.
- Clustered Index Key: In a table with a clustered index, the clustered index key uniquely identifies rows. The rows are stored in order based on the clustered index key, making data retrieval faster for queries that use this key.
RID Lookups vs Key Lookups
- RID Lookup: Occurs when a nonclustered index is used to retrieve rows from a heap, and the query requires columns not in the index, SQL Server first uses the nonclustered index to find the RID, then performs a lookup in the heap to fetch the required data.
- Key Lookup: Occurs in tables with a clustered index. When a nonclustered index is used, and the query requires columns not in the index, SQL Server uses the clustered index key to perform a lookup in the clustered index.
Why RID Lookups Are Not Usually Desirable
RID lookups are generally not desirable for several reasons:
- Performance Overhead: Each RID lookup requires SQL Server to perform an additional I/O operation to retrieve the full row, leading to slower query performance, especially in large tables with frequent queries.
- Fragmentation: Heaps can become fragmented over time, making RID lookups even more expensive as SQL Server may need to traverse multiple pages to find the required data.
- Index Maintenance: Managing nonclustered indexes on heaps can become more complex and less efficient, as they must reference the RID rather than a more straightforward clustered index key.
This is why SQL Server’s query optimizer generally prefers to avoid RID lookups and may choose to use other strategies like index scans or seeks when possible.
However, as always, there can be exceptions to the rule. And there could be cases where a RID lookup may make sense, for example when the RID is smaller than the clustered index key.
What Happens When a Clustered Index is Created on a Heap with Nonclustered Indexes?
When we create a clustered index on a heap table that already has nonclustered indexes:
- Table Rebuild: The heap is converted into a clustered index. The data rows are ordered based on the clustered index key.
- Nonclustered Index Update: All nonclustered indexes that previously used the RID to locate rows are rebuilt. Instead of pointing to a RID, these indexes will now point to the clustered index key. This is because, in a clustered index, the key is used to locate rows, not the RID.
What Happens When a Clustered Index is Dropped?
When we drop a clustered index from a table:
- Table Rebuild: The table is converted from a clustered index back to a heap. The rows are no longer ordered by the clustered index key.
- Nonclustered Index Update: Nonclustered indexes are rebuilt to store RIDs instead of the clustered index key. This allows the nonclustered indexes to locate rows in the heap.
This rebuilding of indexes when creating or dropping a clustered index is a critical consideration, especially in large tables, as it can be resource-intensive and time-consuming.