What is a Heap in SQL Server?

In SQL Server, a heap is a table without a clustered index. Unlike tables with clustered indexes, which sort data in a specific order, heaps store data in no particular order. That’s because the clustered index is what determines how the table is stored and sorted (it’s sorted on the clustered index’s key column).

If there’s no clustered index, then data is initially stored in the order in which the rows are inserted, although the database engine may change this in order to store the rows more efficiently.

Key Characteristics of Heaps

  • No Clustered Index: The primary characteristic of a heap is the absence of a clustered index.
  • Unordered Data: Data pages are not sorted in any specific order.
  • Page Allocation: New pages are allocated as needed, without any predetermined order.
  • Faster Inserts: In some cases, heaps can offer faster insert operations compared to clustered index tables (due to the data being inserted without any strict order being imposed). This is particularly true for large, unordered inserts.
  • Potentially Slower Queries: Without an index, searching for specific data in a heap can be slower. This is especially true when sorting or grouping the results.

Creating a Heap

We create a heap just in the same way that we create any other table, except that we don’t specify a primary key or clustered index.

Let’s create two tables; one as a heap, and one with a primary key (i.e. not a heap):

CREATE TABLE HeapExample (
    ID INT,
    Name VARCHAR(50),
    Description VARCHAR(100)
);

CREATE TABLE NonHeapExample (
    ID INT PRIMARY KEY,
    Name VARCHAR(50),
    Description VARCHAR(100)
);

The first CREATE TABLE statement creates a heap and the second one creates a table with a primary key, which means that it’s not a heap.

By default, a clustered index is created whenever we create a primary key in SQL Server. So our second table has a clustered index, which will sort the tables by that column by default.

Inserting Data into a Heap

We insert data into a heap just like we’d insert data into any other table.

Let’s insert the same data into both tables:

INSERT INTO HeapExample (ID, Name, Description)
VALUES (2, 'Item B', 'Description for Item B'),
       (1, 'Item A', 'Description for Item A'),
       (3, 'Item C', 'Description for Item C');


INSERT INTO NonHeapExample (ID, Name, Description)
VALUES (2, 'Item B', 'Description for Item B'),
       (1, 'Item A', 'Description for Item A'),
       (3, 'Item C', 'Description for Item C');

Notice the order in which we inserted the rows. In particular, we inserted row number 2 before 1 on both tables.

Now let’s select all rows from each table:

SELECT * FROM HeapExample;
SELECT * FROM NonHeapExample;

Result:

ID  Name    Description           
-- ------ ----------------------
2 Item B Description for Item B
1 Item A Description for Item A
3 Item C Description for Item C


ID Name Description
-- ------ ----------------------
1 Item A Description for Item A
2 Item B Description for Item B
3 Item C Description for Item C

We can see that the heap is sorted in the same order that we inserted the rows.

The table with the primary key is sorted by the primary key column. That’s because it has a clustered index. Clustered indexes sort (and store) the rows based on their key values.

Although the heap is returning its rows in the same order that we inserted them, this isn’t necessarily always the case. SQL Server can move things around in order to make things more efficient, and so it’s possible that the heap would return rows in a different order.

A clustered index/primary key on the other hand will always maintain its ordering.

Querying a Heap

We can notice significant differences when querying a heap versus a regular table with a primary key.

Take the following query for example:

SELECT * 
FROM HeapExample 
WHERE ID = 2;

Here’s the query plan for that query:

It had to do a table scan to find the specified row. This probably wouldn’t be good for performance if it was a large table.

And now let’s run the same query against the non-heap (i.e. the table with the primary key/clustered index):

SELECT * 
FROM NonHeapExample 
WHERE ID = 2;

Resulting query plan:

This time the query performed a seek on the clustered index. This indicates that the query would perform much better, especially on a large table, because it won’t need to scan every single row in the table in order to find the specified row. Instead, it simply needs to look through the clustered index until it finds 2, and then it can jump to the applicable row.

Heaps with Indexes

It’s possible to create nonclustered indexes on heaps. It can be beneficial to create nonclustered indexes on heaps, especially if it’s a large heap.

But if we create a clustered index on a heap, then by definition, it’s no longer a heap.

Also, when we create (or drop) a clustered index, the whole table is rewritten. And if the table also has nonclustered indexes, all the nonclustered indexes must all be recreated whenever the clustered index is changed

Therefore, switching between heap and non-heap can take a lot of time and require a lot of disk space so that the data can be reordered in tempdb.

Potential Drawbacks of Heaps

While heaps can have their uses, they do have some potentially significant drawbacks:

  • Slower Reads: Without an index, finding specific data often requires a full table scan.
  • Fragmentation: Heaps can become fragmented over time, affecting performance.
  • Limited Data Organization: We lose the ability to physically order data based on a key.

When to Use Heaps

While clustered indexes are usually preferred in most cases, heaps can be beneficial in certain scenarios:

  • Bulk Insert Operations: When we’re doing large unordered insert operations, heaps can be faster.
  • Staging Tables: For temporary data storage before processing or moving to other tables.
  • Small Tables: For very small tables where the overhead of an index might not be justified.
  • Nonclustered Indexes: When data will always be accessed through nonclustered indexes (i.e. not through a clustered index).

When Not to Use Heaps

While heaps do have their place, in most cases they’re not the most efficient way to store our data. Here are some specific scenarios of when not to use a heap:

  • When the data is frequently returned in a sorted order.
  • When the data is frequently grouped together.
  • When ranges of data are frequently queried from the table.
  • When the table is large and there are no nonclustered indexes.
  • If the data is frequently updated.

Conclusion

Heaps in SQL Server offer a simple table structure without the overhead of maintaining a clustered index. While they can be beneficial in specific scenarios like bulk inserts or staging tables, we should carefully consider our data access patterns and performance requirements when deciding between heaps and indexed tables.