SQL Server has a system stored procedure called sp_estimate_data_compression_savings, which allows you to check an object's size and its estimated size with various levels of compression.
If the object is already compressed, you can use this procedure to estimate its size when recompressed.
Objects can be compressed by using row, page, columnstore, or columnstore archive compression.
Compression can be evaluated for whole tables or parts of tables. This includes heaps, clustered indexes, nonclustered indexes, columnstore indexes, indexed views, and table and index partitions.
Example
Here’s an example to demonstrate.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemHoldings',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'ROW';
Result:
+-------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
| object_name       | schema_name | index_id | partition_number | size_with_current_compression_setting(KB) | size_with_requested_compression_setting(KB) | sample_size_with_current_compression_setting(KB) | sample_size_with_requested_compression_setting(KB) |
|-------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------|
| StockItemHoldings | Warehouse   | 1        | 1                | 32                                        | 8                                           | 40                                               | 16                                                 |
+-------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
To save you from having to do too much sideways scrolling, here it is again using vertical output:
-[ RECORD 1 ]-------------------------
object_name                                        | StockItemHoldings
schema_name                                        | Warehouse
index_id                                           | 1
partition_number                                   | 1
size_with_current_compression_setting(KB)          | 32
size_with_requested_compression_setting(KB)        | 8
sample_size_with_current_compression_setting(KB)   | 40
sample_size_with_requested_compression_setting(KB) | 16
The compression sizes are in kilobytes (KB).
In this case, there appears to be a significant benefit in using row compression on this table: it goes from 32 KB to an estimated 8 KB, assuming the estimate is accurate.
When I ran the previous code, I provided all argument names. You can also omit these names, and just provide the values.
Like this:
EXEC sp_estimate_data_compression_savings
'Warehouse',
'StockItemHoldings',
NULL,
NULL,
'ROW';
Either way, the result is the same.
Here it is again, but this time I specify PAGE instead of ROW as the compression type.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemHoldings',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'PAGE';
Result (using vertical output):
-[ RECORD 1 ]-------------------------
object_name                                        | StockItemHoldings
schema_name                                        | Warehouse
index_id                                           | 1
partition_number                                   | 1
size_with_current_compression_setting(KB)          | 32
size_with_requested_compression_setting(KB)        | 8
sample_size_with_current_compression_setting(KB)   | 40
sample_size_with_requested_compression_setting(KB) | 16
In this case, the numbers look the same, but you could get vastly different numbers, depending on your data.
Compression Types
The @data_compression argument accepts the following values:
NONE
ROW
PAGE
COLUMNSTORE
COLUMNSTORE_ARCHIVE
These are the compression options available when creating/altering a table or index.
The COLUMNSTORE and COLUMNSTORE_ARCHIVE options are only available on columnstore indexes (including both nonclustered columnstore and clustered columnstore indexes).
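For context, here's how those same options are applied when you actually create or rebuild a table. This is just a sketch using a hypothetical table name (dbo.MyCompressedTable is not part of the sample database):

```sql
-- Create a table with page compression (hypothetical table)
CREATE TABLE dbo.MyCompressedTable
(
    id      INT NOT NULL,
    payload VARCHAR(100)
)
WITH (DATA_COMPRESSION = PAGE);

-- Switch an existing table to row compression
ALTER TABLE dbo.MyCompressedTable
REBUILD WITH (DATA_COMPRESSION = ROW);
```

The same DATA_COMPRESSION option can also be specified in CREATE INDEX and ALTER INDEX ... REBUILD statements.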
The @index_id Argument
Sometimes your results might return multiple rows for a given object, each with a different index_id.
You can narrow the results down to a specific index if you prefer. To do this, provide the index_id to the @index_id argument.
For example, when I run the following code, eight rows are returned, each with different index_id values.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'ROW';
Result:
+-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
| object_name           | schema_name | index_id | partition_number | size_with_current_compression_setting(KB) | size_with_requested_compression_setting(KB) | sample_size_with_current_compression_setting(KB) | sample_size_with_requested_compression_setting(KB) |
|-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------|
| StockItemTransactions | Warehouse   | 2        | 1                | 5568                                      | 4120                                        | 4280                                             | 3168                                               |
| StockItemTransactions | Warehouse   | 3        | 1                | 5184                                      | 3720                                        | 4264                                             | 3064                                               |
| StockItemTransactions | Warehouse   | 4        | 1                | 5568                                      | 4224                                        | 4288                                             | 3256                                               |
| StockItemTransactions | Warehouse   | 5        | 1                | 5528                                      | 4416                                        | 4280                                             | 3424                                               |
| StockItemTransactions | Warehouse   | 6        | 1                | 5192                                      | 3456                                        | 4264                                             | 2840                                               |
| StockItemTransactions | Warehouse   | 7        | 1                | 5192                                      | 3464                                        | 4264                                             | 2848                                               |
| StockItemTransactions | Warehouse   | 9        | 1                | 5416                                      | 4456                                        | 4264                                             | 3512                                               |
| StockItemTransactions | Warehouse   | 1        | 1                | 2720                                      | 9096                                        | 2720                                             | 9096                                               |
+-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
If we wanted to narrow it down to just one row, we could use its index_id.
Like this:
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions',
@index_id = 1,
@partition_number = NULL,
@data_compression = 'ROW';
Result:
+-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
| object_name           | schema_name | index_id | partition_number | size_with_current_compression_setting(KB) | size_with_requested_compression_setting(KB) | sample_size_with_current_compression_setting(KB) | sample_size_with_requested_compression_setting(KB) |
|-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------|
| StockItemTransactions | Warehouse   | 1        | 1                | 2720                                      | 9096                                        | 2720                                             | 9096                                               |
+-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
You can also use @partition_number to do the same thing with partitions.
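For example, on a partitioned table you could estimate the savings for a single partition of a single index like this. This is illustrative only: the sample tables used above each have a single partition, and the partition number 2 here is a hypothetical value.

```sql
-- Estimate row compression savings for one partition of one index
-- (partition 2 is a hypothetical value for a partitioned table)
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions',
@index_id = 1,
@partition_number = 2,
@data_compression = 'ROW';
```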
The Amount of Compression Can Vary Significantly
The amount of compression you get will depend on the data and the type of compression.
ROW compression, for example, removes unneeded bytes from column values by storing them in a variable-length format. PAGE compression, on the other hand, stores repeating values only once per page, and sets pointers from the respective columns within the page.
You might find that compressing an object doesn't decrease its size, and in some cases it might actually increase it.
This could happen if your columns use a data type that doesn’t benefit from compression.
Also, while row compression generally reduces metadata overhead, in some cases the metadata overhead of the compressed format can be greater than that of the old storage format.
If your data receives no benefit from compression due to its data type, then it’s likely that the overhead will cause an increase in storage requirements, rather than a decrease.
But variations in compression size will also depend on the actual data. For example, if you have a char(10) column, compression will remove any trailing padding characters. If you have lots of rows with trailing padding characters, you should get a better result than if you have no (or few) rows with trailing padding characters.
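As a quick sketch of that effect, you could build a throwaway table with a char(10) column whose values are mostly trailing padding, and run the estimate against it. The table and column names here are hypothetical, and the exact numbers you get will vary:

```sql
-- Hypothetical demo table: char(10) values that are mostly trailing spaces
CREATE TABLE dbo.PaddingDemo (code CHAR(10) NOT NULL);

-- Each value is stored as 'AB' plus eight trailing spaces
INSERT INTO dbo.PaddingDemo (code)
SELECT TOP (100000) 'AB'
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

-- Row compression stores the values in variable-length form,
-- so the padding should compress away
EXEC sp_estimate_data_compression_savings
@schema_name = 'dbo',
@object_name = 'PaddingDemo',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'ROW';
```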
How Does it Estimate the Compression?
When you execute sp_estimate_data_compression_savings, it takes a sample of the data and loads it into an equivalent table and index created in tempdb. The table or index created in tempdb is then compressed to the requested setting, and the estimated compression savings are computed.
How Accurate is it?
You may get mixed results when using sp_estimate_data_compression_savings.
Let’s run a little test.
SELECT * INTO Warehouse.StockItemTransactions2
FROM Warehouse.StockItemTransactions;
EXEC sp_spaceused 'Warehouse.StockItemTransactions2';
Result (using vertical output):
name       | StockItemTransactions2
rows       | 236667
reserved   | 15944 KB
data       | 15800 KB
index_size | 8 KB
unused     | 136 KB
The sp_spaceused stored procedure shows us the actual disk space used. In this case, the data uses 15,800 KB of disk space.
Now, I'll execute sp_estimate_data_compression_savings to see what space savings I'll get if I apply compression to that table.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions2',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'ROW';
Result (using vertical output):
object_name                                        | StockItemTransactions2
schema_name                                        | Warehouse
index_id                                           | 0
partition_number                                   | 1
size_with_current_compression_setting(KB)          | 15808
size_with_requested_compression_setting(KB)        | 9096
sample_size_with_current_compression_setting(KB)   | 15800
sample_size_with_requested_compression_setting(KB) | 9096
According to these results, applying row compression to this table will reduce its size from 15,808 KB to an estimated size of just 9,096 KB. Not too bad.
Let's now apply row compression to this table, and then run sp_spaceused again.
ALTER TABLE Warehouse.StockItemTransactions2
REBUILD WITH (DATA_COMPRESSION = ROW);
EXEC sp_spaceused 'Warehouse.StockItemTransactions2';
Result (using vertical output):
name       | StockItemTransactions2
rows       | 236667
reserved   | 9160 KB
data       | 9088 KB
index_size | 8 KB
So the actual result is very close to the estimated result.
In this case, sp_estimate_data_compression_savings provided a pretty accurate estimate of the end result.
Let's run sp_estimate_data_compression_savings one more time, but using a compression type of NONE.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions2',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'NONE';
Result:
object_name                                        | StockItemTransactions2
schema_name                                        | Warehouse
index_id                                           | 0
partition_number                                   | 1
size_with_current_compression_setting(KB)          | 9096
size_with_requested_compression_setting(KB)        | 15808
sample_size_with_current_compression_setting(KB)   | 9096
sample_size_with_requested_compression_setting(KB) | 15808
This tells us what would happen if we reverted back to using no compression.
In this case it's showing us exactly the same number (15,808 KB) that it showed us prior to applying compression which, as you'll recall, was pretty close to the actual size (15,800 KB) returned by the sp_spaceused procedure.
But will removing the compression actually return the table to that size? Let's remove it and find out.
ALTER TABLE Warehouse.StockItemTransactions2
REBUILD WITH (DATA_COMPRESSION = NONE);
EXEC sp_spaceused 'Warehouse.StockItemTransactions2';
Result (using vertical output):
name       | StockItemTransactions2
rows       | 236667
reserved   | 15880 KB
data       | 15800 KB
index_size | 8 KB
unused     | 72 KB
So again, sp_estimate_data_compression_savings was almost spot on.
However, this is just one simple test. Other tests could return estimates that are way off. I've read stories of sp_estimate_data_compression_savings returning wildly inaccurate results, but I've yet to experience that myself.
Therefore, it would seem that sp_estimate_data_compression_savings can provide an accurate estimate in some cases, and not so much in others.
You’ll need to decide how much reliance you want to put into this stored procedure. In any case, you should probably run a test in your development or test environment before applying compression in production.