SQL Server has a system stored procedure called sp_estimate_data_compression_savings, which allows you to check an object's size and its estimated size with various levels of compression.
If the object is already compressed, you can use this procedure to estimate its size when recompressed.
Objects can be compressed by using row, page, columnstore, or columnstore archive compression.
Compression can be evaluated for whole tables or parts of tables. This includes heaps, clustered indexes, nonclustered indexes, columnstore indexes, indexed views, and table and index partitions.
Example
Here’s an example to demonstrate.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemHoldings',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'ROW';
Result:
+-------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
| object_name       | schema_name | index_id | partition_number | size_with_current_compression_setting(KB) | size_with_requested_compression_setting(KB) | sample_size_with_current_compression_setting(KB) | sample_size_with_requested_compression_setting(KB) |
|-------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------|
| StockItemHoldings | Warehouse   | 1        | 1                | 32                                        | 8                                           | 40                                               | 16                                                 |
+-------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
To save you from having to do too much sideways scrolling, here it is again using vertical output:
-[ RECORD 1 ]-------------------------
object_name                                        | StockItemHoldings
schema_name                                        | Warehouse
index_id                                           | 1
partition_number                                   | 1
size_with_current_compression_setting(KB)          | 32
size_with_requested_compression_setting(KB)        | 8
sample_size_with_current_compression_setting(KB)   | 40
sample_size_with_requested_compression_setting(KB) | 16
The compression sizes are in kilobytes (KB).
In this case, there appears to be a significant benefit in using row compression on this table: it goes from 32 KB to an estimated 8 KB, assuming the estimate is accurate.
When I ran the previous code, I provided all argument names. You can also omit these names, and just provide the values.
Like this:
EXEC sp_estimate_data_compression_savings
'Warehouse',
'StockItemHoldings',
NULL,
NULL,
'ROW';
Either way, the result is the same.
Here it is again, but this time I specify PAGE instead of ROW as the compression type.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemHoldings',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'PAGE';
Result (using vertical output):
-[ RECORD 1 ]-------------------------
object_name                                        | StockItemHoldings
schema_name                                        | Warehouse
index_id                                           | 1
partition_number                                   | 1
size_with_current_compression_setting(KB)          | 32
size_with_requested_compression_setting(KB)        | 8
sample_size_with_current_compression_setting(KB)   | 40
sample_size_with_requested_compression_setting(KB) | 16
In this case, the numbers look the same, but you could get vastly different numbers, depending on your data.
Compression Types
The @data_compression argument accepts the following values:
NONE
ROW
PAGE
COLUMNSTORE
COLUMNSTORE_ARCHIVE
These are the compression options available when creating/altering a table or index.
The COLUMNSTORE and COLUMNSTORE_ARCHIVE options are only available on columnstore indexes (including both nonclustered columnstore and clustered columnstore indexes).
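For context, here's how those same options are applied when you actually create or rebuild a table. This is just a sketch using a hypothetical table name (dbo.MyCompressedTable is not part of the sample database):

```sql
-- Create a table with page compression (hypothetical table)
CREATE TABLE dbo.MyCompressedTable
(
    id      INT NOT NULL,
    payload VARCHAR(100)
)
WITH (DATA_COMPRESSION = PAGE);

-- Switch an existing table to row compression
ALTER TABLE dbo.MyCompressedTable
REBUILD WITH (DATA_COMPRESSION = ROW);
```

The same DATA_COMPRESSION option can also be specified in CREATE INDEX and ALTER INDEX ... REBUILD statements.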
The @index_id Argument
Sometimes your results might return multiple rows for a given object, each with a different index_id.
You can narrow the results down to a specific index if you prefer. To do this, provide the index_id to the @index_id argument.
For example, when I run the following code, eight rows are returned, each with different index_id values.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'ROW';
Result:
+-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
| object_name           | schema_name | index_id | partition_number | size_with_current_compression_setting(KB) | size_with_requested_compression_setting(KB) | sample_size_with_current_compression_setting(KB) | sample_size_with_requested_compression_setting(KB) |
|-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------|
| StockItemTransactions | Warehouse   | 2        | 1                | 5568                                      | 4120                                        | 4280                                             | 3168                                               |
| StockItemTransactions | Warehouse   | 3        | 1                | 5184                                      | 3720                                        | 4264                                             | 3064                                               |
| StockItemTransactions | Warehouse   | 4        | 1                | 5568                                      | 4224                                        | 4288                                             | 3256                                               |
| StockItemTransactions | Warehouse   | 5        | 1                | 5528                                      | 4416                                        | 4280                                             | 3424                                               |
| StockItemTransactions | Warehouse   | 6        | 1                | 5192                                      | 3456                                        | 4264                                             | 2840                                               |
| StockItemTransactions | Warehouse   | 7        | 1                | 5192                                      | 3464                                        | 4264                                             | 2848                                               |
| StockItemTransactions | Warehouse   | 9        | 1                | 5416                                      | 4456                                        | 4264                                             | 3512                                               |
| StockItemTransactions | Warehouse   | 1        | 1                | 2720                                      | 9096                                        | 2720                                             | 9096                                               |
+-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
If we wanted to narrow it down to just one row, we could use its index_id.
Like this:
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions',
@index_id = 1,
@partition_number = NULL,
@data_compression = 'ROW';
Result:
+-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
| object_name           | schema_name | index_id | partition_number | size_with_current_compression_setting(KB) | size_with_requested_compression_setting(KB) | sample_size_with_current_compression_setting(KB) | sample_size_with_requested_compression_setting(KB) |
|-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------|
| StockItemTransactions | Warehouse   | 1        | 1                | 2720                                      | 9096                                        | 2720                                             | 9096                                               |
+-----------------------+-------------+----------+------------------+-------------------------------------------+---------------------------------------------+--------------------------------------------------+----------------------------------------------------+
You can also use @partition_number to do the same thing with partitions.
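For example, on a partitioned table you could estimate the savings for a single partition of a single index like this. This is illustrative only: the sample tables used above each have a single partition, and the partition number 2 here is a hypothetical value.

```sql
-- Estimate row compression savings for one partition of one index
-- (partition 2 is a hypothetical value for a partitioned table)
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions',
@index_id = 1,
@partition_number = 2,
@data_compression = 'ROW';
```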
The Amount of Compression Can Vary Significantly
The amount of compression you get will depend on the data and the type of compression.
ROW compression, for example, removes unneeded bytes from column values by storing them in a variable-length format. PAGE compression, on the other hand, stores repeating values only once per page, and sets pointers from the respective columns within the page.
You might find that compressing an object doesn't decrease its size, and in some cases it might actually increase it.
This could happen if your columns use a data type that doesn’t benefit from compression.
Also, while row compression generally reduces metadata overhead, in some cases the metadata overhead of the compressed format can be greater than that of the old storage format.
If your data receives no benefit from compression due to its data type, then it’s likely that the overhead will cause an increase in storage requirements, rather than a decrease.
But variations in compression size will also depend on the actual data. For example, if you have a char(10) column, compression will remove any trailing padding characters. If you have lots of rows with trailing padding characters, you should get a better result than if you have no (or few) rows with trailing padding characters.
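As a quick sketch of that effect, you could build a throwaway table with a char(10) column whose values are mostly trailing padding, and run the estimate against it. The table and column names here are hypothetical, and the exact numbers you get will vary:

```sql
-- Hypothetical demo table: char(10) values that are mostly trailing spaces
CREATE TABLE dbo.PaddingDemo (code CHAR(10) NOT NULL);

-- Each value is stored as 'AB' plus eight trailing spaces
INSERT INTO dbo.PaddingDemo (code)
SELECT TOP (100000) 'AB'
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

-- Row compression stores the values in variable-length form,
-- so the padding should compress away
EXEC sp_estimate_data_compression_savings
@schema_name = 'dbo',
@object_name = 'PaddingDemo',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'ROW';
```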
How Does it Estimate the Compression?
When you execute sp_estimate_data_compression_savings, it takes a sample of the data and loads it into an equivalent table and index created in tempdb. The table or index created in tempdb is then compressed to the requested setting, and the estimated compression savings are computed.
How Accurate is it?
You may get mixed results when using sp_estimate_data_compression_savings.
Let’s run a little test.
SELECT * INTO Warehouse.StockItemTransactions2
FROM Warehouse.StockItemTransactions;
EXEC sp_spaceused 'Warehouse.StockItemTransactions2';
Result (using vertical output):
name       | StockItemTransactions2
rows       | 236667
reserved   | 15944 KB
data       | 15800 KB
index_size | 8 KB
unused     | 136 KB
The sp_spaceused stored procedure shows us the actual disk space used. In this case, the data uses 15,800 KB of disk space.
Now, I'll execute sp_estimate_data_compression_savings to see what space savings I'll get if I apply compression to that table.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions2',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'ROW';
Result (using vertical output):
object_name                                        | StockItemTransactions2
schema_name                                        | Warehouse
index_id                                           | 0
partition_number                                   | 1
size_with_current_compression_setting(KB)          | 15808
size_with_requested_compression_setting(KB)        | 9096
sample_size_with_current_compression_setting(KB)   | 15800
sample_size_with_requested_compression_setting(KB) | 9096
According to these results, applying row compression to this table will reduce its size from 15,808 KB to an estimated size of just 9,096 KB. Not too bad.
Let's now apply row compression to this table, and then run sp_spaceused again.
ALTER TABLE Warehouse.StockItemTransactions2
REBUILD WITH (DATA_COMPRESSION = ROW);
EXEC sp_spaceused 'Warehouse.StockItemTransactions2';
Result (using vertical output):
name       | StockItemTransactions2
rows       | 236667
reserved   | 9160 KB
data       | 9088 KB
index_size | 8 KB
So the actual result is very close to the estimated result.
In this case, sp_estimate_data_compression_savings provided a pretty accurate estimate of the end result.
Let's run sp_estimate_data_compression_savings one more time, but using a compression type of NONE.
EXEC sp_estimate_data_compression_savings
@schema_name = 'Warehouse',
@object_name = 'StockItemTransactions2',
@index_id = NULL,
@partition_number = NULL,
@data_compression = 'NONE';
Result:
object_name                                        | StockItemTransactions2
schema_name                                        | Warehouse
index_id                                           | 0
partition_number                                   | 1
size_with_current_compression_setting(KB)          | 9096
size_with_requested_compression_setting(KB)        | 15808
sample_size_with_current_compression_setting(KB)   | 9096
sample_size_with_requested_compression_setting(KB) | 15808
This tells us what would happen if we reverted back to using no compression.
In this case it's showing us exactly the same number (15,808 KB) that it showed us prior to applying compression which, as you'll recall, was pretty close to the actual size (15,800 KB) returned by the sp_spaceused procedure.
But will removing the compression actually return the table to that size? Let's remove it and find out.
ALTER TABLE Warehouse.StockItemTransactions2
REBUILD WITH (DATA_COMPRESSION = NONE);
EXEC sp_spaceused 'Warehouse.StockItemTransactions2';
Result (using vertical output):
name       | StockItemTransactions2
rows       | 236667
reserved   | 15880 KB
data       | 15800 KB
index_size | 8 KB
unused     | 72 KB
So again, sp_estimate_data_compression_savings was almost spot on.
However, this is just one simple test. Other tests could return estimates that are way off. I've read stories of sp_estimate_data_compression_savings returning wildly inaccurate results, but I've yet to experience that myself.
Therefore, it would seem that sp_estimate_data_compression_savings can provide an accurate estimate in some cases, and not so much in others.
You’ll need to decide how much reliance you want to put into this stored procedure. In any case, you should probably run a test in your development or test environment before applying compression in production.