When you create a table in SQL Server, you have the option of using data compression.
Data compression helps reduce the size of the database. It can also help improve performance of I/O intensive workloads due to the data being stored in fewer pages, therefore reducing the number of pages that queries need to read from disk.
To do this, use the DATA_COMPRESSION
option when creating the table.
Example
Here’s an example to demonstrate.
CREATE TABLE Movies (
MovieId int IDENTITY(1,1) PRIMARY KEY NOT NULL,
MovieName nvarchar(200)
)
WITH (DATA_COMPRESSION = ROW);
In this case I use row compression.
The following uses page compression.
CREATE TABLE Movies (
MovieId int IDENTITY(1,1) PRIMARY KEY NOT NULL,
MovieName nvarchar(200)
)
WITH (DATA_COMPRESSION = PAGE);
How to Remove Compression
You can remove the compression by using the ALTER TABLE
statement to rebuild the table, while using NONE
as the compression type.
ALTER TABLE MOVIES
REBUILD WITH (DATA_COMPRESSION = NONE);
Columnstore Tables
If you’re using columnstore tables (tables stored with a clustered columnstore index), the above compression types don’t apply. In this case, your compression options are COLUMNSTORE
and COLUMNSTORE_ARCHIVE
.
Compression Results Can Vary
The amount of compression you get will depend on the data and the type of compression.
ROW
compression, for example, removes unneeded bytes from the column values by storing them in variable length format. PAGE
compression, on the other hand, stores the repeating values only once per page, and sets the pointer from the respective columns within the page.
Sometimes you might find that compressing an object doesn’t always decrease its size, and in some cases might actually increase its size.
This could happen if your columns use a data type that doesn’t benefit from compression.
Also, row compression reduces metadata overhead, but in some cases the overhead may be greater than the old storage format.
If your data receives no benefit from compression due to its data type, then it’s likely that the overhead will cause an increase in storage requirements, rather than a decrease.
But variations in compression size will also depend on the actual data. For example, if you have a char(10) column, compression will remove any trailing padding characters. If you have lots of rows with trailing padding characters, you should get a better result than if you have no (or few) rows with trailing padding characters.