Indexes play an important role in SQL database performance. We can use them to speed up commonly run queries, so that users don’t have to sit and wait for their results to come in.
In SQL Server, indexes can be created in several different situations. For example, when we create a primary key or a UNIQUE
constraint, an index is created behind the scenes for us. However, we can also create indexes explicitly using the CREATE INDEX
statement.
Understanding Indexes in SQL Server
Before we dive into creating an index, let’s briefly look at what indexes are and why they’re important.
An index in SQL is a data structure that improves the speed of data retrieval operations on database tables. It works similarly to an index in a book, allowing the database engine to find data quickly without scanning the entire table.
When we have frequently run queries that seem to take ages to return any results, it’s quite possible that the query could benefit from an index. The query can then look up the index instead of scanning the whole table (or tables).
When we create an index, we specify the column(s) to which it applies. Therefore, a given index may or may not help a given query – it all depends on the columns used in the query vs the columns defined in the index. That’s why we need to create indexes based on specific queries.
SQL Server has quite a few index types, but two of the most commonly used indexes are clustered and nonclustered indexes.
Nonclustered Indexes
Nonclustered indexes are separate structures from the data rows themselves, containing a copy of the indexed columns and a pointer to the actual data row.
In many cases when you create an index, it’s most likely to be a nonclustered index. I say that because a table can have many nonclustered indexes but only one clustered index (which is often on the primary key column, due to SQL Server creating a clustered index on that column by default when we define the primary key).
Creating a Nonclustered Index
Here’s an example of creating a nonclustered index:
CREATE NONCLUSTERED INDEX IX_LastName
ON Employees (LastName);
This creates a nonclustered index on the LastName
column of the Employees
table. Nonclustered indexes are useful when you frequently search or sort by a column that isn’t the primary key.
Nonclustered indexes are the default index type, so we can omit the NONCLUSTERED
keyword from the above example to achieve the same outcome:
CREATE INDEX IX_LastName
ON Employees (LastName);
Indexes can also be created with the CREATE TABLE
statement when we create the table. Therefore, we can create nonclustered indexes without having to use a separate CREATE INDEX
statement.
There are a couple of ways we can do it. Here are some examples:
--Create the index in the column definition
CREATE TABLE Employees
(
EmployeeId INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50) INDEX IX_LastName NONCLUSTERED,
Email VARCHAR(100)
);
--Create the index after all column definitions
CREATE TABLE Employees
(
EmployeeId INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
Email VARCHAR(100),
INDEX IX_LastName NONCLUSTERED (LastName)
);
--Create a composite index
CREATE TABLE Employees
(
EmployeeId INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
Email VARCHAR(100),
INDEX IX_FullName NONCLUSTERED (FirstName, LastName)
);
As seen in this example, we can create an index in the actual column definition itself, or we can place it at the end, after all column definitions (so that any indexes are defined separately to the columns themselves). The result is the same whichever method you choose. You may choose one over the other for code readability, for example defining multiple indexes might be easier to understand if they’re all grouped together at the end, rather than scattered throughout the table.
The last example here created a composite index. That is, an index that consists of more than one column. When we define a composite index, we include the names of each column in the index. In this case, I created an index that consisted of the FirstName
and LastName
columns.
As a side note, the above table doesn’t have a clustered index, which makes the table a heap. If we’d defined a primary key, then it would’ve received a clustered index by default, and therefore it wouldn’t be a heap. But we didn’t define a primary key or a clustered index. We’ll look at how to create a clustered index next.
Clustered Indexes
A clustered index determines the physical order of data in a table. Each table can have only one clustered index.
By default, when we create a primary key, a clustered index is created for the columns that make up the primary key. But we can override this by explicitly defining a nonclustered index for the primary key, or by having an existing clustered index on the table before we create the primary key (as mentioned, a table can only have one clustered index, and so if it’s not on the primary key, then we can create one elsewhere if need be).
Creating a Clustered Index
Here’s an example of creating a clustered index:
CREATE CLUSTERED INDEX IX_EmployeeId
ON Employees (EmployeeId);
This creates a clustered index on the EmployeeId
column of the Employees
table. The data rows will be physically stored in the order of EmployeeId
.
Another way to do it is to make that column the primary key (as mentioned, when we create a primary key, a clustered index is created by default).
We normally create the primary key at the time of creating the table. So we could do something like this:
CREATE TABLE Employees
(
EmployeeId INT PRIMARY KEY,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
Email VARCHAR(100),
);
Simply including PRIMARY KEY
in the column definition defines it as a primary key.
We can alternatively add the primary key after the column definitions:
CREATE TABLE Employees (
EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(100),
CONSTRAINT PK_Employees PRIMARY KEY (EmployeeID)
);
Yet another way to do it is to add the primary key in a completely separate statement:
-- Create the Employees table without a primary key
CREATE TABLE Employees (
EmployeeID INT NOT NULL,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(100)
);
-- Add the primary key after the table creation
ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (EmployeeID);
Note that we need to make the column NOT NULL
before we can add a primary key to it. If we don’t do this, we’ll get an error.
It’s also possible to add an index to a table in this fashion (i.e. using the ALTER TABLE
statement), but only for memory optimized tables.
UNIQUE
Indexes
A UNIQUE
index ensures that the index key contains no duplicate values, thus enforcing uniqueness of the columns included in the index.
Creating a UNIQUE
Index
Here’s an example:
CREATE UNIQUE NONCLUSTERED INDEX IX_Email
ON Employees (Email);
This creates a unique nonclustered index on the Email
column, ensuring no two users can have the same email address.
We can also use the CONSTRAINT
keyword to create a UNIQUE
constraint at the time of creating the table:
CREATE TABLE Employees
(
EmployeeId INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
Email VARCHAR(100),
CONSTRAINT UQ_Email UNIQUE(Email)
);
Filtered Index
A filtered index is a nonclustered index with a WHERE
clause, allowing you to index a portion of rows in a table.
Creating a Filtered Index
Here’s how to create a filtered index:
CREATE NONCLUSTERED INDEX IX_ActiveEmployees
ON Employees (LastName, FirstName)
WHERE Status = 'Active';
This creates an index only for active employees, which can be more efficient if you frequently query for active employees, especially if the table contains a lot of inactive employees.
The FILLFACTOR
Option
FILLFACTOR
is an index option that specifies how full SQL Server should make each index page when it’s created or rebuilt.
Using FILLFACTOR
Here’s an example:
CREATE NONCLUSTERED INDEX IX_ProductName
ON Products (ProductName)
WITH (FILLFACTOR = 80);
This creates an index with 80% full pages, leaving 20% free space for future insertions. This can be helpful on tables that incur a lot of changes to the data. This is because the data can be stored in the same page as the rest of the data. If we had a fillfactor of 100, then this could lead to fragmentation of the index, due to new data having to be stored in separate pages.
The default fillfactor is 0
, which is actually equal to 100
. This means that by default, each page has no free space available for future inserts (which means that new data will need to go elsewhere – in another page).
This option should be used in consideration of how much the data changes in the table. For tables that don’t change much, then the default fillfactor (or a fillfactor of 100
) should be fine. This is the case even if you will still be adding new rows, as long as the existing rows aren’t changed a lot. That’s because any new rows can be added to subsequent pages, and they don’t affect the existing rows/pages, thereby not fragmenting the index due to data being shifted around.
DROP_EXISTING
Option
The DROP_EXISTING
option allows you to recreate an existing index without dropping and recreating it manually.
Using DROP_EXISTING
Here’s how to use DROP_EXISTING
:
CREATE NONCLUSTERED INDEX IX_FullName
ON Employees (LastName, FirstName)
WITH (DROP_EXISTING = ON);
This recreates the IX_FullName
index, potentially changing its definition. For example, perhaps this index previously had an explicit fillfactor defined. Using DROP_EXISTING
will cause it to use the server-wide default fillfactor.