What is a Surrogate Key in SQL?

In SQL and relational database design in general, an important concept is the identification of unique rows in a table. A surrogate key is one method for achieving this.

Understanding what surrogate keys are, and how they differ from natural keys, is crucial for database administrators, developers, and anyone involved in data modeling or management.

In this article, we’ll explore what a surrogate key is, how it differs from a natural key, and how to implement one in SQL.

What is a Key in SQL?

In order to understand what a surrogate key is, we should first understand what we mean by “key”.

In relational database design, a key is a column or combination of columns used to identify records, either uniquely within a table or by reference to another table. Some of the most commonly discussed keys include:

  • Primary Key: A column, or set of columns, that has been defined as the unique identifier for each row in a table.
  • Foreign Key: A column, or set of columns, that references the primary key (or another unique key) of another table, establishing a relationship between the two tables.
  • Composite Key: A key that consists of more than one column.

Among these, the primary key is fundamental, as it ensures that each record in a table is unique and can be referenced easily.
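
The three kinds of key listed above can be seen working together in a small sketch. It uses Python's built-in sqlite3 module and an in-memory SQLite database purely so that it can be run as is; the Departments, Employees, and ProjectAssignments tables are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory SQLite database, used only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Departments (
    DepartmentID INTEGER PRIMARY KEY,       -- primary key
    Name TEXT NOT NULL
);
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,         -- primary key
    DepartmentID INTEGER NOT NULL
        REFERENCES Departments (DepartmentID)   -- foreign key
);
CREATE TABLE ProjectAssignments (
    EmployeeID INTEGER NOT NULL REFERENCES Employees (EmployeeID),
    ProjectCode TEXT NOT NULL,
    PRIMARY KEY (EmployeeID, ProjectCode)   -- composite key
);
""")

conn.execute("INSERT INTO Departments VALUES (1, 'Engineering')")
conn.execute("INSERT INTO Employees VALUES (10, 1)")

# The foreign key rejects an employee whose department does not exist.
try:
    conn.execute("INSERT INTO Employees VALUES (11, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# The composite key rejects a repeated (EmployeeID, ProjectCode) pair.
conn.execute("INSERT INTO ProjectAssignments VALUES (10, 'APOLLO')")
try:
    conn.execute("INSERT INTO ProjectAssignments VALUES (10, 'APOLLO')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(fk_rejected, duplicate_rejected)  # True True
```

Other database systems enforce foreign keys by default, so the PRAGMA line is specific to SQLite.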

What is a Surrogate Key?

A surrogate key is an artificially generated value typically used as the primary key in a table. Unlike a natural key, which is derived from the actual data, a surrogate key is usually an integer or a UUID (Universally Unique Identifier) that has no business meaning. It serves solely to uniquely identify each row in the table.

Here are the main characteristics of a surrogate key:

  • Uniqueness: Each surrogate key is unique within the table.
  • Not Null: If it is to be used as the primary key, then it can’t be nullable.
  • Non-Meaningful: The key has no inherent meaning outside of the database.
  • Stable: Once assigned, the surrogate key does not change, even if the data in the row changes.

Why Use a Surrogate Key?

Surrogate keys are widely used in database design because they offer several advantages:

  • Simplicity: Surrogate keys are simpler to use, especially when dealing with complex relationships or composite keys.
  • Performance: They often improve performance, because a compact integer key generally makes for smaller indexes and cheaper joins than a wide text or multi-column key.
  • Consistency: Surrogate keys remain consistent even if the underlying data changes. This is important in maintaining referential integrity.
  • Decoupling: By using a surrogate key, the database design is decoupled from the business logic. This allows for flexibility in making changes to the business logic without affecting the database schema.

Surrogate Key vs. Natural Key

A natural key is a key that is derived from the actual data in a table. For example, in a table of employees, a natural key might be the employee’s Social Security Number (SSN) or email address.

Comparison

  • Meaning: Natural keys have real-world meaning, while surrogate keys do not.
  • Stability: Natural keys can change if the underlying data changes (e.g., someone changes their email address), whereas surrogate keys do not.
  • Simplicity: Surrogate keys are often simpler to manage, especially when dealing with large databases or when the natural key involves multiple columns.
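
The stability difference can be sketched by building the same data twice, once keyed on an email address and once keyed on a surrogate integer. This is only an illustration under stated assumptions: it uses Python's sqlite3 module with an in-memory SQLite database, and the four table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

conn.executescript("""
-- Natural-key design: the email address is the primary key.
CREATE TABLE NaturalEmployees (
    Email TEXT PRIMARY KEY
);
CREATE TABLE NaturalTimesheets (
    Email TEXT NOT NULL REFERENCES NaturalEmployees (Email),
    Hours REAL NOT NULL
);

-- Surrogate-key design: the email address is ordinary data.
CREATE TABLE SurrogateEmployees (
    EmployeeID INTEGER PRIMARY KEY,
    Email TEXT NOT NULL UNIQUE
);
CREATE TABLE SurrogateTimesheets (
    EmployeeID INTEGER NOT NULL REFERENCES SurrogateEmployees (EmployeeID),
    Hours REAL NOT NULL
);

INSERT INTO NaturalEmployees VALUES ('sam@example.com');
INSERT INTO NaturalTimesheets VALUES ('sam@example.com', 8.0);
INSERT INTO SurrogateEmployees VALUES (1, 'sam@example.com');
INSERT INTO SurrogateTimesheets VALUES (1, 8.0);
""")

# Changing the natural key is blocked, because a timesheet row still
# references the old value and no ON UPDATE CASCADE was declared.
try:
    conn.execute("UPDATE NaturalEmployees SET Email = 'sam@new.example.com'")
    natural_update_ok = True
except sqlite3.IntegrityError:
    natural_update_ok = False

# With a surrogate key the same change is an ordinary column update.
conn.execute(
    "UPDATE SurrogateEmployees SET Email = 'sam@new.example.com' "
    "WHERE EmployeeID = 1"
)
still_linked = conn.execute("""
    SELECT COUNT(*)
    FROM SurrogateTimesheets AS t
    JOIN SurrogateEmployees AS e ON e.EmployeeID = t.EmployeeID
    WHERE e.Email = 'sam@new.example.com'
""").fetchone()[0]

print(natural_update_ok, still_linked)  # False 1
```

The natural-key design could be made to work with ON UPDATE CASCADE, but every referencing row would then be rewritten whenever an email address changed.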

When to Use Each

  • Natural Key: Use when the data has a natural unique identifier that is unlikely to change.
  • Surrogate Key: Use when the data does not have a reliable natural key, or when simplicity, performance, and stability are priorities.

Implementing Surrogate Keys in SQL

There are several ways to implement surrogate keys in SQL. The most common methods are auto-incremented integers and UUIDs.

Below are examples of each of these.

Auto-Incremented Surrogate Keys

In many SQL databases, we can define a surrogate key that automatically increments with each new row. Here’s an example using MySQL:

CREATE TABLE Employees (
    EmployeeID INT AUTO_INCREMENT,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100),
    PRIMARY KEY (EmployeeID)
);

In this example, EmployeeID is a surrogate key that will automatically increment as new rows are added to the Employees table. We were able to make it an auto-incrementing column by using the AUTO_INCREMENT attribute.

Here’s an example for SQL Server:

CREATE TABLE Employees (
    EmployeeID INT IDENTITY(1,1),
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100),
    PRIMARY KEY (EmployeeID)
);

Here, we used the IDENTITY() property to create an identity column, which is SQL Server’s name for an auto-increment column. The two arguments are the seed and the increment, so IDENTITY(1,1) specifies that the surrogate key starts at 1 and increases by 1 with each new row.
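
To watch generated keys being assigned without installing a database server, the following sketch reproduces the idea in SQLite through Python's sqlite3 module. SQLite is a stand-in here, not one of the systems shown above: its keyword is spelled AUTOINCREMENT, and the add_employee helper is a hypothetical name used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite's spelling
        FirstName TEXT,
        LastName TEXT
    )
""")


def add_employee(first_name, last_name):
    """Insert a row without an EmployeeID and return the generated key."""
    cur = conn.execute(
        "INSERT INTO Employees (FirstName, LastName) VALUES (?, ?)",
        (first_name, last_name),
    )
    return cur.lastrowid


ids = [
    add_employee("Ada", "Lovelace"),
    add_employee("Alan", "Turing"),
    add_employee("Grace", "Hopper"),
]

# Delete the newest row, then insert again: with AUTOINCREMENT,
# SQLite does not hand the deleted key out a second time.
conn.execute("DELETE FROM Employees WHERE EmployeeID = ?", (ids[-1],))
next_id = add_employee("Edsger", "Dijkstra")

print(ids, next_id)  # [1, 2, 3] 4
```

The application never supplies EmployeeID; it only reads back the value the database chose.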

UUIDs as Surrogate Keys

Another approach is to use UUIDs, which are 128-bit values that can be used as unique identifiers across different systems. Here’s how you might implement a UUID as a surrogate key in PostgreSQL:

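-- gen_random_uuid() is built in from PostgreSQL 13 onward and needs no extension.
-- On older versions it is provided by the pgcrypto extension:
-- CREATE EXTENSION IF NOT EXISTS pgcrypto;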

CREATE TABLE Products (
    ProductID UUID DEFAULT gen_random_uuid(),
    ProductName VARCHAR(100),
    Price DECIMAL(10, 2),
    PRIMARY KEY (ProductID)
);

Here, gen_random_uuid() generates a random (version 4) UUID for each new row that doesn’t supply its own ProductID.

Another option is to use uuid_generate_v4(), which is provided by the uuid-ossp extension:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE Products (
    ProductID UUID DEFAULT uuid_generate_v4(),
    ProductName VARCHAR(100),
    Price DECIMAL(10, 2),
    PRIMARY KEY (ProductID)
);

In this case, ProductID is a UUID generated using PostgreSQL’s uuid-ossp extension.
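
UUIDs can also be generated by the application rather than by the database, which is one reason they suit keys that must be unique across systems. The sketch below assumes that approach: it uses Python's uuid module to create the key and an in-memory SQLite table (via sqlite3) to store it as text. The add_product helper is a hypothetical name, and SQLite is used only to keep the example runnable.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Products (
        ProductID TEXT PRIMARY KEY,   -- UUID stored in its 36-character text form
        ProductName TEXT NOT NULL
    )
""")


def add_product(name):
    """Generate a random (version 4) UUID in the application and insert the row."""
    product_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO Products (ProductID, ProductName) VALUES (?, ?)",
        (product_id, name),
    )
    return product_id


first_id = add_product("Keyboard")
second_id = add_product("Mouse")

# The values differ on every run, so only their shape is printed here.
print(len(first_id), uuid.UUID(first_id).version)
```

Because the key is known before the INSERT runs, related rows can be prepared without a round trip to fetch a generated ID.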

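This is not to imply that PostgreSQL doesn’t have an auto-increment option. PostgreSQL also has the SERIAL pseudo-type and, from version 10, standard identity columns (GENERATED ... AS IDENTITY), both of which are similar to the auto-increment options shown above for MySQL and SQL Server.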

Example Use Cases for Surrogate Keys

Let’s consider a couple of examples to illustrate how a surrogate key might be chosen over a natural key.

Example 1: Customer Orders

Imagine a CustomerOrders table that tracks orders placed by customers. A composite key of CustomerID and OrderDate would be awkward to reference, and it would fail outright if a customer placed two orders on the same date. Instead, we could use a surrogate key:

CREATE TABLE CustomerOrders (
    OrderID INT AUTO_INCREMENT,
    CustomerID INT,
    OrderDate DATE,
    Amount DECIMAL(10, 2),
    PRIMARY KEY (OrderID)
);

Here, OrderID is a surrogate key that uniquely identifies each order, simplifying the table structure and queries.
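
One weakness of a (CustomerID, OrderDate) composite key is that it cannot hold two orders from the same customer on the same date. The following sketch compares the two designs side by side, using Python's sqlite3 module and an in-memory SQLite database for convenience. OrdersCompositeKey is a hypothetical table added for the comparison, and the sample orders are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrdersCompositeKey (
    CustomerID INTEGER NOT NULL,
    OrderDate TEXT NOT NULL,
    Amount NUMERIC NOT NULL,
    PRIMARY KEY (CustomerID, OrderDate)
);
CREATE TABLE CustomerOrders (
    OrderID INTEGER PRIMARY KEY,      -- surrogate key
    CustomerID INTEGER NOT NULL,
    OrderDate TEXT NOT NULL,
    Amount NUMERIC NOT NULL
);
""")

# Two orders from the same customer on the same day.
orders = [(42, "2024-05-01", 19.99), (42, "2024-05-01", 5.00)]

# The composite key treats the second order as a duplicate.
composite_rejected = False
for order in orders:
    try:
        conn.execute("INSERT INTO OrdersCompositeKey VALUES (?, ?, ?)", order)
    except sqlite3.IntegrityError:
        composite_rejected = True

# The surrogate key gives each order its own identifier.
conn.executemany(
    "INSERT INTO CustomerOrders (CustomerID, OrderDate, Amount) VALUES (?, ?, ?)",
    orders,
)
order_ids = [
    row[0]
    for row in conn.execute("SELECT OrderID FROM CustomerOrders ORDER BY OrderID")
]

print(composite_rejected, order_ids)  # True [1, 2]
```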

Example 2: Student Enrollment

Consider a StudentEnrollment table where each student can enroll in multiple courses. Instead of using a composite key of StudentID and CourseID, a surrogate key can simplify the schema:

CREATE TABLE StudentEnrollment (
    EnrollmentID INT AUTO_INCREMENT,
    StudentID INT,
    CourseID INT,
    EnrollmentDate DATE,
    PRIMARY KEY (EnrollmentID)
);

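EnrollmentID is a surrogate key that uniquely identifies each enrollment record. Note that, on its own, it no longer prevents the same student from being enrolled in the same course twice; a UNIQUE constraint on StudentID and CourseID would restore that rule.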

Best Practices for Using Surrogate Keys

When using surrogate keys, keep the following best practices in mind:

  • Use Surrogate Keys When Necessary: Not every table needs a surrogate key. Use them when natural keys are impractical or complex.
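  • Index Surrogate Keys: Declaring a surrogate key as the primary key creates an index on it automatically in most database systems, but foreign key columns that reference it may need their own indexes to maintain query performance.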
  • Avoid Meaningful Data: Keep surrogate keys free of any meaningful data to ensure they remain stable and consistent.
  • Use Surrogate Keys in Large Databases: Surrogate keys can be especially beneficial in large databases where performance and simplicity are critical.
  • Combine with Natural Keys: In some cases, you may want to use a surrogate key as the primary key but still enforce uniqueness on natural keys using UNIQUE constraints.
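
The last practice above, a surrogate primary key combined with a UNIQUE constraint on the natural key, might look like the following for the StudentEnrollment table. This is a sketch under assumptions rather than a prescribed implementation: it runs on SQLite through Python's sqlite3 module, and the enroll helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE StudentEnrollment (
        EnrollmentID INTEGER PRIMARY KEY,   -- surrogate key
        StudentID INTEGER NOT NULL,
        CourseID INTEGER NOT NULL,
        EnrollmentDate TEXT,
        UNIQUE (StudentID, CourseID)        -- natural key still enforced
    )
""")


def enroll(student_id, course_id, enrolled_on):
    """Return the new EnrollmentID, or None if the pair is already enrolled."""
    try:
        cur = conn.execute(
            "INSERT INTO StudentEnrollment (StudentID, CourseID, EnrollmentDate) "
            "VALUES (?, ?, ?)",
            (student_id, course_id, enrolled_on),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


first = enroll(7, 101, "2024-09-01")
other_course = enroll(7, 102, "2024-09-01")
duplicate = enroll(7, 101, "2024-09-02")  # same student, same course

print(first, other_course, duplicate)  # 1 2 None
```

Other tables can reference the single-column EnrollmentID, while the UNIQUE constraint keeps the business rule that a student enrolls in a given course only once.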

Conclusion

Surrogate keys can provide a simple, efficient, and stable way to uniquely identify rows in a table. While they have their advantages, it’s important to use them wisely and understand when they are appropriate.