What is a Candidate Key in SQL?

There are many different “key” types in SQL. We have the primary key, the foreign key, and even the unique key.

But have you heard about the candidate key?

What is a Candidate Key?

A candidate key is a minimal set of attributes (columns) that uniquely identifies a tuple (row) in a relation (table). In other words, a candidate key is a combination of one or more columns that can uniquely identify every row in a table, without including any column it doesn’t need.

A table can have multiple candidate keys, but no proper subset of a candidate key’s columns can uniquely identify each row. This is why candidate keys are also known as minimal superkeys, where a superkey is any set of columns that uniquely identifies each row.

The columns that belong to a candidate key are called prime attributes. A column that does not occur in any candidate key is referred to as a non-prime attribute.
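
As a quick illustration, here’s a minimal Python sketch that splits a table’s columns into prime and non-prime attributes once its candidate keys are known. The helper name split_attributes is hypothetical, and the column names are borrowed from the employee example later in this article.

```python
# Hypothetical helper: classify columns as prime or non-prime, given the
# table's candidate keys (each key is a tuple of column names).
def split_attributes(columns, candidate_keys):
    # A column is prime if it appears in at least one candidate key.
    key_columns = {col for key in candidate_keys for col in key}
    prime = [col for col in columns if col in key_columns]
    non_prime = [col for col in columns if col not in key_columns]
    return prime, non_prime


columns = ["EmployeeID", "Email", "SocialSecurityNumber", "Name"]
candidate_keys = [("EmployeeID",), ("Email",), ("SocialSecurityNumber",)]

prime, non_prime = split_attributes(columns, candidate_keys)
print(prime)      # ['EmployeeID', 'Email', 'SocialSecurityNumber']
print(non_prime)  # ['Name']
```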

Characteristics of a Candidate Key

A candidate key has the following characteristics:

  • Uniqueness: A candidate key must uniquely identify each row in a table.
  • Minimality: It must be a minimal superkey, meaning that if any attribute is removed, uniqueness is lost. If an attribute can be removed without losing the key’s ability to uniquely identify each row in the table, then the set is a superkey, but not a candidate key.
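
Both characteristics can be checked mechanically against a set of rows. The sketch below is a minimal Python version under a few assumptions: rows are plain dictionaries, the helper names is_superkey and is_candidate_key are hypothetical, and the data is the course-enrollment sample from later in this article plus one made-up retake row. Keep in mind that a data sample can only show that a column set is unique so far; real candidate keys come from the rules the data must always obey.

```python
# Hypothetical helpers for checking uniqueness and minimality on sample rows.

def is_superkey(rows, cols):
    """Uniqueness: no two rows share the same values for cols."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in cols)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, cols):
    """Uniqueness plus minimality."""
    if not is_superkey(rows, cols):
        return False
    # Minimality: removing any one column must break uniqueness. Checking
    # single-column removals is enough, because adding columns to a
    # superkey always produces another superkey.
    for col in cols:
        reduced = [c for c in cols if c != col]
        # An empty column list is never treated as a key here.
        if reduced and is_superkey(rows, reduced):
            return False
    return True


COLUMNS = ("StudentID", "CourseID", "Semester", "Grade")
rows = [dict(zip(COLUMNS, values)) for values in [
    (1001, "CS101", "Fall2023", "A"),
    (1002, "CS101", "Fall2023", "B+"),
    (1001, "MATH101", "Fall2023", "B"),
    (1002, "MATH101", "Fall2023", "A-"),
    (1001, "CS101", "Spring2024", "A"),  # made-up retake row
]]

print(is_candidate_key(rows, ["StudentID", "CourseID", "Semester"]))           # True
print(is_candidate_key(rows, ["StudentID", "CourseID", "Semester", "Grade"]))  # False: superkey, not minimal
print(is_candidate_key(rows, ["StudentID", "CourseID"]))                       # False: not unique
```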

How to Identify Candidate Keys

Identifying candidate keys involves the following steps:

  1. List All Attributes: Start by listing all the attributes (columns) in the table.
  2. Determine Superkeys: Find the superkeys, i.e., the combinations of attributes that can uniquely identify a row.
  3. Check for Minimality: For each superkey, check whether any attribute can be removed while the remaining attributes stay unique. If none can, the superkey is a candidate key.
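
The three steps can be sketched as a brute-force search in Python. This is an illustrative sketch rather than a standard routine: find_candidate_keys and is_unique are hypothetical helpers, and the sample rows use made-up email addresses. It tests column combinations from smallest to largest, so any unique combination that doesn’t contain an already-found key must be minimal.

```python
from itertools import combinations


def is_unique(rows, cols):
    """True if no two rows share the same values for cols (a superkey)."""
    values = [tuple(row[c] for c in cols) for row in rows]
    return len(values) == len(set(values))


def find_candidate_keys(columns, rows):
    """Brute-force search; exponential in the number of columns."""
    # Step 1: the attributes are listed in `columns`.
    candidate_keys = []
    # Steps 2 and 3: test combinations smallest first. A unique combination
    # is kept only if it contains no smaller key already found, which is
    # what makes it minimal.
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if any(set(key) <= set(combo) for key in candidate_keys):
                continue
            if is_unique(rows, combo):
                candidate_keys.append(combo)
    return candidate_keys


COLUMNS = ("EmployeeID", "Email", "SocialSecurityNumber", "Name")
rows = [dict(zip(COLUMNS, values)) for values in [
    ("001", "homer@example.com", "123-45-6789", "Homer Einstein"),
    ("002", "larry@example.com", "987-65-4321", "Larry Bunder"),
    ("003", "cassie@example.com", "111-22-3333", "Cassie Nguyen"),
]]

print(find_candidate_keys(COLUMNS, rows))
# [('EmployeeID',), ('Email',), ('SocialSecurityNumber',), ('Name',)]
```

Notice that Name shows up too. That’s only because these three sample rows happen to have distinct names; two employees could share a name, so Name isn’t a real candidate key. Searching the data can suggest candidates, but the business rules decide.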

Example of a Candidate Key

Let’s consider a simple table of employees:

EmployeeID   Email              SocialSecurityNumber   Name
001          [email protected]  123-45-6789            Homer Einstein
002          [email protected]  987-65-4321            Larry Bunder
003          [email protected]  111-22-3333            Cassie Nguyen

In this table:

  • EmployeeID is a candidate key because it uniquely identifies each employee.
  • Email is also a candidate key because it is unique for each employee.
  • SocialSecurityNumber is another candidate key since no two employees can have the same social security number.

Here, the table has three candidate keys: EmployeeID, Email, and SocialSecurityNumber.
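
In SQL, each candidate key is typically enforced with a UNIQUE constraint. Here’s a sketch using Python’s built-in sqlite3 module with an in-memory database; the table definition and email addresses are assumptions made for illustration. NOT NULL is added to each key column because SQLite, like many databases, lets a UNIQUE column hold more than one NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each candidate key gets its own UNIQUE NOT NULL constraint.
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID           TEXT NOT NULL UNIQUE,
        Email                TEXT NOT NULL UNIQUE,
        SocialSecurityNumber TEXT NOT NULL UNIQUE,
        Name                 TEXT NOT NULL
    )
""")

INSERT = "INSERT INTO Employees VALUES (?, ?, ?, ?)"
conn.executemany(INSERT, [
    ("001", "homer@example.com", "123-45-6789", "Homer Einstein"),
    ("002", "larry@example.com", "987-65-4321", "Larry Bunder"),
    ("003", "cassie@example.com", "111-22-3333", "Cassie Nguyen"),
])


def try_insert(row):
    """Insert a row and report whether the constraints accepted it."""
    try:
        conn.execute(INSERT, row)
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


# New ID and SSN, but Homer's email: violates the Email candidate key.
duplicate_email = try_insert(("004", "homer@example.com", "555-55-5555", "Homer Clone"))
# Same name as an existing employee: allowed, since Name is not a key.
duplicate_name = try_insert(("005", "homer2@example.com", "222-33-4444", "Homer Einstein"))

print(duplicate_email)
print(duplicate_name)
```

The duplicate email is rejected with a UNIQUE constraint error, while a second "Homer Einstein" is accepted because Name is not a candidate key.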

Example of a Candidate Key Across Multiple Columns

As alluded to, a candidate key can span multiple columns, as long as it’s minimal (can’t be reduced further).

Let’s consider a table that stores information about course enrollments in a university:

StudentID   CourseID   Semester   Grade
1001        CS101      Fall2023   A
1002        CS101      Fall2023   B+
1001        MATH101    Fall2023   B
1002        MATH101    Fall2023   A-

In this table:

  • The combination of StudentID, CourseID, and Semester can uniquely identify each row.
  • StudentID alone is not enough because a student can enroll in multiple courses.
  • CourseID alone is insufficient because multiple students can enroll in the same course.
  • Semester alone is also insufficient because many enrollments take place in the same semester.

Thus, the combination of StudentID + CourseID + Semester is a candidate key because:

  • It uniquely identifies each record (e.g., the grade received by a particular student in a specific course during a specific semester).
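  • The combination is minimal: if any column is removed, the key no longer uniquely identifies each row. StudentID + CourseID fails once a student retakes a course in a later semester, StudentID + Semester fails because a student takes several courses per semester, and CourseID + Semester fails because a course has several students per semester.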
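
A multi-column candidate key is enforced with a single UNIQUE constraint that lists all of its columns. The sketch below uses Python’s sqlite3 module with an in-memory database; the Enrollments table name and the Spring2024 retake row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One UNIQUE constraint covering all three columns enforces the
# composite candidate key; Grade is not part of it.
conn.execute("""
    CREATE TABLE Enrollments (
        StudentID INTEGER NOT NULL,
        CourseID  TEXT    NOT NULL,
        Semester  TEXT    NOT NULL,
        Grade     TEXT,
        UNIQUE (StudentID, CourseID, Semester)
    )
""")

INSERT = "INSERT INTO Enrollments VALUES (?, ?, ?, ?)"
conn.executemany(INSERT, [
    (1001, "CS101", "Fall2023", "A"),
    (1002, "CS101", "Fall2023", "B+"),
    (1001, "MATH101", "Fall2023", "B"),
    (1002, "MATH101", "Fall2023", "A-"),
])


def try_insert(row):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False


retake_ok = try_insert((1001, "CS101", "Spring2024", "A"))   # new semester: accepted
duplicate_ok = try_insert((1001, "CS101", "Fall2023", "C"))  # same key: rejected
print(retake_ok, duplicate_ok)  # True False
```

A separate UNIQUE constraint on each column would be far too strict here; the constraint has to cover the whole combination.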

Candidate Key vs. Primary Key

A primary key is a special type of candidate key: it’s the candidate key that has been chosen to uniquely identify the rows in a table. A table can have multiple candidate keys, but only one of them can be the primary key.

For example, in the first table above, we could choose EmployeeID as the primary key. The remaining candidate keys (Email and SocialSecurityNumber) would still be candidate keys but would not be the primary key.
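
Here’s how that choice might look in SQL, sketched with Python’s sqlite3 module: EmployeeID is declared as the PRIMARY KEY, and the other candidate keys (sometimes called alternate keys) stay enforced with UNIQUE NOT NULL. The table names are illustrative. PRAGMA index_list shows one index backing the primary key and one per UNIQUE constraint, and the second CREATE TABLE shows SQLite rejecting a table that declares two primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# EmployeeID is chosen as the primary key; the other candidate keys
# remain enforced with UNIQUE NOT NULL.
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID           TEXT NOT NULL PRIMARY KEY,
        Email                TEXT NOT NULL UNIQUE,
        SocialSecurityNumber TEXT NOT NULL UNIQUE,
        Name                 TEXT NOT NULL
    )
""")

# Each row of PRAGMA index_list is (seq, name, unique, origin, partial);
# origin is 'pk' for the primary key and 'u' for a UNIQUE constraint.
origins = sorted(row[3] for row in conn.execute("PRAGMA index_list('Employees')"))
print(origins)  # ['pk', 'u', 'u']

# Trying to declare a second primary key on the same table fails.
try:
    conn.execute("""
        CREATE TABLE BadEmployees (
            EmployeeID TEXT PRIMARY KEY,
            Email      TEXT PRIMARY KEY
        )
    """)
    second_pk_allowed = True
except sqlite3.Error as exc:
    second_pk_allowed = False
    print("rejected:", exc)
```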
