What is Data Integrity?

The term data integrity refers to the accuracy and consistency of data.

When creating databases, attention needs to be given to data integrity and how to maintain it. A good database will enforce data integrity whenever possible.

For example, a user could accidentally try to enter a phone number into a date field. If the system enforces data integrity, it will prevent the user from making these mistakes.

Maintaining data integrity means making sure the data remains intact and unchanged throughout its entire life cycle. This includes the capture of the data, storage, updates, transfers, backups, etc. Every time data is processed there’s a risk that it could get corrupted (whether accidentally or maliciously).

Risks to Data Integrity

Some more examples of where data integrity is at risk:

  • A user tries to enter a date outside an acceptable range.
  • A user tries to enter a phone number in the wrong format.
  • A bug in an application attempts to delete the wrong record.
  • While transferring data between two databases, the developer accidentally tries to insert the data into the wrong table.
  • While transferring data between two databases, the network went down.
  • A user tries to delete a record in a table, but another table is referencing that record as part of a relationship.
  • A user tries to update a primary key value when there’s already a foreign key in a related table pointing to that value.
  • A developer forgets that he’s on a production system and starts entering test data directly into the database.
  • A hacker manages to steal all user passwords from the database.
  • A hacker hacks into the network and drops the database (i.e. deletes it and all its data).
  • A fires sweeps through the building, burning the database computer to a cinder.
  • The regular backups of the database has been failing for the past two months…

It’s not hard to think of many more scenarios where data integrity is at risk.

Many of these risks can be addressed from within the database itself (through the use of data types and constraints against each column for example, encryption, etc), while others can be addressed through other features of the DBMS (such as regular backups – and testing that the backups do actually restore the database as expected).

Some of these require other (non-database related) factors to be present, such as an offsite backup location, a properly functioning IT network, proper training, security policies, etc.

4 Types of Data Integrity

In the database world, data integrity is often placed into the following types:

  • Entity integrity
  • Referential integrity
  • Domain integrity
  • User-defined integrity

Entity Integrity

Entity integrity defines each row to be unique within its table. No two rows can be the same.

To achieve this, a primary key can be defined. The primary key field contains a unique identifier – no two rows can contain the same unique identifier.

Referential Integrity

Referential integrity is concerned with relationships. When two or more tables have a relationship, we have to ensure that the foreign key value matches the primary key value at all times. We don’t want to have a situation where a foreign key value has no matching primary key value in the primary table. This would result in an orphaned record.

So referential integrity will prevent users from:

  • Adding records to a related table if there is no associated record in the primary table.
  • Changing values in a primary table that result in orphaned records in a related table.
  • Deleting records from a primary table if there are matching related records.

Domain Integrity

Domain integrity concerns the validity of entries for a given column. Selecting the appropriate data type for a column is the first step in maintaining domain integrity. Other steps could include, setting up appropriate constraints and rules to define the data format and/or restricting the range of possible values.

User-Defined Integrity

User-defined integrity allows the user to apply business rules to the database that aren’t covered by any of the other three data integrity types.