Structured Query Language (SQL) is the backbone of relational database management systems (RDBMSs), enabling users to query, manipulate, and define data. One of the most fundamental concepts in SQL, and one that all SQL developers should understand, is the data type.
Whenever we create a column in SQL, we must define its data type. Similarly, when we create a variable, we define its data type.
So, why is the data type so important? Let’s find out.
Understanding Data Types in SQL
A data type defines the kind of value that a column or variable can hold. It is a necessary constraint to ensure that the data stored in a database is accurate, consistent, and relevant. Data types tell the database how much space to allocate for a value and how to interpret the data. This becomes crucial when we need to perform operations on the data or maintain data integrity.
By defining the data type of a column or variable, we are able to ensure that only a certain type of data is inserted into that column or assigned to that variable. For example, by assigning a DATE
data type to a DOB (date of birth) column, we can be sure that only dates will be entered into that column. If a user attempts to enter a value that isn’t a date, then the database engine will reject it. This helps to maintain data integrity, because it eliminates the possibility that the the column will include non-date data. It narrows the values down to date values.
That’s not to say that someone could enter an invalid date, which is another possibility. In this case, we do have further tools at our disposal to eliminate this possibility. One of these is the user-defined data type, which allows us to create a data type that’s moulded to our specific requirements.
Most major RDBMSs include built-in data types (such as INTEGER
, DATE
, etc), while also supporting the creation of user-defined data types. User-defined data types allow us to create data types based on our own specific needs.
Examples of Common SQL Data Types
Here are some examples of the most commonly used built-in data types:
SQL Data Type | Description | Example Values |
---|---|---|
INT / INTEGER | Used for storing whole numbers. It is useful when the data doesn’t require a fractional component. Depending on the RDBMS, we can use INT or INTEGER – they both mean the same thing. | 1 , 100 , -45 |
FLOAT | Suitable for storing approximate numerical data with floating points. It is used for calculations that require decimal precision. | 3.14 , -0.001 , 2.71828 |
VARCHAR(N) / CHARACTER VARYING(N) | Used for variable-length strings, such as names, descriptions, and text content. Can store strings with a maximum length of N .The name we use may depend on the RDBMS, but VARCHAR(N) is commonly supported in most major RDBMSs. | 'Hello' , 'SQL Query' |
CHAR(N) / CHARACTER(N) / BPCHAR(N) | Ideal for fixed-length strings, where every entry in the column has exactly N characters. Commonly used in cases where uniformity is required. If the string is shorter than N , it is padded with spaces.The precise name we use may depend on the RDBMS, but CHAR(N) is commonly supported across most major RDBMSs. | 'SQL ' (padded) |
TEXT | Used for storing large blocks of text. It is ideal for storing long strings, such as articles, descriptions, or any kind of large textual content. Unlike VARCHAR , TEXT has no predefined limit on length. | (Large paragraphs or articles) |
BOOLEAN | Represents a value that can be either true or false. Typically used in logical operations. | TRUE , FALSE |
DATE | Stores calendar dates in the format YYYY-MM-DD . Perfect for birth dates, event dates, etc. | 2023-08-26 |
TIME | Stores time information in HH:MM:SS format, used in scenarios like recording log times.Depending on the RDBMS and the exact data type, this could include fractional precision, with a milliseconds component. | 14:30:00 |
TIMESTAMP / DATETIME / DATETIME2(N) | Combines date and time, represented in YYYY-MM-DD HH:MM:SS format (sometimes with a larger fractional precision depending on the type/RDBMS). Used in scenarios like logging events with precise time stamps.The precise name we use may depend on the RDBMS, and the exact specifications may differ slightly. | 2023-08-26 14:30:00 |
INTERVAL | Represents a span of time, such as the difference between two dates or times. It is useful for operations that require time duration calculations. | '1 YEAR 6 MONTHS' , '2 HOURS 30 MINUTES' |
BLOB | Stores Binary Large Objects. Useful for storing large binary data such as images, multimedia files, or documents. | (Binary image data) |
DECIMAL(M,D) | Used for precise, fixed-point numbers where M is the maximum number of digits and D is the number of digits to the right of the decimal point. Commonly used in financial calculations where precision is critical. | 123.45 , -0.12 |
Not all RDBMSs support all data types. Also, some RDBMSs have different names for the same data type, such as seen in some of the above types.
For example, PostgreSQL supports the TIMESTAMP
type, while MySQL supports both TIMESTAMP
and DATETIME
. SQL Server on the other hand, supports DATETIME
and DATETIME2
(where DATETIME2
is basically like DATETIME
, except with a larger date range, a larger default fractional precision, and optional user-specified precision).
How to Use Data Types
Here’s an example of SQL code that creates a table with multiple columns with various data types:
CREATE TABLE Employees (
EmployeeID INTEGER PRIMARY KEY, -- Unique identifier for each employee
FirstName VARCHAR(50), -- First name of the employee
LastName VARCHAR(50), -- Last name of the employee
Email VARCHAR(100), -- Email address
BirthDate DATE, -- Date of birth
HireDate TIMESTAMP, -- Date and time when the employee was hired
Salary DECIMAL(10, 2), -- Salary with up to 10 digits, 2 of which are after the decimal point
IsActive BOOLEAN, -- Indicates if the employee is currently active
ProfilePhoto BLOB, -- Binary data for the employee's profile photo
Notes TEXT -- Long-form notes about the employee
);
That code will run in MySQL. Other RDBMSs may not support all data types or they may have different names for the data types. In any case, the syntax is generally the same – provide the column name followed by the data type.
Here’s a quick explanation of the table’s columns:
- EmployeeID: An
INTEGER
used as the primary key, uniquely identifying each employee. - FirstName: A
VARCHAR(50)
field to store the employee’s first name, with a maximum length of 50 characters. - LastName: A
VARCHAR(50)
field to store the employee’s last name, with a maximum length of 50 characters. - Email: A
VARCHAR(100)
field to store the employee’s email address, with a maximum length of 100 characters. - BirthDate: A
DATE
field to store the employee’s date of birth. - HireDate: A
TIMESTAMP
field to store the exact date and time when the employee was hired. - Salary: A
DECIMAL(10, 2)
field to store the employee’s salary, allowing for up to 10 digits in total, with 2 digits after the decimal point. - IsActive: A
BOOLEAN
field to indicate whether the employee is currently active (TRUE or FALSE). - ProfilePhoto: A
BLOB
field to store binary data for the employee’s profile photo. - Notes: A
TEXT
field to store long-form notes about the employee, such as performance reviews or personal details.
This table demonstrates the use of various SQL data types, showing how they can be combined to create a suitable database schema. We know that data will conform to the rules of each data type, and we can use additional techniques to tighten these rules even further if required (such as constraints, user-defined data types, etc).
Variables
When we declare a variable, we usually need to specify which data type to use. Here’s an example of declaring variables, then setting them:
-- Declare and set variables with specific data types
DECLARE @EmployeeID INT;
DECLARE @EmployeeName VARCHAR(50);
DECLARE @Salary DECIMAL(10, 2);
DECLARE @HireDate DATE;
DECLARE @IsActive BIT; -- BIT is often used for boolean values in SQL Server
-- Assign values to the variables
SET @EmployeeID = 101;
SET @EmployeeName = 'Butch Baddant';
SET @Salary = 55000.75;
SET @HireDate = '2023-08-26';
SET @IsActive = 1; -- 1 for TRUE, 0 for FALSE in BIT data type
-- Example of using the variables in a query
SELECT @EmployeeID AS ID, @EmployeeName AS Name, @Salary AS Salary, @HireDate AS HireDate, @IsActive AS Active;
This code uses the syntax supported in SQL Server. Other RDBMSs may use a different syntax for declaring and setting variables. Also, as seen here, when using SQL Server, we need to use the BIT
data type to represent BOOLEAN
values, due to SQL Server not having a BOOLEAN
type.
Consequences of Using the Wrong Data Type
Choosing the wrong data type can lead to a variety of issues:
- Data Truncation: If you try to insert a value that exceeds the defined data type’s capacity, it might get truncated. For instance, inserting
'Hello World!'
into aVARCHAR(5)
field would result in'Hello'
. - Precision Loss: Using
FLOAT
instead ofDECIMAL
for financial data could result in precision loss, leading to inaccurate calculations. - Performance Issues: Using a larger data type than necessary (e.g.,
TEXT
instead ofVARCHAR
) can lead to inefficient use of memory and slower query performance. - Data Incompatibility: Trying to insert a value that’s incompatible with the data type. For example, trying to insert
'Hello World!'
into anINTEGER
column would result in an error, as the data type expects a numeric value. - Logical Errors: Storing dates as
VARCHAR
instead ofDATE
can prevent proper date comparisons or calculations, leading to logical errors in the application.
Adding Constraints
While data types go a long way to restricting the type of data that can be entered into a database, they do have their limitations. The built-in data types provided with each RDBMS is designed to cater for most general scenarios. But sometimes we have scenarios where we need tighter rules on what can be entered into a column. For example, we might want a numeric value to conform to a certain format, such as a phone number. Or we may simply want to ensure that a column isn’t left blank. In such cases, we can use a constraint to help us out.
In SQL, constraints are additional rules applied to table columns to enforce data integrity and ensure that the data entered into the database meets specific requirements. Constraints further restrict the type of data that users can enter into a column, making the database more reliable and consistent.
Common Constraint Types
Here are some of the most commonly used constraints in SQL:
Constraint Type | Description | Example |
---|---|---|
PRIMARY KEY | Ensures that each value in a column (or a combination of columns) is unique and not null. It uniquely identifies each record in a table. | EmployeeID INTEGER PRIMARY KEY |
UNIQUE | Ensures that all values in a column are unique, preventing duplicate entries. | Email VARCHAR(100) UNIQUE |
NOT NULL | Ensures that a column cannot have a NULL value, meaning every record must include a value for this column. | LastName VARCHAR(50) NOT NULL |
CHECK | Ensures that all values in a column satisfy a specific condition. | Salary DECIMAL(10, 2) CHECK (Salary > 0) |
FOREIGN KEY | Ensures referential integrity by linking a value in one table to a value in another table. | DepartmentID INTEGER, FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID) |
Example: Creating a Table with Constraints
Let’s create the Employees
table again, but this time with additional constraints to enforce data integrity:
CREATE TABLE Employees (
EmployeeID INTEGER PRIMARY KEY, -- Unique identifier for each employee
FirstName VARCHAR(50) NOT NULL, -- First name of the employee (cannot be NULL)
LastName VARCHAR(50) NOT NULL, -- Last name of the employee (cannot be NULL)
Email VARCHAR(100) UNIQUE, -- Email address (must be unique)
BirthDate DATE, -- Date of birth
HireDate TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- Default to the current date and time
Salary DECIMAL(10, 2) CHECK (Salary > 0), -- Salary must be positive
IsActive BOOLEAN DEFAULT TRUE, -- Indicates if the employee is currently active (default is TRUE)
ProfilePhoto BLOB, -- Binary data for the employee's profile photo
Notes TEXT -- Long-form notes about the employee
);
Explanation of constraints used in the example:
- EmployeeID: The
PRIMARY KEY
constraint ensures thatEmployeeID
is unique for each employee and cannot be null. It ensures that we will be able to identify each row in the table by this column, due to it always containing a unique value – no two rows will contain the same value in this column. - FirstName and LastName: The
NOT NULL
constraint ensures that these columns cannot have null values, meaning each employee must have a first and last name. - Email: The
UNIQUE
constraint ensures that no two employees can have the same email address. - HireDate: The
DEFAULT
constraint assigns the current timestamp to theHireDate
column if no value is provided. - Salary: The
CHECK
constraint ensures that theSalary
value is greater than zero, preventing the entry of negative or zero salaries. - IsActive: The
DEFAULT
constraint sets theIsActive
column toTRUE
by default, indicating that new employees are active unless specified otherwise.
Benefits of Using Constraints
- Data Integrity: Constraints help maintain the accuracy and consistency of data by enforcing rules that prevent incorrect or duplicate entries.
- Reliability: By using constraints, you reduce the likelihood of errors in your database, making it more reliable and trustworthy.
- Simplified Data Management: Constraints automate the enforcement of business rules, reducing the need for complex application logic.
Using constraints is a best practice in SQL database design, as they play a crucial role in maintaining a clean and accurate database schema.
User-Defined Data Types (UDTs)
SQL allows us to define our own data types, known as user-defined data types (UDTs). These are useful when we want to enforce consistency across multiple tables or when the existing data types don’t precisely meet our needs.
For example, you might create a user-defined type for storing phone numbers with a specific format:
CREATE DOMAIN phone_number AS VARCHAR(15)
CHECK (VALUE ~ '^\+?[0-9]{10,15}$');
Here, phone_number
is a UDT that ensures the stored value matches the pattern of a valid phone number.
Benefits of User-Defined Data Types
User-defined data types offer the following benefits:
- Consistency: By defining a UDT, we can ensure that the same format is be used across different tables (by using that UDT in those tables), reducing the risk of errors.
- Simplified Code: Reusing a UDT in various tables can make the schema more readable and maintainable.
- Custom Constraints: UDTs allow you to enforce specific constraints that might not be possible with standard data types.
Disadvantages
While UDTs are powerful, they can introduce challenges:
- Complexity: Overusing UDTs can make your database schema more complex and harder to understand.
- Compatibility Issues: Not all SQL database systems fully support UDTs, which could lead to compatibility issues when migrating databases.
Conclusion
Having a good understanding of how data types work in SQL is essential for designing efficient, reliable, and accurate databases.
Choosing the correct data type for your columns not only helps to maintain data integrity but also enhances the performance and maintainability of your database. These can be combined with constraints, which provide additional restrictions on the type of data that can be entered into each column.
And user-defined data types offer yet more flexibility, allowing you to tailor the data types to your specific needs.