The SERIAL
data type in PostgreSQL is a pseudo-type used to create an auto-incrementing sequence of integers for a column. It is commonly used for primary keys, as it eliminates the need to manually assign unique identifiers for each new record. PostgreSQL handles this by automatically creating a sequence object that supplies a unique number each time a new row is inserted.
PostgreSQL provides three SERIAL
types, each with two options for usage; which effectively equates to six different types. So basically, we have six options to choose from when creating a SERIAL
column.
Comparing the Different SERIAL
Types
The following table provides a breakdown of the six different options we have when creating a SERIAL
column:
SERIAL Type | Storage Size | Range | Use Case |
---|---|---|---|
smallserial serial2 | 2 bytes | 1 to 32,767 | Small datasets, limited records |
serial serial4 | 4 bytes | 1 to 2147483647 | General-purpose, moderate range |
bigserial serial8 | 8 bytes | 1 to 9223372036854775807 | Large datasets, high-volume tables |
So you can see that each SERIAL
type has a corresponding serialn
version, where n
represents the number of bytes in that option. We have the option of specifying one or the other when creating a SERIAL
column. For example, we could use bigserial
or we could use serial8
to do the same thing.
Corresponding Data Types
The SERIAL
types aren’t actually true data types. Rather, they’re a notational convenience for creating unique identifier columns. Each of these SERIAL
“types” has a corresponding (actual) integer data type, as outlined in the following table:
Type Name | Creates the Following Data Type |
---|---|
smallserial serial2 | smallint |
serial serial4 | integer |
bigserial serial8 | bigint |
The flexibility of being able to specify the exact type allows us to optimize storage depending on the expected volume of data.
Use Cases for Different SERIAL
Types
If you’re going to use the SERIAL
type, it’s important to choose the right one:
smallserial
/serial2
: Suitable for static lookup tables or small reference tables where the number of entries is limited.serial
/serial4
: For moderate-scale applications where a few million rows are expected. Common in most business applications.bigserial
/serial8
: Ideal for large-scale applications, such as analytics platforms, that handle billions of rows over time.
Choosing the wrong type could have a serious impact on your database. For example, if you choose smallserial
, but your table needs more than 32,767 rows, then you’ll run into trouble once it tries to insert the 32,768th row.
Conversely, if you choose bigserial
but you need no more than 2,147,483,647 rows, then you’ll be using up more space than you need.
How SERIAL
Works Behind the Scenes
When you define a column as SERIAL
, PostgreSQL automatically creates a sequence for that column. A sequence is a special kind of database object designed to generate unique numeric values. Each time a new record is inserted, PostgreSQL fetches the next number in the sequence.
For example, if we create a table like this:
CREATE TABLE orders (
order_id SERIAL
);
Behind the scenes, something like this will happen:
CREATE SEQUENCE orders_order_id_seq;
ALTER TABLE orders ALTER COLUMN order_id SET DEFAULT NEXTVAL('orders_order_id_seq');
The sequence object (orders_order_id_seq
) keeps track of the current value, and the NEXTVAL()
function retrieves the next number. This process is transparent to the user.
And the table’s definition will end up looking something like this:
CREATE TABLE IF NOT EXISTS public.orders
(
order_id integer NOT NULL DEFAULT nextval('orders_order_id_seq'::regclass)
)
Alternative to SERIAL
: Identity Columns
Since PostgreSQL 10, an alternative to SERIAL
columns is the IDENTITY
column. This feature follows the SQL standard more closely and can be used in place of SERIAL
.
Example:
CREATE TABLE users (
user_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
username VARCHAR(50)
);
Identity columns provide more control, allowing users to define whether the system should always generate the value (ALWAYS
) or only when no value is provided (BY DEFAULT
).
Like with SERIAL
columns, identity columns have an implicit sequence attached to them.