You can specify the character set and collation at various levels in MariaDB. You can specify them at the connection level, the server level, the database level, the table level, and even the column level.
You can also specify a collation in your queries so that it overrides any collation that has been previously specified at the aforementioned levels.
To set the character set and collation at the table level, use the CREATE TABLE
statement or ALTER TABLE
statement (depending on whether you’re creating the table or modifying it).
When Creating the Table
Here’s an example of setting the collation and character set when creating the table:
CREATE TABLE Products (
ProductId CHAR(10),
ProductName VARCHAR(255) NOT NULL,
PRIMARY KEY (ProductId))
CHARACTER SET latin1
COLLATE latin1_general_ci;
That creates a table called Products
with a latin1
character set, and a collation of latin1_general_ci
.
Alter an Existing Table
Here’s an example of changing the collation and character set on an existing table:
ALTER TABLE Products
CONVERT TO CHARACTER SET utf8 COLLATE utf8_spanish_ci;
It now uses the new collation and character set.
It’s also possible to omit the CONVERT TO
part:
ALTER TABLE Products
CHARACTER SET utf8
COLLATE utf8_spanish_ci;
Collation Must be Valid for the Character Set
When setting both the collation and the character set, you need to ensure that the collation is valid for the chosen character set and vice/versa. If not, an error is returned.
Here’s an example of specifying using an invalid collation for the specified character set:
ALTER TABLE Products
CHARACTER SET latin1
COLLATE utf8_spanish_ci;
Result:
ERROR 1253 (42000): COLLATION 'utf8_spanish_ci' is not valid for CHARACTER SET 'latin1'
It tells us that our chosen collation is not valid for our chosen specified character set.
See Full List of Collations Supported by MariaDB or Full List of Character Sets Supported by MariaDB for a list of matching collations and character sets.
Specifying Only the Collation or Character Set
It’s OK to specify just the collation or just the character set (or neither):
- If only the character set is provided, the default collation for that character set will be used.
- If only the collation is provided, the associated character set will be used.
- If neither collation nor character set is provided, the table default is used.
Converting to a Character Set that Requires More Bytes
For VARCHAR
or TEXT
columns, CONVERT TO CHARACTER SET
changes the data type if required to ensure that the new column is long enough to store as many characters as the original column.
For example, an ascii TEXT
column requires a single byte per character, so the column can hold up to 65,535 characters. If the column is converted to utf8, up to 3 bytes can be required for each character, so the column will be converted to MEDIUMTEXT
(which has a maximum length of 16,777,215) so that it will be able to hold the same number of characters.
If you don’t want such data type changes to take place, use MODIFY
on the individual columns instead.
For example:
ALTER TABLE Products
MODIFY ProductName VARCHAR(255)
CHARACTER SET utf8;
Converting to Binary
CONVERT TO CHARACTER SET binary
will convert CHAR
, VARCHAR
and TEXT
columns to BINARY
, VARBINARY
and BLOB
respectively.
Once this has been done, such columns will no longer have a character set. Also, they will no longer be affected by future CONVERT TO CHARACTER SET
statements.
Checking the Collation and Character Sets in MariaDB
There are various ways to check the collation and character sets in MariaDB, depending on the level that you’re checking (server, database, table, column, connection, etc). See Show the Collation in MariaDB for examples.