Here are two options for getting a list of character sets that are available in MySQL.
The SHOW CHARACTER SET
Statement
The SHOW CHARACTER SET
statement is a quick and easy way to return all the character sets that are available in MySQL.
We can return all character sets by running the following:
SHOW CHARACTER SET;
Partial output:
+----------+---------------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+---------------------------------+---------------------+--------+ | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 | | ascii | US ASCII | ascii_general_ci | 1 | | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | binary | Binary pseudo charset | binary | 1 | ...
That returns all available collations, which can be quite a large list.
We can also filter the results with a LIKE
and/or WHERE
clause to return character sets that match a given criteria:
SHOW CHARACTER SET LIKE 'utf%';
Result:
+---------+------------------+--------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+------------------+--------------------+--------+ | utf16 | UTF-16 Unicode | utf16_general_ci | 4 | | utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 | | utf32 | UTF-32 Unicode | utf32_general_ci | 4 | | utf8mb3 | UTF-8 Unicode | utf8mb3_general_ci | 3 | | utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 | +---------+------------------+--------------------+--------+
In this case, I narrowed the results to just those character sets that start with utf
.
Here’s another example where I use both the WHERE
clause and the LIKE
clause, as well as the OR
operator:
SHOW CHARACTER SET
WHERE Maxlen LIKE '4'
OR Description LIKE 'ISO%';
Result:
+---------+---------------------------------+--------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+---------------------------------+--------------------+--------+ | gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 | | greek | ISO 8859-7 Greek | greek_general_ci | 1 | | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 | | utf16 | UTF-16 Unicode | utf16_general_ci | 4 | | utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 | | utf32 | UTF-32 Unicode | utf32_general_ci | 4 | | utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 | +---------+---------------------------------+--------------------+--------+
The information_schema.character_sets
Table
We can also interrogate the information schema for information about character sets. The information_schema.character_sets
table contains a full list of supported character sets in MySQL.
Example:
SELECT *
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME LIKE 'utf%'
AND MAXLEN = 4;
Result:
+--------------------+----------------------+------------------+--------+ | CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN | +--------------------+----------------------+------------------+--------+ | utf16 | utf16_general_ci | UTF-16 Unicode | 4 | | utf16le | utf16le_general_ci | UTF-16LE Unicode | 4 | | utf32 | utf32_general_ci | UTF-32 Unicode | 4 | | utf8mb4 | utf8mb4_0900_ai_ci | UTF-8 Unicode | 4 | +--------------------+----------------------+------------------+--------+
We can see that it returns the same columns that SHOW CHARACTER SETS
returns.
When we run a query like this, we’re using the SELECT
statement which means we can customise our results in a large number of ways. We can also reduce the columns returned, join the table with other tables, etc.
Here’s an example of selecting less columns:
SELECT
CHARACTER_SET_NAME,
DESCRIPTION
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME LIKE 'utf%'
AND MAXLEN = 4;
Result:
+--------------------+------------------+ | CHARACTER_SET_NAME | DESCRIPTION | +--------------------+------------------+ | utf16 | UTF-16 Unicode | | utf16le | UTF-16LE Unicode | | utf32 | UTF-32 Unicode | | utf8mb4 | UTF-8 Unicode | +--------------------+------------------+