In PostgreSQL, we can use the ascii()
function to get the numeric code of the first character of the argument.
When using UTF8, the function returns the Unicode code point of the character. Other multibyte encodings require an ASCII character.
Example
Here’s an example to demonstrate:
SELECT ascii('a');
Result:
97
As mentioned, it’s only concerned with the first character, so the following example returns the same result:
SELECT ascii('ant');
Result:
97
Normalized Unicode Characters
In many cases, Unicode allows multiple code point representations of what is essentially the same character. This means that we could get different results depending on how the Unicode character was constructed.
Here’s an example:
SELECT
ascii('ö'),
ascii('ö');
Result:
ascii | ascii
-------+-------
246 | 111
This character is known in Unicode as LATIN SMALL LETTER O WITH DIAERESIS. It can be represented as a single code point (U+00F6
) or as two separate code points (U+006F
and U+0308
). Regarding the second one, U+006F
is the code point for LATIN SMALL LETTER O and U+0308
is the code point for the COMBINING DIAERESIS. These two code points are combined to form the same character that the single code point U+00F6
would produce.
When creating the above example, I used the single code point for the left column and two separate code points for the right column. This resulted in different results when passing them to the ascii()
function.
We can use the normalize()
function to normalize these Unicode characters:
SELECT
ascii(normalize('ö')),
ascii(normalize('ö'));
Result:
ascii | ascii
-------+-------
246 | 246