A Quick Look at the ASCII() Function in PostgreSQL

In PostgreSQL, we can use the ascii() function to get the numeric code of the first character of the argument.

When using UTF8, the function returns the Unicode code point of the character. Other multibyte encodings require an ASCII character.

Example

Here’s an example to demonstrate:

SELECT ascii('a');

Result:

As mentioned, it’s only concerned with the first character, so the following example returns the same result:

SELECT ascii('ant');

Result:

Normalized Unicode Characters

In many cases, Unicode allows multiple code point representations of what is essentially the same character. This means that we could get different results depending on how the Unicode character was constructed.

Here’s an example:

SELECT 
    ascii('ö'),
    ascii('ö');

Result:

 ascii | ascii 
-------+-------
   246 |   111

This character is known in Unicode as LATIN SMALL LETTER O WITH DIAERESIS. It can be represented as a single code point (U+00F6) or as two separate code points (U+006F and U+0308). Regarding the second one, U+006F is the code point for LATIN SMALL LETTER O and U+0308 is the code point for the COMBINING DIAERESIS. These two code points are combined to form the same character that the single code point U+00F6 would produce.

When creating the above example, I used the single code point for the left column and two separate code points for the right column. This resulted in different results when passing them to the ascii() function.

We can use the normalize() function to normalize these Unicode characters:

SELECT
    ascii(normalize('ö')),
    ascii(normalize('ö'));

Result:

 ascii | ascii 
-------+-------
   246 |   246