Understanding PostgreSQL’s UNISTR() Function

PostgreSQL has a unistr() function that evaluates escaped Unicode characters in its argument and returns the result as text.

Example

Here’s an example to demonstrate:

SELECT unistr('\0063\0061\0066\00E9');

Result:

café

Unicode characters can be specified as \XXXX (4 hexadecimal digits), \+XXXXXX (6 hexadecimal digits), \uXXXX (4 hexadecimal digits), or \UXXXXXXXX (8 hexadecimal digits).

Therefore, we can do any of the following to get the same result as the above example:

SELECT 
    unistr('\0063\0061\0066\00E9') AS "\XXXX",
    unistr('\+000063\+000061\+000066\+0000E9') AS "+XXXXXX",
    unistr('\u0063\u0061\u0066\u00E9') AS "\uXXXX",
    unistr('\U00000063\U00000061\U00000066\U000000E9') AS "\UXXXXXXXX";

Result:

 \XXXX | +XXXXXX | \uXXXX | \UXXXXXXXX 
-------+---------+--------+------------
café | café | café | café

If we want to include a backslash, we can escape it with another backslash. In other words, to include a backslash use two backslashes.

If the server encoding is not UTF-8, the Unicode code point identified by one of these escape sequences is converted to the actual server encoding. If it’s not possible to do that, then an error occurs.

Null Argument

If the argument is null then null is returned:

SELECT unistr(null);

Result:

null

Wrong Argument Type

Passing an argument of the wrong type results in an error:

SELECT unistr(123);

Result:

ERROR:  function unistr(integer) does not exist
LINE 1: SELECT unistr(123);
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.