Using REGEXP_SPLIT_TO_ARRAY() in PostgreSQL

In PostgreSQL, the regexp_split_to_array() function splits a string using a POSIX regular expression as the delimiter, and returns the result in a text array.

So we use a POSIX regular expression to specify the delimiter/s, and split the string based on that.

We pass the string as the first argument and the pattern as the second. We can also specify a flag to determine how the function behaves.

Example

Here’s a basic example to demonstrate:

SELECT regexp_split_to_array('cat dog bird', '\s+');

Result:

{cat,dog,bird}

Here, I used \s+ as the regular expression (which is what determines the delimiter in the string at the first argument). This particular expression specifies that the space character is the delimiter.

Here are some more examples:

\x
SELECT 
    regexp_split_to_array('cat dog bird', '\W+') AS "1",
    regexp_split_to_array('cat+dog+bird', '\W+') AS "2",
    regexp_split_to_array('cat + dog + bird', '\W+') AS "3",
    regexp_split_to_array('cat & dog & bird', '\W+') AS "4",
    regexp_split_to_array('cat&dog+bird fox', '[&+ ]+') AS "5";

Result (using vertical output):

1 | {cat,dog,bird}
2 | {cat,dog,bird}
3 | {cat,dog,bird}
4 | {cat,dog,bird}
5 | {cat,dog,bird,fox}

No Match

If there are no matches, the original string is returned in an array:

SELECT regexp_split_to_array('cat dog bird', ',');

Result:

{"cat dog bird"}

Adding a Flag

We also have the option of adding a flag as a third argument. Such flags can be used to change the behaviour of the function. The following table contains the other flags we can use:

OptionDescription
bRest of RE is a BRE
cCase-sensitive matching (overrides operator type)
eRest of RE is an ERE
iCase-insensitive matching (overrides operator type)
mHistorical synonym for n
nNewline-sensitive matching
pPartial newline-sensitive matching
qRest of RE is a literal (“quoted”) string, all ordinary characters
sNon-newline-sensitive matching (default)
tTight syntax (default)
wInverse partial newline-sensitive matching
xExpanded syntax

Source: https://www.postgresql.org/docs/current/functions-matching.html#POSIX-METASYNTAX

Here’s an example of using the i flag to specify a case insensitive search:

SELECT 
    regexp_split_to_array('cat AND dog OR bird', '\s(and|or)\s') AS "Case Sensitive",
    regexp_split_to_array('cat AND dog OR bird', '\s(and|or)\s', 'i') AS "Case Insensitive";

Result:

     Case Sensitive      | Case Insensitive 
-------------------------+------------------
{"cat AND dog OR bird"} | {cat,dog,bird}

More About POSIX Regular Expressions in PostgreSQL

See the PostgreSQL documentation for an overview of how POSIX regular expressions work and the various options available when using the regex_split_to_array() function.