In PostgreSQL, the regexp_split_to_array()
function splits a string using a POSIX regular expression as the delimiter, and returns the result in a text array.
So we use a POSIX regular expression to specify the delimiter/s, and split the string based on that.
We pass the string as the first argument and the pattern as the second. We can also specify a flag to determine how the function behaves.
Example
Here’s a basic example to demonstrate:
SELECT regexp_split_to_array('cat dog bird', '\s+');
Result:
{cat,dog,bird}
Here, I used \s+
as the regular expression (which is what determines the delimiter in the string at the first argument). This particular expression specifies that the space character is the delimiter.
Here are some more examples:
\x
SELECT
regexp_split_to_array('cat dog bird', '\W+') AS "1",
regexp_split_to_array('cat+dog+bird', '\W+') AS "2",
regexp_split_to_array('cat + dog + bird', '\W+') AS "3",
regexp_split_to_array('cat & dog & bird', '\W+') AS "4",
regexp_split_to_array('cat&dog+bird fox', '[&+ ]+') AS "5";
Result (using vertical output):
1 | {cat,dog,bird}
2 | {cat,dog,bird}
3 | {cat,dog,bird}
4 | {cat,dog,bird}
5 | {cat,dog,bird,fox}
No Match
If there are no matches, the original string is returned in an array:
SELECT regexp_split_to_array('cat dog bird', ',');
Result:
{"cat dog bird"}
Adding a Flag
We also have the option of adding a flag as a third argument. Such flags can be used to change the behaviour of the function. The following table contains the other flags we can use:
Option | Description |
---|---|
b | Rest of RE is a BRE |
c | Case-sensitive matching (overrides operator type) |
e | Rest of RE is an ERE |
i | Case-insensitive matching (overrides operator type) |
m | Historical synonym for n |
n | Newline-sensitive matching |
p | Partial newline-sensitive matching |
q | Rest of RE is a literal (“quoted”) string, all ordinary characters |
s | Non-newline-sensitive matching (default) |
t | Tight syntax (default) |
w | Inverse partial newline-sensitive matching |
x | Expanded syntax |
Source: https://www.postgresql.org/docs/current/functions-matching.html#POSIX-METASYNTAX
Here’s an example of using the i
flag to specify a case insensitive search:
SELECT
regexp_split_to_array('cat AND dog OR bird', '\s(and|or)\s') AS "Case Sensitive",
regexp_split_to_array('cat AND dog OR bird', '\s(and|or)\s', 'i') AS "Case Insensitive";
Result:
Case Sensitive | Case Insensitive
-------------------------+------------------
{"cat AND dog OR bird"} | {cat,dog,bird}
More About POSIX Regular Expressions in PostgreSQL
See the PostgreSQL documentation for an overview of how POSIX regular expressions work and the various options available when using the regex_split_to_array()
function.