A Quick Overview of PostgreSQL’s REGEXP_SUBSTR() Function

In PostgreSQL, we can use the regexp_substr() function to return a substring from a string based on a POSIX regular expression.

We can get the first occurrence or any other subsequent occurrence that matches the expression.

Syntax

The syntax goes like this:

regexp_substr ( string text, pattern text [, start integer [, N integer [, flags text [, subexpr integer ] ] ] ] )

Example

Here’s a basic example to demonstrate:

SELECT regexp_substr('dit dat dot dut', 'd.t');

Result:

dit

In this case there was at least one match and so the substring of the first match was returned.

Specify a Starting Point

We can provide a third argument to specify a starting position:

SELECT regexp_substr('dit dat dot dut', 'd.t', 4);

Result:

dat

Here, I specified that the search start at the fourth character. So the function started there and returned the first match after that point.

Return the Nth Match

By default, only the first match is returned (as evidenced in the previous examples). But we do have the option of specifying another match to return:

SELECT regexp_substr('dit dat dot dut', 'd.t', 1, 3);

Result:

dot

In this case I specified that only the third match should be returned. Given I specified a start position of 1, this resulted in the third match from the start being returned. If we specify a different starting position, then it will be the third match from that position:

SELECT regexp_substr('dit dat dot dut', 'd.t', 2, 3);

Result:

dut

No Match

When there’s no match, null is returned:

SELECT regexp_substr('dit dat dot dut', 'h.t');

Result:

null

Add a Flag

We can also add a flag to change how the function behaves. For example, we can specify that the search is case-insensitive by including the i flag:

SELECT 
    regexp_substr('dit dat dot dut', 'D.T', 1, 1) AS "Case Sensitive",
    regexp_substr('dit dat dot dut', 'D.T', 1, 1, 'i') AS "Case Insensitive";

Result:

 Case Sensitive | Case Insensitive 
----------------+------------------
null | dit

The i flag is just one of many flags we can use. The following table contains the flags we can use here:

OptionDescription
bRest of RE is a BRE
cCase-sensitive matching (overrides operator type)
eRest of RE is an ERE
iCase-insensitive matching (overrides operator type)
mHistorical synonym for n
nNewline-sensitive matching
pPartial newline-sensitive matching
qRest of RE is a literal (“quoted”) string, all ordinary characters
sNon-newline-sensitive matching (default)
tTight syntax (default)
wInverse partial newline-sensitive matching
xExpanded syntax

Source: https://www.postgresql.org/docs/current/functions-matching.html#POSIX-METASYNTAX

Specify a Subexpression

If our regular expression contains subexpressions, then we have the option of specifying a subexpression to use in the evaluation. We can do this by providing an integer that indicates which subexpression to use:

SELECT 
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i') AS "Omitted",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 0) AS "0",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 1) AS "1",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 2) AS "2",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 3) AS "3",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 4) AS "4";

Result:

  Omitted  |     0     | 1 | 2  |  3  | 4  
-----------+-----------+---+----+-----+----
PostgreSQ | PostgreSQ | o | st | gre | SQ

More About POSIX Regular Expressions in PostgreSQL

See the PostgreSQL documentation for an overview of how POSIX regular expressions work and the various options available when using the regex_substr() function.