A Quick Overview of PostgreSQL’s REGEXP

In PostgreSQL, we can use the regexp_substr() function to return a substring from a string based on a POSIX regular expression.

We can get the first occurrence or any other subsequent occurrence that matches the expression.

Syntax

The syntax goes like this:

regexp_substr ( string text, pattern text [, start integer [, N integer [, flags text [, subexpr integer ] ] ] ] )

Example

Here’s a basic example to demonstrate:

SELECT regexp_substr('dit dat dot dut', 'd.t');

Result:

dit

In this case there was at least one match and so the substring of the first match was returned.

Specify a Starting Point

We can provide a third argument to specify a starting position:

SELECT regexp_substr('dit dat dot dut', 'd.t', 4);

Result:

dat

Here, I specified that the search start at the fourth character. So the function started there and returned the first match after that point.

Return the Nth Match

By default, only the first match is returned (as evidenced in the previous examples). But we do have the option of specifying another match to return:

SELECT regexp_substr('dit dat dot dut', 'd.t', 1, 3);

Result:

dot

In this case I specified that only the third match should be returned. Given I specified a start position of 1, this resulted in the third match from the start being returned. If we specify a different starting position, then it will be the third match from that position:

SELECT regexp_substr('dit dat dot dut', 'd.t', 2, 3);

Result:

dut

No Match

When there’s no match, null is returned:

SELECT regexp_substr('dit dat dot dut', 'h.t');

Result:

null

Add a Flag

We can also add a flag to change how the function behaves. For example, we can specify that the search is case-insensitive by including the i flag:

SELECT 
    regexp_substr('dit dat dot dut', 'D.T', 1, 1) AS "Case Sensitive",
    regexp_substr('dit dat dot dut', 'D.T', 1, 1, 'i') AS "Case Insensitive";

Result:

 Case Sensitive | Case Insensitive 
----------------+------------------
 null           | dit

The i flag is just one of many flags we can use. The following table contains the flags we can use here:

Option	Description
`b`	Rest of RE is a BRE
`c`	Case-sensitive matching (overrides operator type)
`e`	Rest of RE is an ERE
`i`	Case-insensitive matching (overrides operator type)
`m`	Historical synonym for `n`
`n`	Newline-sensitive matching
`p`	Partial newline-sensitive matching
`q`	Rest of RE is a literal (“quoted”) string, all ordinary characters
`s`	Non-newline-sensitive matching (default)
`t`	Tight syntax (default)
`w`	Inverse partial newline-sensitive matching
`x`	Expanded syntax

Source: https://www.postgresql.org/docs/current/functions-matching.html#POSIX-METASYNTAX

Specify a Subexpression

If our regular expression contains subexpressions, then we have the option of specifying a subexpression to use in the evaluation. We can do this by providing an integer that indicates which subexpression to use:

SELECT 
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i') AS "Omitted",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 0) AS "0",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 1) AS "1",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 2) AS "2",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 3) AS "3",
    regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 4) AS "4";

Result:

  Omitted  |     0     | 1 | 2  |  3  | 4  
-----------+-----------+---+----+-----+----
 PostgreSQ | PostgreSQ | o | st | gre | SQ

Database.Guide

A Quick Overview of PostgreSQL’s REGEXP_SUBSTR() Function

Syntax

Example

Specify a Starting Point

Return the Nth Match

No Match

Add a Flag

Specify a Subexpression

More About POSIX Regular Expressions in PostgreSQL