In PostgreSQL, we can use the regexp_substr()
function to return a substring from a string based on a POSIX regular expression.
We can get the first occurrence or any other subsequent occurrence that matches the expression.
Syntax
The syntax goes like this:
regexp_substr ( string text, pattern text [, start integer [, N integer [, flags text [, subexpr integer ] ] ] ] )
Example
Here’s a basic example to demonstrate:
SELECT regexp_substr('dit dat dot dut', 'd.t');
Result:
dit
In this case there was at least one match and so the substring of the first match was returned.
Specify a Starting Point
We can provide a third argument to specify a starting position:
SELECT regexp_substr('dit dat dot dut', 'd.t', 4);
Result:
dat
Here, I specified that the search start at the fourth character. So the function started there and returned the first match after that point.
Return the Nth Match
By default, only the first match is returned (as evidenced in the previous examples). But we do have the option of specifying another match to return:
SELECT regexp_substr('dit dat dot dut', 'd.t', 1, 3);
Result:
dot
In this case I specified that only the third match should be returned. Given I specified a start position of 1
, this resulted in the third match from the start being returned. If we specify a different starting position, then it will be the third match from that position:
SELECT regexp_substr('dit dat dot dut', 'd.t', 2, 3);
Result:
dut
No Match
When there’s no match, null
is returned:
SELECT regexp_substr('dit dat dot dut', 'h.t');
Result:
null
Add a Flag
We can also add a flag to change how the function behaves. For example, we can specify that the search is case-insensitive by including the i
flag:
SELECT
regexp_substr('dit dat dot dut', 'D.T', 1, 1) AS "Case Sensitive",
regexp_substr('dit dat dot dut', 'D.T', 1, 1, 'i') AS "Case Insensitive";
Result:
Case Sensitive | Case Insensitive
----------------+------------------
null | dit
The i
flag is just one of many flags we can use. The following table contains the flags we can use here:
Option | Description |
---|---|
b | Rest of RE is a BRE |
c | Case-sensitive matching (overrides operator type) |
e | Rest of RE is an ERE |
i | Case-insensitive matching (overrides operator type) |
m | Historical synonym for n |
n | Newline-sensitive matching |
p | Partial newline-sensitive matching |
q | Rest of RE is a literal (“quoted”) string, all ordinary characters |
s | Non-newline-sensitive matching (default) |
t | Tight syntax (default) |
w | Inverse partial newline-sensitive matching |
x | Expanded syntax |
Source: https://www.postgresql.org/docs/current/functions-matching.html#POSIX-METASYNTAX
Specify a Subexpression
If our regular expression contains subexpressions, then we have the option of specifying a subexpression to use in the evaluation. We can do this by providing an integer that indicates which subexpression to use:
SELECT
regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i') AS "Omitted",
regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 0) AS "0",
regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 1) AS "1",
regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 2) AS "2",
regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 3) AS "3",
regexp_substr('PostgreSQL', 'p(.)(..)(...)(..)', 1, 1, 'i', 4) AS "4";
Result:
Omitted | 0 | 1 | 2 | 3 | 4
-----------+-----------+---+----+-----+----
PostgreSQ | PostgreSQ | o | st | gre | SQ
More About POSIX Regular Expressions in PostgreSQL
See the PostgreSQL documentation for an overview of how POSIX regular expressions work and the various options available when using the regex_substr()
function.