In Oracle, the REGEXP_SUBSTR()
function returns a substring from a string, based on a regular expression pattern.
It extends the functionality of the SUBSTR()
function by allowing us to use regular expression patterns.
Syntax
The syntax goes like this:
REGEXP_SUBSTR ( source_char, pattern
[, position
[, occurrence
[, match_param
[, subexpr ]
]
]
]
)
Where:
source_char
is a character expression that serves as the search value.pattern
is the regular expression.position
is a positive integer that specifies where to begin the search. The default is1
, meaning, start the search at the first character.occurrence
is a nonnegative integer that specifies which occurrence to search for. The default is1
, which means searches for the first occurrence.match_param
lets you change the default matching behaviour of the function. For example, it allows you to specify case-sensitivity, how multiple lines and spaces are dealt with, etc. This argument works the same as when used with theREGEXP_COUNT()
function. See Oracle’s documentation for that function for more information.- For a
pattern
with subexpressions,subexpr
is a nonnegative integer from 0 to 9 indicating which subexpression inpattern
is to be returned by the function. This argument works the same as when used with theREGEXP_INSTR()
function. See Oracle’s documentation for that function more information.
Example
Here’s a basic example of using REGEXP_SUBSTR()
in Oracle:
SELECT
REGEXP_SUBSTR('Cats and dogs', 'd.g')
FROM DUAL;
Result:
dog
In this case there’s a match, and the first (and in this case, only) matching substring is returned.
Regular expressions can be very powerful, and this example uses a very simple example. In order to use REGEXP_SUBSTR()
effectively, you’ll need to know the correct pattern to use for the desired outcome.
No Match
Here’s an example where there’s no match:
SET NULL 'null';
SELECT REGEXP_SUBSTR('My dogs like dregs', 't.g')
FROM DUAL;
Result:
null
There’s no match, so null
is returned.
By default, SQLcl and SQL*Plus return a blank space whenever null
occurs as a result of a SQL SELECT
statement.
However, you can use SET NULL
to specify a different string to be returned. Here I specified that the string null
should be returned.
Multiple Matches
Here’s an example with multiple matches:
SELECT
REGEXP_SUBSTR('My dogs have dags', 'd.g')
FROM DUAL;
Result:
dog
However, you can specify which occurrence to replace if required:
SELECT
REGEXP_SUBSTR('My dogs have dags', 'd.g', 1, 2)
FROM DUAL;
Result:
dag
Note that I added two arguments here; 1
and 2
. The 1
specifies whereabouts in the string to start the search (in this case, at the first character). The 2
is what specifies which occurrence to search for. In this case, the second occurrence is searched for.
Here’s what happens if I start the search after the first occurrence:
SELECT
REGEXP_SUBSTR('My dogs have dags', 'd.g', 8, 2)
FROM DUAL;
Result:
null
In this case there’s no match, because there’s only one more occurrence after the starting position.
If I change the last argument to 1
, then we get a match (because it’s the first occurrence after the specified starting position):
SELECT
REGEXP_SUBSTR('My dogs have dags', 'd.g', 8, 1)
FROM DUAL;
Result:
dag
Case Sensitivity
The REGEXP_SUBSTR()
function follows Oracle’s collation determination and derivation rules, which define the collation to use when matching the string with the pattern.
However, you can explicitly specify case-sensitivity with the optional fifth argument. When you do this, it overrides any case-sensitivity or accent-sensitivity of the determined collation.
You can specify i
for case-insensitive matching and c
for case-sensitive matching.
Here’s an example:
SELECT
REGEXP_SUBSTR('My Cats', 'c.t', 1, 1) AS "Default",
REGEXP_SUBSTR('My Cats', 'c.t', 1, 1, 'i') AS "Case Insensitive",
REGEXP_SUBSTR('My Cats', 'c.t', 1, 1, 'c') AS "Case Sensitive"
FROM DUAL;
Result:
Default Case Insensitive Case Sensitive __________ ___________________ _________________ null Cat null
My collation appears to be case-sensitive, based on these results. The other two strings were forced to a case-insensitive and case-sensitive matching respectively.
Subexpressions
Here’s an example of using the sixth argument to return a specific subexpression pattern:
SELECT REGEXP_SUBSTR(
'catdogcow',
'(c.t)(d.g)(c.w)',
1, 1, 'i', 1
)
FROM DUAL;
Result:
cat
In this case I returned the first subexpression.
Here’s what happens if I specify the third subexpression:
SELECT REGEXP_SUBSTR(
'catdogcow',
'(c.t)(d.g)(c.w)',
1, 1, 'i', 3
)
FROM DUAL;
Result:
cow
Null Arguments
If any argument is null
, the result is null
:
SET NULL 'null';
SELECT
REGEXP_SUBSTR(null, 'c.t', 1, 1, 'i', 1) AS "1",
REGEXP_SUBSTR('Cat', null, 1, 1, 'i', 1) AS "2",
REGEXP_SUBSTR('Cat', 'c.t', null, 1, 'i', 1) AS "3",
REGEXP_SUBSTR('Cat', 'c.t', 1, null, 'i', 1) AS "4",
REGEXP_SUBSTR('Cat', 'c.t', 1, 1, null, 1) AS "5",
REGEXP_SUBSTR('Cat', 'c.t', 1, 1, 'i', null) AS "6"
FROM DUAL;
Result:
1 2 3 4 5 6 _______ _______ _______ _______ _______ _______ null null null null null null
Wrong Number of Arguments
Passing no arguments to the function, or too few, results in an error:
SELECT REGEXP_SUBSTR()
FROM DUAL;
Result:
Error starting at line : 1 in command - SELECT REGEXP_SUBSTR() FROM DUAL Error at Command Line : 1 Column : 8 Error report - SQL Error: ORA-00938: not enough arguments for function 00938. 00000 - "not enough arguments for function" *Cause: *Action:
The same applies when we pass too many arguments:
SELECT REGEXP_SUBSTR('Cat', 'c.t', 1, 1, 'i', 1, 'oops')
FROM DUAL;
Result:
Error starting at line : 1 in command - SELECT REGEXP_SUBSTR('Cat', 'c.t', 1, 1, 'i', 1, 'oops') FROM DUAL Error at Command Line : 1 Column : 8 Error report - SQL Error: ORA-00939: too many arguments for function 00939. 00000 - "too many arguments for function" *Cause: *Action:
More Information
The REGEXP_SUBSTR()
function (as well as Oracle’s other implementation of regular expressions) conforms with the IEEE Portable Operating System Interface (POSIX) regular expression standard and to the Unicode Regular Expression Guidelines of the Unicode Consortium.
See the Oracle documentation for more information and examples of the REGEXP_SUBSTR()
function.