REGEXP_INSTR() Function in Oracle

In Oracle, the REGEXP_INSTR() function searches a string for a regular expression pattern. It returns an integer indicating the beginning or ending position of the matched substring (whichever one you specify).

It extends the functionality of the INSTR() function by allowing us to use regular expression patterns.

Syntax

The syntax goes like this:

REGEXP_INSTR ( source_char, pattern
               [, position
                  [, occurrence
                     [, return_opt
                        [, match_param
                           [, subexpr ]
                        ]
                     ]
                  ]
               ]
             )

Where:

  • source_char is a character expression that serves as the search value.
  • pattern is the regular expression.
  • position is a positive integer that specifies where to begin the search. The default is 1, meaning, start the search at the first character.
  • occurrence is a positive integer that specifies which occurrence to search for. The default is 1, which means searches for the first occurrence.
  • return_opt specifies whether Oracle should return the beginning or ending position of the matched substring. Use 0 for the beginning, and 1 for the ending. The default value is 0.
  • match_param lets you change the default matching behaviour of the function. For example, it allows you to specify case-sensitivity, how multiple lines and spaces are dealt with, etc. This argument works the same as when used with the REGEXP_COUNT() function. See Oracle’s documentation for that function for more information.
  • For a pattern with subexpressions, subexpr is a nonnegative integer from 0 to 9 indicating which subexpression in pattern is to be returned by the function. This argument works the same as when used with the REGEXP_INSTR() function. See Oracle’s documentation for that function more information.

Example

Here’s a basic example of using REGEXP_INSTR() in Oracle:

SELECT 
    REGEXP_INSTR('My dogs are fluffy', 'd.g')
FROM DUAL;

Result:

4

In this case there’s a match, and the beginning position of the substring is returned.

Regular expressions can be very powerful, and this example uses a very simple example. In order to use REGEXP_INSTR() effectively, you’ll need to know the correct pattern to use for the desired outcome.

No Match

Here’s an example where there’s no match:

SELECT REGEXP_INSTR('My dogs like dregs', 't.g')
FROM DUAL;

Result:

0

There’s no match, so 0 is returned.

Multiple Matches

Here’s an example with multiple matches:

SELECT 
    REGEXP_INSTR('My dogs have dags', 'd.g')
FROM DUAL;

Result:

4

It returned the position of the first occurrence.

However, you can specify which occurrence to replace:

SELECT 
REGEXP_INSTR('My dogs have dags', 'd.g', 1, 2)
FROM DUAL;

Result:

14

Note that I added two arguments here; 1 and 2. The 1 specifies whereabouts in the string to start the search (in this case, at the first character). The 2 is what specifies which occurrence to search for. In this case, the second occurrence is searched for.

Here’s what happens if I start the search after the first occurrence:

SELECT 
REGEXP_INSTR('My dogs have dags', 'd.g', 8, 2)
FROM DUAL;

Result:

0

In this case there’s no match, because there’s only one more occurrence after the starting position.

If I change the last argument to 1, then we get a match (because it’s the first occurrence after the specified starting position):

SELECT 
REGEXP_INSTR('My dogs have dags', 'd.g', 8, 1)
FROM DUAL;

Result:

14

Return the End Position

You can pass a fifth argument of either 0 or 1 to specify whether the function should return the beginning or end position of the substring.

The default value is 0 (for the beginning position). Here’s what happens if we specify 1:

SELECT 
    REGEXP_INSTR('My dogs are fluffy', 'd.g', 1, 1, 1)
FROM DUAL;

Result:

7

Just to be clear, here it is again when compared with 0:

SELECT 
    REGEXP_INSTR('My dogs are fluffy', 'd.g', 1, 1, 0) AS "Start",
    REGEXP_INSTR('My dogs are fluffy', 'd.g', 1, 1, 1) AS "End"
FROM DUAL;

Result:

   Start    End 
________ ______ 
       4      7

Case Sensitivity

The REGEXP_INSTR() function follows Oracle’s collation determination and derivation rules, which define the collation to use when matching the string with the pattern.

However, you can explicitly specify case-sensitivity with the optional sixth argument. When you do this, it overrides any case-sensitivity or accent-sensitivity of the determined collation.

You can specify i for case-insensitive matching and c for case-sensitive matching.

Here’s an example:

SELECT 
    REGEXP_INSTR('My Cats', 'c.t', 1, 1, 0) AS "Default",
    REGEXP_INSTR('My Cats', 'c.t', 1, 1, 0, 'i') AS "Case Insensitive",
    REGEXP_INSTR('My Cats', 'c.t', 1, 1, 0, 'c') AS "Case Sensitive"
FROM DUAL;

Result:

   Default    Case Insensitive    Case Sensitive 
__________ ___________________ _________________ 
         0                   4                 0

My collation appears to be case-sensitive, based on these results. The other two strings were forced to a case-insensitive and case-sensitive matching respectively.

Subexpressions

Here’s an example of using the sixth argument to return a specific subexpression pattern:

SELECT REGEXP_INSTR(
    'catdogcow', 
    '(c.t)(d.g)(c.w)', 
    1, 1, 0, 'i', 1
    )
FROM DUAL;

Result:

1

In this case I returned the first subexpression.

Here’s what happens if I specify the third subexpression:

SELECT REGEXP_INSTR(
    'catdogcow', 
    '(c.t)(d.g)(c.w)', 
    1, 1, 0, 'i', 3
    )
FROM DUAL;

Result:

7

Null Arguments

With the exception of the 6th argument, providing null for an argument results in null:

SET NULL 'null';
SELECT 
    REGEXP_INSTR(null, 'c.t', 1, 1, 0, 'i', 1) AS "1",
    REGEXP_INSTR('Cat', null, 1, 1, 0, 'i', 1) AS "2",
    REGEXP_INSTR('Cat', 'c.t', null, 1, 0, 'i', 1) AS "3",
    REGEXP_INSTR('Cat', 'c.t', 1, null, 0, 'i', 1) AS "4",
    REGEXP_INSTR('Cat', 'c.t', 1, 1, null, 'i', 1) AS "5",
    REGEXP_INSTR('Cat', 'c.t', 1, 1, 0, null, 1) AS "6",
    REGEXP_INSTR('Cat', 'c.t', 1, 1, 0, 'i', null) AS "7"
FROM DUAL;

Result:

      1       2       3       4       5    6       7 
_______ _______ _______ _______ _______ ____ _______ 
   null    null    null    null    null    0    null 

By default, SQLcl and SQL*Plus return a blank space whenever null occurs as a result of a SQL SELECT statement.

However, you can use SET NULL to specify a different string to be returned. Here I specified that the string null should be returned.

Wrong Number of Arguments

Passing no arguments to the function, or too few, results in an error:

SELECT REGEXP_INSTR()
FROM DUAL;

Result:

Error starting at line : 1 in command -
SELECT REGEXP_INSTR()
FROM DUAL
Error at Command Line : 1 Column : 8
Error report -
SQL Error: ORA-00938: not enough arguments for function
00938. 00000 -  "not enough arguments for function"
*Cause:    
*Action:

The same applies when we pass too many arguments:

SELECT REGEXP_INSTR('Cat', 'c.t', 1, 1, 1, 'i', 1, 'oops')
FROM DUAL;

Result:

Error starting at line : 1 in command -
SELECT REGEXP_INSTR('Cat', 'c.t', 1, 1, 1, 'i', 1, 'oops')
FROM DUAL
Error at Command Line : 1 Column : 8
Error report -
SQL Error: ORA-00939: too many arguments for function
00939. 00000 -  "too many arguments for function"
*Cause:    
*Action:

More Information

The REGEXP_INSTR() function (as well as Oracle’s other implementation of regular expressions) conforms with the IEEE Portable Operating System Interface (POSIX) regular expression standard and to the Unicode Regular Expression Guidelines of the Unicode Consortium.

See the Oracle documentation for more information and examples of the REGEXP_INSTR() function.