In MySQL, the REGEXP_INSTR()
function returns the starting index of a substring that matches the regular expression pattern.
The index starts at 1
. If there’s no match, the result is 0
.
Syntax
The syntax goes like this:
REGEXP_INSTR(expr, pat[, pos[, occurrence[, return_option[, match_type]]]])
Where expr
is the input string and pat
is the regular expression pattern for the substring.
The optional pos
argument allows you to specify a position within the string to start the search. If omitted, it starts at position 1.
The optional occurrence
argument allows you to specify which occurrence of the match to search for. If omitted, the first occurrence is used (occurrence 1).
The optional return_option
argument allows you to specify which type of position to return. If you use 0
, it will return the position of the first character in the matching substring. If you use 1
it returns the position of the first character following the matching substring. If omitted, the value is 0
.
The optional match_type
argument allows you to refine the regular expression. For example, you can use this argument to specify case-sensitive matching or not.
Example 1 – Basic Usage
Here’s a basic example:
SELECT REGEXP_INSTR('Cat', 'at') Result;
Result:
+--------+ | Result | +--------+ | 2 | +--------+
In this case there’s a match, and the substring starts at position 2.
Example 2 – No Match
Here’s an example where there’s no match:
SELECT REGEXP_INSTR('Cat', '^at') Result;
Result:
+--------+ | Result | +--------+ | 0 | +--------+
There’s no match, so the result is 0
. There’s no match because I specified that the string must start with the substring.
Let’s change it so that it does start with that substring:
SELECT REGEXP_INSTR('at', '^at') Result;
Result:
+--------+ | Result | +--------+ | 1 | +--------+
Example 3 – The pos
Argument
Here’s an example of specifying a starting position:
SELECT REGEXP_INSTR('Cat Cat', 'Cat', 2) Result;
Result:
+--------+ | Result | +--------+ | 5 | +--------+
So we get the index of the second occurrence.
Note that the index still starts counting from position 1 regardless of where you specify the starting position.
The following example demonstrates this more clearly:
SELECT REGEXP_INSTR('Cat Cat', 'Cat', 2) AS 'Pos 2', REGEXP_INSTR('Cat Cat', 'Cat', 3) AS 'Pos 3', REGEXP_INSTR('Cat Cat', 'Cat', 5) AS 'Pos 5';
Result:
+-------+-------+-------+ | Pos 2 | Pos 3 | Pos 5 | +-------+-------+-------+ | 5 | 5 | 5 | +-------+-------+-------+
Of course, depending on your regex pattern, this can return the index of completely different substrings. Example:
SELECT REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1) 'Pos 1', REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 2) 'Pos 2', REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 6) 'Pos 6';
Result:
+-------+-------+-------+ | Pos 1 | Pos 2 | Pos 6 | +-------+-------+-------+ | 1 | 5 | 16 | +-------+-------+-------+
We can check the substrings with the REGEXP_SUBSTR()
function:
SELECT REGEXP_SUBSTR('Cat City is SO Cute!', 'C.t', 1) 'Pos 1', REGEXP_SUBSTR('Cat City is SO Cute!', 'C.t', 2) 'Pos 2', REGEXP_SUBSTR('Cat City is SO Cute!', 'C.t', 6) 'Pos 6';
Result:
+-------+-------+-------+ | Pos 1 | Pos 2 | Pos 6 | +-------+-------+-------+ | Cat | Cit | Cut | +-------+-------+-------+
Example 4 – The occurrence
Argument
Here’s an example of using the occurrence
argument. In all cases, we start at position 1:
SELECT REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1) 'Occurrence 1', REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 2) 'Occurrence 2', REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 3) 'Occurrence 3';
Result:
+--------------+--------------+--------------+ | Occurrence 1 | Occurrence 2 | Occurrence 3 | +--------------+--------------+--------------+ | 1 | 5 | 16 | +--------------+--------------+--------------+
However, if we start at a different position, the result is different:
SELECT REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 2, 1) 'Occurrence 1', REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 2, 2) 'Occurrence 2', REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 2, 3) 'Occurrence 3';
Result:
+--------------+--------------+--------------+ | Occurrence 1 | Occurrence 2 | Occurrence 3 | +--------------+--------------+--------------+ | 5 | 16 | 0 | +--------------+--------------+--------------+
This happened because our starting position came after the first occurrence had started. Therefore, occurrence 2 became occurrence 1, and occurrence 3 became occurrence 2. And because there were no more occurrences, the result of occurrence 3 was negative (i.e. there was no occurrence 3).
Example 5 – The return_option
Argument
Here’s an example of using the return_option
argument:
SELECT REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1, 0) 'Option 0', REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1, 1) 'Option 1';
Result:
+----------+----------+ | Option 0 | Option 1 | +----------+----------+ | 1 | 4 | +----------+----------+
Option 0 returned the matched substring’s first character. Option 1 returned the position following the matched substring.
Here’s what it looks like if we apply it to the previous example:
SELECT REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1, 0) 'Occurrence 1', REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 2, 0) 'Occurrence 2', REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 3, 0) 'Occurrence 3' UNION ALL SELECT REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 1, 1), REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 2, 1), REGEXP_INSTR('Cat City is SO Cute!', 'C.t', 1, 3, 1);
Result:
+--------------+--------------+--------------+ | Occurrence 1 | Occurrence 2 | Occurrence 3 | +--------------+--------------+--------------+ | 1 | 5 | 16 | | 4 | 8 | 19 | +--------------+--------------+--------------+
In this case we did one set using option 0, and another using option 1, then joined them together using UNION ALL
.
Example 6 – The match_type
Argument
You can provide an additional argument to determine the match type. This allows you to specify things like whether or not the match is case-sensitive, whether or not to include line terminators, etc.
Here’s an example of specifying a case-sensitive match and a case-insensitive match:
SELECT REGEXP_INSTR('Cat City is SO Cute!', 'c.t', 1, 1, 0, 'c') 'Case-Sensitive', REGEXP_INSTR('Cat City is SO Cute!', 'c.t', 1, 1, 0, 'i') 'Case-Insensitive';
Result:
+----------------+------------------+ | Case-Sensitive | Case-Insensitive | +----------------+------------------+ | 0 | 1 | +----------------+------------------+
The match_type
argument can contain the following characters:
c
- Case sensitive matching.
i
- Case insensitive matching.
m
- Multiple-line mode. Recognize line terminators within the string. The default behavior is to match line terminators only at the start and end of the string expression.
n
- The
.
character matches line terminators. The default is for.
matching to stop at the end of a line. u
- Unix-only line endings. Only the newline character is recognized as a line ending by the
.
,^
, and$
match operators.