In MariaDB, the REGEXP_INSTR()
function returns the starting index of a substring that matches the regular expression pattern.
The index starts at 1
. If there’s no match, the result is 0
.
Syntax
The syntax goes like this:
REGEXP_INSTR(subject, pattern)
Where subject
is the input string and pattern
is the regular expression pattern for the substring.
Note that, at the time of writing, MariaDB’s version of REGEXP_INSTR()
accepts fewer arguments than MySQL’s REGEXP_INSTR()
. MySQL’s version allows you to provide arguments for the starting position of the search, which occurrence to search for, which type of position to return, as well as a way to refine the regular expression.
Example
Here’s a basic example:
SELECT REGEXP_INSTR('Cat', 'at');
Result:
+---------------------------+ | REGEXP_INSTR('Cat', 'at') | +---------------------------+ | 2 | +---------------------------+
In this case there’s a match, and the substring starts at position 2.
No Match
Here’s an example where there’s no match:
SELECT REGEXP_INSTR('Cat', '^at');
Result:
+----------------------------+ | REGEXP_INSTR('Cat', '^at') | +----------------------------+ | 0 | +----------------------------+
There’s no match, so the result is 0
. There’s no match because I specified that the string must start with the substring.
Let’s change it so that it does start with that substring:
SELECT REGEXP_INSTR('at', '^at');
Result:
+---------------------------+ | REGEXP_INSTR('at', '^at') | +---------------------------+ | 1 | +---------------------------+
Case Sensitivity
The REGEXP_INSTR()
function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.
Here’s an example:
SELECT
REGEXP_INSTR('Cat', 'c') AS "My Default",
REGEXP_INSTR(_latin7'Cat' COLLATE latin7_general_ci, 'c') AS "Case Insensitive",
REGEXP_INSTR(_latin7'Cat' COLLATE latin7_general_cs, 'c') AS "Case Sensitive";
Result:
+------------+------------------+----------------+ | My Default | Case Insensitive | Case Sensitive | +------------+------------------+----------------+ | 1 | 1 | 0 | +------------+------------------+----------------+
My default collation is case insensitive. The other two strings were forced to a case insensitive collation and case sensitive collation respectively.
Providing a BINARY
string is also case sensitive (see below).
Binary Strings
By default, the positions are measured in characters rather than bytes. However, you can cast a multi-byte character set to BINARY
to get offsets in bytes if you need to.
Example:
SELECT
REGEXP_INSTR('© Cat', 'C') AS "Character",
REGEXP_INSTR(BINARY '© Cat', 'C') AS "Binary";
Result:
+-----------+--------+ | Character | Binary | +-----------+--------+ | 3 | 4 | +-----------+--------+
The copyright symbol uses two bytes, and therefore in this example we get a result of 4
when casting it to BINARY
, compared to 3
that we get otherwise.
Although note that passing a BINARY
string also affects the case sensitivity. With BINARY
strings, an upper case character is different to its lower case counterpart:
SELECT
REGEXP_INSTR('© Cat', 'c') AS "Character",
REGEXP_INSTR(BINARY '© Cat', 'c') AS "Binary";
Result:
+-----------+--------+ | Character | Binary | +-----------+--------+ | 3 | 0 | +-----------+--------+
Here, I searched for a lower case c
instead of an uppercase, and the BINARY
string didn’t match.
Null Arguments
Passing null
as any argument results in null
:
SELECT
REGEXP_INSTR(null, 'c') AS "1",
REGEXP_INSTR('Cat', null) AS "2",
REGEXP_INSTR(null, null) AS "3";
Result:
+------+------+------+ | 1 | 2 | 3 | +------+------+------+ | NULL | NULL | NULL | +------+------+------+
Wrong Number of Arguments
Passing the wrong number of arguments or no arguments, results in an error:
SELECT REGEXP_INSTR('Cat');
Result:
ERROR 1582 (42000): Incorrect parameter count in the call to native function 'REGEXP_INSTR'