How REGEXP_INSTR() Works in MariaDB

In MariaDB, the REGEXP_INSTR() function returns the starting index of a substring that matches the regular expression pattern.

The index starts at 1. If there’s no match, the result is 0.

Syntax

The syntax goes like this:

REGEXP_INSTR(subject, pattern)

Where subject is the input string and pattern is the regular expression pattern for the substring.

Note that, at the time of writing, MariaDB’s version of REGEXP_INSTR() accepts fewer arguments than MySQL’s REGEXP_INSTR(). MySQL’s version allows you to provide arguments for the starting position of the search, which occurrence to search for, which type of position to return, as well as a way to refine the regular expression.

Example

Here’s a basic example:

SELECT REGEXP_INSTR('Cat', 'at');

Result:

+---------------------------+
| REGEXP_INSTR('Cat', 'at') |
+---------------------------+
|                         2 |
+---------------------------+

In this case there’s a match, and the substring starts at position 2.

No Match

Here’s an example where there’s no match:

SELECT REGEXP_INSTR('Cat', '^at');

Result:

+----------------------------+
| REGEXP_INSTR('Cat', '^at') |
+----------------------------+
|                          0 |
+----------------------------+

There’s no match, so the result is 0. There’s no match because I specified that the string must start with the substring.

Let’s change it so that it does start with that substring:

SELECT REGEXP_INSTR('at', '^at');

Result:

+---------------------------+
| REGEXP_INSTR('at', '^at') |
+---------------------------+
|                         1 |
+---------------------------+

Case Sensitivity

The REGEXP_INSTR() function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.

Here’s an example:

SELECT 
    REGEXP_INSTR('Cat', 'c') AS "My Default",
    REGEXP_INSTR(_latin7'Cat' COLLATE latin7_general_ci, 'c') AS "Case Insensitive",
    REGEXP_INSTR(_latin7'Cat' COLLATE latin7_general_cs, 'c') AS "Case Sensitive";

Result:

+------------+------------------+----------------+
| My Default | Case Insensitive | Case Sensitive |
+------------+------------------+----------------+
|          1 |                1 |              0 |
+------------+------------------+----------------+

My default collation is case insensitive. The other two strings were forced to a case insensitive collation and case sensitive collation respectively.

Providing a BINARY string is also case sensitive (see below).

Binary Strings

By default, the positions are measured in characters rather than bytes. However, you can cast a multi-byte character set to BINARY to get offsets in bytes if you need to.

Example:

SELECT 
    REGEXP_INSTR('© Cat', 'C') AS "Character",
    REGEXP_INSTR(BINARY '© Cat', 'C') AS "Binary";

Result:

+-----------+--------+
| Character | Binary |
+-----------+--------+
|         3 |      4 |
+-----------+--------+

The copyright symbol uses two bytes, and therefore in this example we get a result of 4 when casting it to BINARY, compared to 3 that we get otherwise.

Although note that passing a BINARY string also affects the case sensitivity. With BINARY strings, an upper case character is different to its lower case counterpart:

SELECT 
    REGEXP_INSTR('© Cat', 'c') AS "Character",
    REGEXP_INSTR(BINARY '© Cat', 'c') AS "Binary";

Result:

+-----------+--------+
| Character | Binary |
+-----------+--------+
|         3 |      0 |
+-----------+--------+

Here, I searched for a lower case c instead of an uppercase, and the BINARY string didn’t match.

Null Arguments

Passing null as any argument results in null:

SELECT 
    REGEXP_INSTR(null, 'c') AS "1",
    REGEXP_INSTR('Cat', null) AS "2",
    REGEXP_INSTR(null, null) AS "3";

Result:

+------+------+------+
| 1    | 2    | 3    |
+------+------+------+
| NULL | NULL | NULL |
+------+------+------+

Wrong Number of Arguments

Passing the wrong number of arguments or no arguments, results in an error:

SELECT REGEXP_INSTR('Cat');

Result:

ERROR 1582 (42000): Incorrect parameter count in the call to native function 'REGEXP_INSTR'