How REGEXP_SUBSTR() Works in MariaDB

In MariaDB, the REGEXP_SUBSTR() function returns the substring that matches the given regular expression pattern.

If there’s no match (i.e. the input string doesn’t contain the substring), the result is an empty string.

Syntax

The syntax goes like this:

REGEXP_SUBSTR(subject,pattern)

Where subject is the input string and pattern is the regular expression pattern for the substring.

Note that, at the time of writing, MariaDB’s version of REGEXP_SUBSTR() accepts fewer arguments than MySQL’s REGEXP_SUBSTR(). MySQL’s version allows you to provide arguments for the starting position of the search, which occurrence to search for, as well as a way to refine the regular expression.

Example

Here’s a basic example:

SELECT REGEXP_SUBSTR('Thailand and Laos', 'l.nd');

Result:

+--------------------------------------------+
| REGEXP_SUBSTR('Thailand and Laos', 'l.nd') |
+--------------------------------------------+
| land                                       |
+--------------------------------------------+

In this case there’s a match, and the substring is returned.

Multiple Matches

By default, if there are multiple matches within the string, the first one is returned:

SELECT REGEXP_SUBSTR('Lend for land', 'l.nd');

Result:

+----------------------------------------+
| REGEXP_SUBSTR('Lend for land', 'l.nd') |
+----------------------------------------+
| Lend                                   |
+----------------------------------------+

No Match

Here’s an example where there’s no match:

SELECT REGEXP_SUBSTR('Lend for land', '^C');

Result:

+--------------------------------------+
| REGEXP_SUBSTR('Lend for land', '^C') |
+--------------------------------------+
|                                      |
+--------------------------------------+

There’s no match, so the result is an empty string.

Case Sensitivity

The REGEXP_SUBSTR() function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.

Here’s an example:

SELECT 
    REGEXP_SUBSTR('My Cats', 'c.t') AS "My Default",
    REGEXP_SUBSTR(_latin7'My Cats' COLLATE latin7_general_ci, 'c.t') AS "Case Insensitive",
    REGEXP_SUBSTR(_latin7'My Cats' COLLATE latin7_general_cs, 'c.t') AS "Case Sensitive";

Result:

+------------+------------------+----------------+
| My Default | Case Insensitive | Case Sensitive |
+------------+------------------+----------------+
| Cat        | Cat              |                |
+------------+------------------+----------------+

My default collation is case insensitive. The other two strings were forced to a case insensitive collation and case sensitive collation respectively.

Providing a BINARY string is also case sensitive (see below).

Binary Strings

Passing a BINARY string also affects the case sensitivity. With BINARY strings, an upper case character is different to its lower case counterpart:

Example:

SELECT 
    REGEXP_SUBSTR('My Cats', 'c.t') AS "Character",
    REGEXP_SUBSTR(BINARY 'My Cats', 'c.t') AS "Binary";

Result:

+-----------+--------+
| Character | Binary |
+-----------+--------+
| Cat       |        |
+-----------+--------+

Here’s what happens if we change the case:

SELECT 
    REGEXP_SUBSTR('My Cats', 'C.t') AS "Character",
    REGEXP_SUBSTR(BINARY 'My Cats', 'C.t') AS "Binary";

Result:

+-----------+--------+
| Character | Binary |
+-----------+--------+
| Cat       | Cat    |
+-----------+--------+

Null Arguments

Passing null as any argument results in null:

SELECT 
    REGEXP_SUBSTR(null, 'c.t') AS "1",
    REGEXP_SUBSTR('Cat', null) AS "2",
    REGEXP_SUBSTR(null, null) AS "3";

Result:

+------+------+------+
| 1    | 2    | 3    |
+------+------+------+
| NULL | NULL | NULL |
+------+------+------+

Wrong Number of Arguments

Passing the wrong number of arguments or no arguments, results in an error:

SELECT REGEXP_SUBSTR('Cat');

Result:

ERROR 1582 (42000): Incorrect parameter count in the call to native function 'REGEXP_SUBSTR'