In MariaDB, the REGEXP_SUBSTR()
function returns the substring that matches the given regular expression pattern.
If there’s no match (i.e. the input string doesn’t contain the substring), the result is an empty string.
Syntax
The syntax goes like this:
REGEXP_SUBSTR(subject,pattern)
Where subject
is the input string and pattern
is the regular expression pattern for the substring.
Note that, at the time of writing, MariaDB’s version of REGEXP_SUBSTR()
accepts fewer arguments than MySQL’s REGEXP_SUBSTR()
. MySQL’s version allows you to provide arguments for the starting position of the search, which occurrence to search for, as well as a way to refine the regular expression.
Example
Here’s a basic example:
SELECT REGEXP_SUBSTR('Thailand and Laos', 'l.nd');
Result:
+--------------------------------------------+ | REGEXP_SUBSTR('Thailand and Laos', 'l.nd') | +--------------------------------------------+ | land | +--------------------------------------------+
In this case there’s a match, and the substring is returned.
Multiple Matches
By default, if there are multiple matches within the string, the first one is returned:
SELECT REGEXP_SUBSTR('Lend for land', 'l.nd');
Result:
+----------------------------------------+ | REGEXP_SUBSTR('Lend for land', 'l.nd') | +----------------------------------------+ | Lend | +----------------------------------------+
No Match
Here’s an example where there’s no match:
SELECT REGEXP_SUBSTR('Lend for land', '^C');
Result:
+--------------------------------------+ | REGEXP_SUBSTR('Lend for land', '^C') | +--------------------------------------+ | | +--------------------------------------+
There’s no match, so the result is an empty string.
Case Sensitivity
The REGEXP_SUBSTR()
function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.
Here’s an example:
SELECT
REGEXP_SUBSTR('My Cats', 'c.t') AS "My Default",
REGEXP_SUBSTR(_latin7'My Cats' COLLATE latin7_general_ci, 'c.t') AS "Case Insensitive",
REGEXP_SUBSTR(_latin7'My Cats' COLLATE latin7_general_cs, 'c.t') AS "Case Sensitive";
Result:
+------------+------------------+----------------+ | My Default | Case Insensitive | Case Sensitive | +------------+------------------+----------------+ | Cat | Cat | | +------------+------------------+----------------+
My default collation is case insensitive. The other two strings were forced to a case insensitive collation and case sensitive collation respectively.
Providing a BINARY
string is also case sensitive (see below).
Binary Strings
Passing a BINARY
string also affects the case sensitivity. With BINARY
strings, an upper case character is different to its lower case counterpart:
Example:
SELECT
REGEXP_SUBSTR('My Cats', 'c.t') AS "Character",
REGEXP_SUBSTR(BINARY 'My Cats', 'c.t') AS "Binary";
Result:
+-----------+--------+ | Character | Binary | +-----------+--------+ | Cat | | +-----------+--------+
Here’s what happens if we change the case:
SELECT
REGEXP_SUBSTR('My Cats', 'C.t') AS "Character",
REGEXP_SUBSTR(BINARY 'My Cats', 'C.t') AS "Binary";
Result:
+-----------+--------+ | Character | Binary | +-----------+--------+ | Cat | Cat | +-----------+--------+
Null Arguments
Passing null
as any argument results in null
:
SELECT
REGEXP_SUBSTR(null, 'c.t') AS "1",
REGEXP_SUBSTR('Cat', null) AS "2",
REGEXP_SUBSTR(null, null) AS "3";
Result:
+------+------+------+ | 1 | 2 | 3 | +------+------+------+ | NULL | NULL | NULL | +------+------+------+
Wrong Number of Arguments
Passing the wrong number of arguments or no arguments, results in an error:
SELECT REGEXP_SUBSTR('Cat');
Result:
ERROR 1582 (42000): Incorrect parameter count in the call to native function 'REGEXP_SUBSTR'