In MySQL, the REGEXP_SUBSTR()
function returns the substring that matches the given regular expression pattern.
If there’s no match (i.e. the input string doesn’t contain the substring), the result is NULL
.
Syntax
The syntax goes like this:
REGEXP_SUBSTR(expr, pat[, pos[, occurrence[, match_type]]])
Where expr
is the input string and pat
is the regular expression pattern for the substring.
The optional pos
argument allows you to specify a position within the string to start the search. If omitted, it starts at position 1.
The optional occurrence
argument allows you to specify which occurrence of the match to search for. If omitted, the first occurrence is used (occurrence 1).
The optional match_type
argument allows you to refine the regular expression. For example, you can use this argument to specify case-sensitive matching or not.
Example 1 – Basic Usage
Here’s a basic example:
SELECT REGEXP_SUBSTR('Thailand or Cambodia', 'l.nd') Result;
Result:
+--------+ | Result | +--------+ | land | +--------+
In this case there’s a match, and the substring is returned.
Example 2 – Multiple Matches
By default, if there are multiple matches within the string, the first one is returned (although you can specify another occurrence if needed. We’ll get to that soon):
SELECT REGEXP_SUBSTR('Lend for land', 'l.nd') Result;
Result:
+--------+ | Result | +--------+ | Lend | +--------+
Example 3 – No Match
Here’s an example where there’s no match:
SELECT REGEXP_SUBSTR('Lend for land', '^C') Result;
Result:
+--------+ | Result | +--------+ | NULL | +--------+
There’s no match, so the result is NULL
.
Example 4 – The pos
Argument
Here’s an example of specifying a starting position:
SELECT REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2) Result;
Result:
+--------+ | Result | +--------+ | Cut | +--------+
We started at position 2, which comes after the start of the first occurrence, so the next occurrence is returned instead.
Here’s another example:
SELECT REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 1) 'Pos 1', REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2) 'Pos 2', REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 6) 'Pos 6';
Result:
+-------+-------+-------+ | Pos 1 | Pos 2 | Pos 6 | +-------+-------+-------+ | Cat | Cut | Cot | +-------+-------+-------+
Example 5 – The occurrence
Argument
Here’s an example of using the occurrence
argument. In all cases, we start at position 1:
SELECT REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 1, 1) 'Occurrence 1', REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 1, 2) 'Occurrence 2', REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 1, 3) 'Occurrence 3';
Result:
+--------------+--------------+--------------+ | Occurrence 1 | Occurrence 2 | Occurrence 3 | +--------------+--------------+--------------+ | Cat | Cut | Cot | +--------------+--------------+--------------+
However, if we start at a different position, the result is different:
SELECT REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2, 1) 'Occurrence 1', REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2, 2) 'Occurrence 2', REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2, 3) 'Occurrence 3';
Result:
+--------------+--------------+--------------+ | Occurrence 1 | Occurrence 2 | Occurrence 3 | +--------------+--------------+--------------+ | Cut | Cot | NULL | +--------------+--------------+--------------+
This happened because our starting position came after the first occurrence had started. Therefore, occurrence 2 became occurrence 1, and occurrence 3 became occurrence 2. And because there were no more occurrences, the result of occurrence 3 was NULL (i.e. there was no occurrence 3).
Example 6 – The match_type
Argument
You can provide an additional argument to determine the match type. This allows you to specify things like whether or not the match is case-sensitive, whether or not to include line terminators, etc.
Here’s an example of specifying a case-sensitive match and a case-insensitive match:
SELECT REGEXP_SUBSTR('Cat Cut Cot', 'c.t', 1, 1, 'c') 'Case-Sensitive', REGEXP_SUBSTR('Cat Cut Cot', 'c.t', 1, 1, 'i') 'Case-Insensitive';
Result:
+----------------+------------------+ | Case-Sensitive | Case-Insensitive | +----------------+------------------+ | NULL | Cat | +----------------+------------------+
The match_type
argument can contain the following characters:
c
- Case sensitive matching.
i
- Case insensitive matching.
m
- Multiple-line mode. Recognize line terminators within the string. The default behavior is to match line terminators only at the start and end of the string expression.
n
- The
.
character matches line terminators. The default is for.
matching to stop at the end of a line. u
- Unix-only line endings. Only the newline character is recognized as a line ending by the
.
,^
, and$
match operators.