How REGEXP_REPLACE() Works in MariaDB

In MariaDB, the REGEXP_REPLACE() function replaces occurrences of the substring within a string that matches the given regular expression pattern.

The whole string is returned along with the replacements.

If there’s no match (i.e. the input string doesn’t contain the substring), the the whole string is returned unchanged.

Syntax

The syntax goes like this:

REGEXP_REPLACE(subject, pattern, replace)

Where subject is the input string, pattern is the regular expression pattern for the substring, and replace is the replacement string.

Note that, at the time of writing, MariaDB’s version of REGEXP_REPLACE() accepts fewer arguments than MySQL’s REGEXP_REPLACE() function. MySQL’s version allows you to provide arguments for the starting position of the search, which occurrence to search for, as well as a way to refine the regular expression with match type.

Example

Here’s a basic example of using REGEXP_REPLACE() in MariaDB:

SELECT REGEXP_REPLACE('Cats and dogs', 'd.g', 'bird');

Result:

+------------------------------------------------+
| REGEXP_REPLACE('Cats and dogs', 'd.g', 'bird') |
+------------------------------------------------+
| Cats and birds                                 |
+------------------------------------------------+

In this case there’s a match, and the substring is replaced with the replacement string.

Regular expressions can be very powerful, and this example uses a very simple example. In order to use REGEXP_REPLACE() effectively, you’ll need to know the correct pattern to use for the desired outcome.

Multiple Matches

Here’s an example with multiple matches:

SELECT REGEXP_REPLACE('My dog likes other dogs', 'd.g', 'bird');

Result:

+----------------------------------------------------------+
| REGEXP_REPLACE('My dog likes other dogs', 'd.g', 'bird') |
+----------------------------------------------------------+
| My bird likes other birds                                |
+----------------------------------------------------------+

No Match

Here’s an example where there’s no match:

SELECT REGEXP_REPLACE('Cats and dogs', 't.g', 'bird');

Result:

+------------------------------------------------+
| REGEXP_REPLACE('Cats and dogs', 't.g', 'bird') |
+------------------------------------------------+
| Cats and dogs                                  |
+------------------------------------------------+

There’s no match, so the original string is returned unchanged.

Case Sensitivity

The REGEXP_REPLACE() function follows the case sensitivity rules of the effective collation. Matching is performed case insensitively for case insensitive collations, and case sensitively for case sensitive collations and for binary data.

Here’s an example:

SELECT 
    REGEXP_REPLACE('My Cats', 'c.t', 'dog') AS "My Default",
    REGEXP_REPLACE(_latin7'My Cats' COLLATE latin7_general_ci, 'c.t', 'dog') AS "Case Insensitive",
    REGEXP_REPLACE(_latin7'My Cats' COLLATE latin7_general_cs, 'c.t', 'dog') AS "Case Sensitive";

Result:

+------------+------------------+----------------+
| My Default | Case Insensitive | Case Sensitive |
+------------+------------------+----------------+
| My dogs    | My dogs          | My Cats        |
+------------+------------------+----------------+

My default collation is case insensitive. The other two strings were forced to a case insensitive collation and case sensitive collation respectively.

Providing a BINARY string is also case sensitive (more on this later).

Override Case Sensitivity

The collation case sensitivity can be overridden using the (?i) and (?-i) PCRE flags.

Here’s the previous example, but this time using the (?-i) flag on each pattern to force case sensitivity:

SELECT 
    REGEXP_REPLACE('My Cats', '(?-i)c.t', 'dog') AS "My Default",
    REGEXP_REPLACE(_latin7'My Cats' COLLATE latin7_general_ci, '(?-i)c.t', 'dog') AS "Case Insensitive",
    REGEXP_REPLACE(_latin7'My Cats' COLLATE latin7_general_cs, '(?-i)c.t', 'dog') AS "Case Sensitive";

Result:

+------------+------------------+----------------+
| My Default | Case Insensitive | Case Sensitive |
+------------+------------------+----------------+
| My Cats    | My Cats          | My Cats        |
+------------+------------------+----------------+

And here it is using the (?i) flag to force case insensitivity:

SELECT 
    REGEXP_REPLACE('My Cats', '(?i)c.t', 'dog') AS "My Default",
    REGEXP_REPLACE(_latin7'My Cats' COLLATE latin7_general_ci, '(?i)c.t', 'dog') AS "Case Insensitive",
    REGEXP_REPLACE(_latin7'My Cats' COLLATE latin7_general_cs, '(?i)c.t', 'dog') AS "Case Sensitive";

Result:

+------------+------------------+----------------+
| My Default | Case Insensitive | Case Sensitive |
+------------+------------------+----------------+
| My dogs    | My dogs          | My dogs        |
+------------+------------------+----------------+

Binary Strings

Passing a BINARY string also affects the case sensitivity. With BINARY strings, an upper case character is different to its lower case counterpart:

Example:

SELECT 
    REGEXP_REPLACE('My Cats', 'c.t', 'dog') AS "Character",
    REGEXP_REPLACE(BINARY 'My Cats', 'c.t', 'dog') AS "Binary";

Result:

+-----------+---------+
| Character | Binary  |
+-----------+---------+
| My dogs   | My Cats |
+-----------+---------+

Here’s what happens if we change the case:

SELECT 
    REGEXP_REPLACE('My Cats', 'C.t', 'dog') AS "Character",
    REGEXP_REPLACE(BINARY 'My Cats', 'C.t', 'dog') AS "Binary";

Result:

+-----------+---------+
| Character | Binary  |
+-----------+---------+
| My dogs   | My dogs |
+-----------+---------+

Case sensitivity on BINARY strings can also be overridden with the (?-i) and (?i) PCRE flags:

SELECT 
    REGEXP_REPLACE('My Cats', '(?-i)c.t', 'dog') AS "Character",
    REGEXP_REPLACE(BINARY 'My Cats', '(?-i)c.t', 'dog') AS "Binary";

Result:

+-----------+---------+
| Character | Binary  |
+-----------+---------+
| My Cats   | My Cats |
+-----------+---------+

Null Arguments

Passing null as any argument results in null:

SELECT 
    REGEXP_REPLACE(null, 'c.t', 'dog') AS "1",
    REGEXP_REPLACE('Cat', null, 'dog') AS "2",
    REGEXP_REPLACE('Cat', 'c.t', null) AS "3",
    REGEXP_REPLACE(null, null, null) AS "4";

Result:

+------+------+------+------+
| 1    | 2    | 3    | 4    |
+------+------+------+------+
| NULL | NULL | NULL | NULL |
+------+------+------+------+

Wrong Number of Arguments

Passing the wrong number of arguments or no arguments, results in an error:

SELECT REGEXP_REPLACE();

Result:

ERROR 1582 (42000): Incorrect parameter count in the call to native function 'REGEXP_REPLACE'