How REGEXP Works in MariaDB

In MariaDB, the REGEXP operator is used to determine whether or not a string matches a regular expression.

If the string matches the regular expression provided, the result is 1, otherwise it’s 0.

Syntax

The syntax goes like this:

expr REGEXP pat

Where expr is the input string and pat is the regular expression for which you’re testing the string against.

Examples

Below are examples of using the REGEXP operator with different patterns.

Basic Regular Expression

Let’s start with the simplest example. The simplest regular expression we can use is one that has no special characters in it. Here, we just use a string. If any part of the input string matches that string, it returns a match.

SELECT 
  'Corn' REGEXP 'Corn' AS "Corn",
  'Acorn' REGEXP 'Corn' AS "Acorn",
  'Corner' REGEXP 'Corn' AS "Corner",
  'Cheese' REGEXP 'Corn' AS "Cheese";

Result:

+------+-------+--------+--------+
| Corn | Acorn | Corner | Cheese |
+------+-------+--------+--------+
|    1 |     1 |      1 |      0 |
+------+-------+--------+--------+

Match the Beginning of a String

In this example, the regular expression specifies that the string must begin with Co.

SELECT 
  'Corn' REGEXP '^Co' AS "Corn",
  'Acorn' REGEXP '^Co' AS "Acorn",
  'Cheese' REGEXP '^Co' AS "Cheese";

Result:

+------+-------+--------+
| Corn | Acorn | Cheese |
+------+-------+--------+
|    1 |     0 |      0 |
+------+-------+--------+

Match the End of a String

In this example, the regular expression specifies that the string must end with rn.

SELECT 
  'Corn' REGEXP 'rn$' AS "Corn",
  'Acorn' REGEXP 'rn$' AS "Acorn",
  'Cheese' REGEXP 'rn$' AS "Cheese";

Result:

+------+-------+--------+
| Corn | Acorn | Cheese |
+------+-------+--------+
|    1 |     1 |      0 |
+------+-------+--------+

Match Any Character

The . character matches any character.

SELECT 
  'Corn' REGEXP '.' AS "Corn",
  'Cheese' REGEXP '.' AS "Cheese",
  '' REGEXP '.' AS "";

Result:

+------+--------+---+
| Corn | Cheese |   |
+------+--------+---+
|    1 |      1 | 0 |
+------+--------+---+

This character is typically used in conjunction with other characters to specify further criteria. For example:

SELECT 
  'Corn' REGEXP '^C.rn$' AS "Corn", 
  'Crn' REGEXP '^C.rn$' AS "Crn";

Result:

+------+-----+
| Corn | Crn |
+------+-----+
|    1 |   0 |
+------+-----+

Here we specify that the string must start with C, that it must be followed by a character (any character), and that it must end with rn.

Note that this character specifies a single instance of the character. If you want to specify multiple instances (for example ee instead of just e), you’ll need to add more . characters.

SELECT 
  'Tweet' REGEXP '^Tw..t$' AS "Tweet", 
  'Tweat' REGEXP '^Tw..t$' AS "Tweat", 
  'Tweet' REGEXP '^Tw.t$' AS "Tweet", 
  'Twit' REGEXP '^Tw..t$' AS "Twit";

Result:

+-------+-------+-------+------+
| Tweet | Tweat | Tweet | Twit |
+-------+-------+-------+------+
|     1 |     1 |     0 |    0 |
+-------+-------+-------+------+

Another way to do this is to specify the number of occurrences within curly brackets:

SELECT 
  'Tweet' REGEXP '^Tw.{2}t$' AS "Tweet", 
  'Tweat' REGEXP '^Tw.{2}t$' AS "Tweat", 
  'Tweet' REGEXP '^Tw.{1}t$' AS "Tweet", 
  'Twit' REGEXP '^Tw.{2}t$' AS "Twit";

Result:

+-------+-------+-------+------+
| Tweet | Tweat | Tweet | Twit |
+-------+-------+-------+------+
|     1 |     1 |     0 |    0 |
+-------+-------+-------+------+

However, if you know the character that you’re looking for, you can specify that character (instead of the . character), as demonstrated in the following example.

Match Zero or More Instances of a Specific Character

We can do the following to specify zero or more instances of the e character:

SELECT 
  'Twet' REGEXP '^Twe*t$' AS "Twet",
  'Tweet' REGEXP '^Twe*t$' AS "Tweet",
  'Tweeet' REGEXP '^Twe*t$' AS "Tweeet",
  'Twt' REGEXP '^Twe*t$' AS "Twt",
  'Twit' REGEXP '^Twe*t$' AS "Twit",
  'Twiet' REGEXP '^Twe*t$' AS "Twiet",
  'Tweit' REGEXP '^Twe*t$' AS "Tweit";

Result:

+------+-------+--------+-----+------+-------+-------+
| Twet | Tweet | Tweeet | Twt | Twit | Twiet | Tweit |
+------+-------+--------+-----+------+-------+-------+
|    1 |     1 |      1 |   1 |    0 |     0 |     0 |
+------+-------+--------+-----+------+-------+-------+

The first four match but the last three don’t.

Match One or More Instances of a Specific Character

We can modify the previous example so that we only get a match if one or more characters is found (the previous example returned a match if zero or more were found). To do this, we simply use + instead of *:

SELECT 
  'Twet' REGEXP '^Twe+t$' AS "Twet",
  'Tweet' REGEXP '^Twe+t$' AS "Tweet",
  'Tweeet' REGEXP '^Twe+t$' AS "Tweeet",
  'Twt' REGEXP '^Twe+t$' AS "Twt",
  'Twit' REGEXP '^Twe+t$' AS "Twit",
  'Twiet' REGEXP '^Twe+t$' AS "Twiet",
  'Tweit' REGEXP '^Twe+t$' AS "Tweit";

Result:

+------+-------+--------+-----+------+-------+-------+
| Twet | Tweet | Tweeet | Twt | Twit | Twiet | Tweit |
+------+-------+--------+-----+------+-------+-------+
|    1 |     1 |      1 |   0 |    0 |     0 |     0 |
+------+-------+--------+-----+------+-------+-------+

In this case, the fourth word returns a different result to the previous example.

Match Zero or One Instance of a Specific Character

We can modify the previous example so that we only get a match on zero or one of the desired characters. To do this, we use ?:

SELECT 
  'Twet' REGEXP '^Twe?t$' AS "Twet",
  'Tweet' REGEXP '^Twe?t$' AS "Tweet",
  'Tweeet' REGEXP '^Twe?t$' AS "Tweeet",
  'Twt' REGEXP '^Twe?t$' AS "Twt",
  'Twit' REGEXP '^Twe?t$' AS "Twit",
  'Twiet' REGEXP '^Twe?t$' AS "Twiet",
  'Tweit' REGEXP '^Twe?t$' AS "Tweit";

Result:

+------+-------+--------+-----+------+-------+-------+
| Twet | Tweet | Tweeet | Twt | Twit | Twiet | Tweit |
+------+-------+--------+-----+------+-------+-------+
|    1 |     0 |      0 |   1 |    0 |     0 |     0 |
+------+-------+--------+-----+------+-------+-------+

Alternation

You can use the | character to match one or another sequence of characters:

SELECT 
  'Tweet' REGEXP 'Tw|et' AS "Tweet",
  'For Let' REGEXP 'Tw|et' AS "For Let",
  'Banana' REGEXP 'Tw|et' AS "Banana";

Result:

+-------+---------+--------+
| Tweet | For Let | Banana |
+-------+---------+--------+
|     1 |       1 |      0 |
+-------+---------+--------+

Here’s another example where I search for whole words:

SELECT 
  'Cat' REGEXP 'Cat|Dog' AS "Cat",
  'Dog' REGEXP 'Cat|Dog' AS "Dog",
  'Doggone' REGEXP 'Cat|Dog' AS "Doggone",
  'Banana' REGEXP 'Cat|Dog' AS "Banana";

Result:

+-----+-----+---------+--------+
| Cat | Dog | Doggone | Banana |
+-----+-----+---------+--------+
|   1 |   1 |       1 |      0 |
+-----+-----+---------+--------+

We still get a match even when our regular expression matches only part of the the string.

Match Zero or More Instances of a Sequence

You can use brackets along with the asterisk ()* to specify zero or more instances of a sequence:

SELECT 
  'Banana' REGEXP '(an)*' AS "Banana",
  'Land' REGEXP '(an)*' AS "Land",
  'Cheese' REGEXP '(an)*' AS "Cheese";

Result:

+--------+------+--------+
| Banana | Land | Cheese |
+--------+------+--------+
|      1 |    1 |      1 |
+--------+------+--------+

Another example:

SELECT 
  'Banana' REGEXP '^B(an)*d$' AS "Banana",
  'Band' REGEXP '^B(an)*d$' AS "Band",
  'Bald' REGEXP '^B(an)*d$' AS "Bald",
  'Bad' REGEXP '^B(an)*d$' AS "Bad";

Result:

+--------+------+------+-----+
| Banana | Band | Bald | Bad |
+--------+------+------+-----+
|      0 |    1 |    0 |   0 |
+--------+------+------+-----+

Repetition

As seen in a previous example, you can use curly brackets to specify repetition. This notation provides a more general way of writing regular expressions than some of the previous examples:

SELECT 
  'Tweeet' REGEXP 'e{3}' AS "Tweeet",
  'Tweet' REGEXP 'e{3}' AS "Tweet";

Result:

+--------+-------+
| Tweeet | Tweet |
+--------+-------+
|      1 |     0 |
+--------+-------+

Range

You can use the hyphen character to specify a range. Here’s an example that specifies a range of numbers:

SELECT 
  'Tweet 123' REGEXP '[0-9]' AS "Tweet 123",
  'Tweet ABC' REGEXP '[0-9]' AS "Tweet ABC";

Result:

+--------+-------+
| Tweeet | Tweet |
+--------+-------+
|      1 |     0 |
+--------+-------+

And the following example specifies a range of letters:

SELECT 
  'Tweet 123' REGEXP '[A-Z]' AS "Tweet 123",
  'ABC' REGEXP '[A-Z]' AS "ABC",
  '123' REGEXP '[A-Z]' AS "123";

Result:

+-----------+-----+-----+
| Tweet 123 | ABC | 123 |
+-----------+-----+-----+
|         1 |   1 |   0 |
+-----------+-----+-----+

Here’s what happens if we limit the range of numbers:

SELECT 
  '123' REGEXP '[1-3]' AS "123",
  '012' REGEXP '[1-3]' AS "012",
  '045' REGEXP '[1-3]' AS "045";

Result:

+-----+-----+-----+
| 123 | 012 | 045 |
+-----+-----+-----+
|   1 |   1 |   0 |
+-----+-----+-----+

Not Within a Range

We can use the ^ character to modify the previous example so that the specified range of characters are excluded:

SELECT 
  '123' REGEXP '[^1-3]' AS "123",
  '012' REGEXP '[^1-3]' AS "012",
  '045' REGEXP '[^1-3]' AS "045";

Result:

+-----+-----+-----+
| 123 | 012 | 045 |
+-----+-----+-----+
|   0 |   1 |   1 |
+-----+-----+-----+

In this case we get the opposite result to the previous example.

Null Values

If either the expression or the pattern is null, the result is null:

SELECT 
  null REGEXP 'Corn' AS "Corn",
  'Acorn' REGEXP null AS "Acorn",
  null REGEXP null AS "Corner";

Result:

+------+-------+--------+
| Corn | Acorn | Corner |
+------+-------+--------+
| NULL |  NULL |   NULL |
+------+-------+--------+