How SOUNDEX() Works in MariaDB

In MariaDB, SOUNDEX() is a built-in string function that returns the Soundex string from a given string.

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. If two words sound the same, they should have the same Soundex string. If two words sound similar, but not exactly the same, their Soundex string might look similar but not exactly the same.

The function accepts one argument: the string for which to return the Soundex string from.

Syntax

The syntax goes like this:

SOUNDEX(str)

Where str is the string for which to return the Soundex string from.

Example

Here’s a basic example:

SELECT SOUNDEX('Bat');

Result:

+----------------+
| SOUNDEX('Bat') |
+----------------+
| B300           |
+----------------+

Here’s another example that compares the Soundex string returned from similar, but different, sounding words:

SELECT 
    SOUNDEX('Bat'),
    SOUNDEX('Cat'),
    SOUNDEX('Cap');

Result:

+----------------+----------------+----------------+
| SOUNDEX('Bat') | SOUNDEX('Cat') | SOUNDEX('Cap') |
+----------------+----------------+----------------+
| B300           | C300           | C100           |
+----------------+----------------+----------------+

And here’s one that compares words that don’t sound alike:

SELECT 
    SOUNDEX('Apartment'),
    SOUNDEX('Vehicle'),
    SOUNDEX('Groceries');

Result:

+----------------------+--------------------+----------------------+
| SOUNDEX('Apartment') | SOUNDEX('Vehicle') | SOUNDEX('Groceries') |
+----------------------+--------------------+----------------------+
| A16353               | V240               | G6262                |
+----------------------+--------------------+----------------------+

I should mention that this function implements the original Soundex algorithm which discards vowels first and duplicates second. This is in contrast to the enhanced version, which discards duplicates first and vowels second.

Also, a standard Soundex string is four characters long, but MariaDB’s SOUNDEX() function returns an arbitrarily long string. Therefore the above results include non-standard Soundex strings.

To illustrate what I mean, here’s the result that I get when using Oracle’s SOUNDEX() function to do the same thing:

SELECT 
    SOUNDEX('Apartment'),
    SOUNDEX('Vehicle'),
    SOUNDEX('Groceries')
FROM DUAL;

Result:

   SOUNDEX('APARTMENT')    SOUNDEX('VEHICLE')    SOUNDEX('GROCERIES') 
_______________________ _____________________ _______________________ 
A163                    V240                  G626                   

Exact Matches

Here’s an example of pairs of words that have a matching Soundex string, even though they’re different words, with different meanings:

SELECT 
    SOUNDEX('Dam') AS Dam, 
    SOUNDEX('Damn') AS Damn, 
    SOUNDEX('Too') AS Too, 
    SOUNDEX('Two') AS Two;

Result:

+------+------+------+------+
| Dam  | Damn | Too  | Two  |
+------+------+------+------+
| D500 | D500 | T000 | T000 |
+------+------+------+------+

Database Example

Here’s an example of getting the Soundex string from a database query:

SELECT 
    PetName, 
    SOUNDEX(PetName)
FROM Pets;

Result:

+---------+------------------+
| PetName | SOUNDEX(PetName) |
+---------+------------------+
| Fluffy  | F410             |
| Fetch   | F320             |
| Scratch | S632             |
| Wag     | W200             |
| Tweet   | T000             |
| Fluffy  | F410             |
| Bark    | B620             |
| Meow    | M000             |
+---------+------------------+

We can also use SOUNDEX() in a WHERE clause to return only those rows that sound like a given word:

SELECT 
    PetName, 
    SOUNDEX(PetName),
    SOUNDEX('Wagg')
FROM Pets
WHERE SOUNDEX(PetName) = SOUNDEX('Wagg');

Result:

+---------+------------------+-----------------+
| PetName | SOUNDEX(PetName) | SOUNDEX('Wagg') |
+---------+------------------+-----------------+
| Wag     | W200             | W200            |
+---------+------------------+-----------------+

However, you might prefer to use SOUNDS LIKE instead, which is a more concise way of doing the same thing.

Empty String

Here’s what happens when an empty string is passed for each given argument:

SELECT SOUNDEX('');

Result:

+-------------+
| SOUNDEX('') |
+-------------+
|             |
+-------------+

Null Arguments

Passing null returns null:

SELECT SOUNDEX(null);

Result:

+---------------+
| SOUNDEX(null) |
+---------------+
| NULL          |
+---------------+

Missing Argument

Calling SOUNDEX() with the wrong number of arguments, or without passing any arguments results in an error:

SELECT SOUNDEX();

Result:

ERROR 1582 (42000): Incorrect parameter count in the call to native function 'SOUNDEX'