How the SQL Server SOUNDEX() Function Works

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. It was developed and patented in 1918 and 1922.

One of the functions available in SQL Server is the SOUNDEX() function, which returns the Soundex code for a given string.

Syntax

The syntax goes like this:

SOUNDEX ( character_expression )

Where character_expression is the word or string that you want the Soundex code for. This can be a constant, variable, or column.

The SOUNDEX() function is collation sensitive, and string functions can be nested.

The Soundex Code

As mentioned, the SOUNDEX() function returns the Soundex code for the given string. The Soundex code is a four-character code that is based on how the string sounds when spoken. Here’s an example of a Soundex code:

S600

Here’s how a Soundex code is constructed:

  • The first character of the code is the first character of the string, converted to upper case. So in the above example, we know that the string starts with the letter S (either lowercase or uppercase).
  • The second through fourth characters of the code are numbers that represent the letters in the expression.
  • The letters A, E, I, O, U, H, W, and Y are ignored unless they are the first letter of the string.
  • Zeroes are added at the end if necessary to produce a four-character code.

Example

Here’s an example of retrieving the Soundex string from a string:

SELECT SOUNDEX('Sure');

Result:

S600

So we can see that the word Sure has a Soundex code of S600.

Example of Two Matching Words

Here’s an example of where two words share the same Soundex code (because they sound the same):

SELECT 
    SOUNDEX('Sure') AS Sure, 
    SOUNDEX('Shore') AS Shore;

Result:

Sure  Shore
----  -----
S600  S600

Here are some more exact match examples:

SELECT 
    SOUNDEX('Dam') AS Dam,
    SOUNDEX('Damn') AS Damn,
    SOUNDEX('Too') AS Too,
    SOUNDEX('Two') AS Two;

Result:

Dam   Damn  Too   Two 
----  ----  ----  ----
D500  D500  T000  T000

Example of Words that Don’t Match

Here’s an example of where two words don’t sound the same, and therefore, they have different Soundex codes:

SELECT 
    SOUNDEX('Water') AS Water, 
    SOUNDEX('Coffee') AS Coffee;

Result:

Water  Coffee
-----  ------
W360   C100

Words with Different Spellings

Some words have different spellings depending on which country you’re from. Such words will share the same Soundex code:

SELECT 
  SOUNDEX('Flavor') AS 'Flavor',
  SOUNDEX('Flavour') AS 'Flavour';

Result:

Flavor  Flavour
------  -------
F416    F416

Same Sound, Different Soundex Code

Sometimes, two words sound the same, but they have different Soundex codes. The most common reason for this is that they start with a different letter (one uses a silent letter). As mentioned, the Soundex code starts with the first letter of the string (converted to uppercase). Therefore, if you have two words that are pronounced exactly the same, but they start with a different letter, they’ll have a different Soundex code.

Here are some examples:

SELECT
    SOUNDEX('Hole') AS 'Hole',
    SOUNDEX('Whole') AS 'Whole',
    SOUNDEX('Our') AS Our,
    SOUNDEX('Hour') AS Hour;

Result:

Hole  Whole  Our   Hour
----  -----  ----  ----
H400  W400   O600  H600

The pairs in this example have different Soundex codes solely because their first letter is different.