Some Handy T-SQL String Manipulation Techniques

Transact-SQL (T-SQL) offers a nice set of functions and methods for manipulating strings. Let’s look at some handy techniques that can help us work more efficiently with string data in SQL Server.

String Splitting with STRING_SPLIT()

The STRING_SPLIT() function, introduced in SQL Server 2016, provides an efficient way to split strings based on a delimiter:

SELECT value
FROM STRING_SPLIT('apple,banana,cherry', ',');

Result:

value 
------
apple
banana
cherry

3 row(s) returned

This function returns a table of strings, making it easy to use in JOINs or other set-based operations.

String Aggregation with STRING_AGG()

The STRING_AGG() function, available from SQL Server 2017 onwards, allows for efficient string concatenation:

SELECT STRING_AGG(Name, ', ') AS ProductList FROM Product;

Example result:

ProductList                                                                                                                         
------------------------------------------------------------------------------------------------------------------------------------
Mountain Bike, Road Bike, Helmet, Water Bottle, Bike Pump, Cycling Shorts, Bike Light Set, Cycling Gloves, Bike Lock, Cycling Jersey

1 row(s) returned

This function can also be used with ORDER BY for sorted concatenation. To do this, use the WITHIN GROUP clause:

SELECT 
    STRING_AGG(Name, ', ') WITHIN GROUP ( ORDER BY Name ) AS ProductList 
FROM Product;

Example result:

ProductList                                                                                                                         
------------------------------------------------------------------------------------------------------------------------------------
Bike Light Set, Bike Lock, Bike Pump, Cycling Gloves, Cycling Jersey, Cycling Shorts, Helmet, Mountain Bike, Road Bike, Water Bottle

1 row(s) returned

Pattern Matching with LIKE

Complex pattern matching can be achieved using the LIKE operator with wildcards:

SELECT *
FROM Customers
WHERE LastName LIKE '[AEIOU]%';

This returns customers where LastName starts with a vowel.

Here’s another example:

SELECT ProductName, Description
FROM Products
WHERE Description LIKE '%organic%'
   OR Description LIKE '%natural%'
   OR Description LIKE '%eco-friendly%';

This query retrieves products where the description contains any of the words organic, natural, or eco-friendly. The % wildcard allows for any characters before or after the specified words.

One more example.

Suppose we have the following table:

EmployeeID  Email                           
---------- --------------------------------
1 [email protected]
2 [email protected]
3 [email protected]
4 [email protected]
5 [email protected]
6 [email protected]
7 [email protected]
8 [email protected]
9 [email protected]

Here’s a query that uses the LIKE operator to filter the Email column by a certain pattern:

SELECT EmployeeID, Email
FROM Employees
WHERE Email LIKE '%[a-z]%[.][a-z]%@%[.]%[.]%';

Example result:

EmployeeID  Email                       
---------- ----------------------------
2 [email protected]
4 [email protected]
7 [email protected]
8 [email protected]

Substring Extraction with CHARINDEX() and SUBSTRING()

Combine CHARINDEX() and SUBSTRING() for flexible text extraction:

DECLARE @email VARCHAR(100) = '[email protected]';
SELECT SUBSTRING(@email, 1, CHARINDEX('@', @email) - 1);

Here it is when run against the Employees table from the previous example:

SELECT 
    SUBSTRING(Email, 1, CHARINDEX('@', Email) - 1) AS UserName 
FROM Employees;

Result:

UserName    
------------
john_doe
jane.smith
mike.j
sarah.w
robert_brown
emily.davis
david.m
lisa.wilson
tom.anderson

Advanced Parsing with PARSENAME()

PARSENAME() is traditionally used for parsing object names, but it can be repurposed for general string parsing:

SELECT PARSENAME('192.168.0.1', 4);

Result:

192

This technique works well for strings with up to four period-separated parts. PARSENAME() works backwards, and so that’s why 4 returned the first octet.

If the separators are something other than dots, we can use REPLACE() to change them to the relevant separator:

SELECT PARSENAME(REPLACE('HWP-435-9002-98', '-', '.'), 2);

Result:

9002

If you don’t like the fact that PARSENAME() works backwards, there’s always the option of using the REVERSE() function in a couple of places to change the order around:

SELECT REVERSE(PARSENAME(REPLACE(REVERSE('HWP-435-9002-98'), '-', '.'), 2));

Result:

435

Here, the inner REVERSE() would’ve resulted in 435 being returned as 534 (reverse order), and so the outer REVERSE() reverses that again to its original order.

Bearing in mind that the PARSENAME() wasn’t really designed for this type of string splitting (it is mainly for returning a specified part of an object name), there are usually other ways to do this type of thing (and let’s not forget our first example that uses the STRING_SPLIT() function). Here’s an interesting discussion over at SQLTeam.com about the pros and cons of using PARSENAME() for string splitting along with some alternatives.

Fuzzy Matching with SOUNDEX()

The SOUNDEX() function can be used for phonetic matching:

SELECT * 
FROM Employees 
WHERE SOUNDEX(LastName) = SOUNDEX('Smith');

Result:

EmployeeID  FirstName  LastName  Email                      Department
---------- --------- -------- ------------------------- ----------
2 Jane Smith [email protected] Marketing
10 Sasha Smyth [email protected] Sales
11 Rod Smythe [email protected] Sales

We can see that although we only entered Smith, the SOUNDEX() function ensured that it also matched other names that sound similar to Smith.

Conclusion

These handy T-SQL string manipulation techniques can significantly enhance our ability to work with text data in SQL Server.

Remember to consider performance implications when working with large datasets, and always test thoroughly in a non-production environment before implementing these techniques in production systems.