Transact-SQL (T-SQL) offers a nice set of functions and methods for manipulating strings. Let’s look at some handy techniques that can help us work more efficiently with string data in SQL Server.
String Splitting with STRING_SPLIT()
The STRING_SPLIT()
function, introduced in SQL Server 2016, provides an efficient way to split strings based on a delimiter:
SELECT value
FROM STRING_SPLIT('apple,banana,cherry', ',');
Result:
value
------
apple
banana
cherry
3 row(s) returned
This function returns a table of strings, making it easy to use in JOIN
s or other set-based operations.
String Aggregation with STRING_AGG()
The STRING_AGG()
function, available from SQL Server 2017 onwards, allows for efficient string concatenation:
SELECT STRING_AGG(Name, ', ') AS ProductList FROM Product;
Example result:
ProductList
------------------------------------------------------------------------------------------------------------------------------------
Mountain Bike, Road Bike, Helmet, Water Bottle, Bike Pump, Cycling Shorts, Bike Light Set, Cycling Gloves, Bike Lock, Cycling Jersey
1 row(s) returned
This function can also be used with ORDER BY
for sorted concatenation. To do this, use the WITHIN GROUP
clause:
SELECT
STRING_AGG(Name, ', ') WITHIN GROUP ( ORDER BY Name ) AS ProductList
FROM Product;
Example result:
ProductList
------------------------------------------------------------------------------------------------------------------------------------
Bike Light Set, Bike Lock, Bike Pump, Cycling Gloves, Cycling Jersey, Cycling Shorts, Helmet, Mountain Bike, Road Bike, Water Bottle
1 row(s) returned
Pattern Matching with LIKE
Complex pattern matching can be achieved using the LIKE
operator with wildcards:
SELECT *
FROM Customers
WHERE LastName LIKE '[AEIOU]%';
This returns customers where LastName
starts with a vowel.
Here’s another example:
SELECT ProductName, Description
FROM Products
WHERE Description LIKE '%organic%'
OR Description LIKE '%natural%'
OR Description LIKE '%eco-friendly%';
This query retrieves products where the description contains any of the words organic
, natural
, or eco-friendly
. The %
wildcard allows for any characters before or after the specified words.
One more example.
Suppose we have the following table:
EmployeeID Email
---------- --------------------------------
1 [email protected]
2 [email protected]
3 [email protected]
4 [email protected]
5 [email protected]
6 [email protected]
7 [email protected]
8 [email protected]
9 [email protected]
Here’s a query that uses the LIKE
operator to filter the Email
column by a certain pattern:
SELECT EmployeeID, Email
FROM Employees
WHERE Email LIKE '%[a-z]%[.][a-z]%@%[.]%[.]%';
Example result:
EmployeeID Email
---------- ----------------------------
2 [email protected]
4 [email protected]
7 [email protected]
8 [email protected]
Substring Extraction with CHARINDEX()
and SUBSTRING()
Combine CHARINDEX()
and SUBSTRING()
for flexible text extraction:
DECLARE @email VARCHAR(100) = '[email protected]';
SELECT SUBSTRING(@email, 1, CHARINDEX('@', @email) - 1);
Here it is when run against the Employees
table from the previous example:
SELECT
SUBSTRING(Email, 1, CHARINDEX('@', Email) - 1) AS UserName
FROM Employees;
Result:
UserName
------------
john_doe
jane.smith
mike.j
sarah.w
robert_brown
emily.davis
david.m
lisa.wilson
tom.anderson
Advanced Parsing with PARSENAME()
PARSENAME()
is traditionally used for parsing object names, but it can be repurposed for general string parsing:
SELECT PARSENAME('192.168.0.1', 4);
Result:
192
This technique works well for strings with up to four period-separated parts. PARSENAME()
works backwards, and so that’s why 4
returned the first octet.
If the separators are something other than dots, we can use REPLACE()
to change them to the relevant separator:
SELECT PARSENAME(REPLACE('HWP-435-9002-98', '-', '.'), 2);
Result:
9002
If you don’t like the fact that PARSENAME()
works backwards, there’s always the option of using the REVERSE()
function in a couple of places to change the order around:
SELECT REVERSE(PARSENAME(REPLACE(REVERSE('HWP-435-9002-98'), '-', '.'), 2));
Result:
435
Here, the inner REVERSE()
would’ve resulted in 435
being returned as 534
(reverse order), and so the outer REVERSE()
reverses that again to its original order.
Bearing in mind that the PARSENAME()
wasn’t really designed for this type of string splitting (it is mainly for returning a specified part of an object name), there are usually other ways to do this type of thing (and let’s not forget our first example that uses the STRING_SPLIT()
function). Here’s an interesting discussion over at SQLTeam.com about the pros and cons of using PARSENAME()
for string splitting along with some alternatives.
Fuzzy Matching with SOUNDEX()
The SOUNDEX()
function can be used for phonetic matching:
SELECT *
FROM Employees
WHERE SOUNDEX(LastName) = SOUNDEX('Smith');
Result:
EmployeeID FirstName LastName Email Department
---------- --------- -------- ------------------------- ----------
2 Jane Smith [email protected] Marketing
10 Sasha Smyth [email protected] Sales
11 Rod Smythe [email protected] Sales
We can see that although we only entered Smith
, the SOUNDEX()
function ensured that it also matched other names that sound similar to Smith
.
Conclusion
These handy T-SQL string manipulation techniques can significantly enhance our ability to work with text data in SQL Server.
Remember to consider performance implications when working with large datasets, and always test thoroughly in a non-production environment before implementing these techniques in production systems.