Working with real-world data often means dealing with messy strings. It’s common to find values that contain unexpected special characters. Sometimes this is due to user input, sometimes it’s from imports or third-party sources.
Either way, when we need to find and filter these special characters, SQL Server gives us some handy tools to work with. For starters, there’s the LIKE operator, which anyone who’s used SQL would be familiar with. But there’s also the PATINDEX() function, which performs a slightly different task.
While both of these can help locate unwanted characters, they work slightly differently. PATINDEX() returns the first position of a pattern within a string, while LIKE is more about filtering rows based on whether a pattern exists at all. They both make it possible to find rows containing special characters we want to flag, clean, or exclude.
Sample Data
Let’s set up a small sample dataset. Imagine a table of internal project codes. Normally these codes should be alphanumeric only, but due to sloppy imports, some codes contain odd symbols:
CREATE TABLE ProjectCodes (
ProjectID INT PRIMARY KEY,
Code NVARCHAR(50)
);
INSERT INTO ProjectCodes (ProjectID, Code)
VALUES
(1, 'ALPHA123'),
(2, 'BETA_456'),
(3, 'GAMMA-789'),
(4, 'OMEGA#001'),
(5, 'DELTA!X'),
(6, 'SIGMA2025'),
(7, 'THETA@DEV'),
(8, 'ZETA PROD');
Here’s the main takeaway from this dataset:
ALPHA123andSIGMA2025are clean (only alphanumeric).- The rest have underscores, dashes, hashtags, spaces, exclamation marks, or
@.
Now let’s use a query to detect them.
Using PATINDEX() to Find Special Characters
PATINDEX() returns the position of the first match for a pattern. If there’s no match, it returns zero.
We can use the following pattern to detect non-alphanumric characters:
'%[^a-zA-Z0-9]%'
Here’s what each part does:
^inside the square brackets means “not”.a-zandA-Zcover letters.0-9covers digits.
So this means “any character that isn’t a letter or digit”. Or in other words “anything that’s not alphanumeric”.
Example query:
SELECT ProjectID, Code,
PATINDEX('%[^a-zA-Z0-9]%', Code) AS FirstSpecialCharPosition
FROM ProjectCodes;
Result:
ProjectID Code FirstSpecialCharPosition
--------- --------- ------------------------
1 ALPHA123 0
2 BETA_456 5
3 GAMMA-789 6
4 OMEGA#001 6
5 DELTA!X! 6
6 SIGMA2025 0
7 THETA@DEV 6
8 ZETA PROD 5
This is useful if we want to see exactly where the problem starts in each string. Notice that it only returns the position of the first instance of the special character. We can see this with the fifth row. The code in that row contains two ! characters, but PATINDEX() only returns the position of the first one.
Using LIKE for Quick Filtering
The LIKE operator is simpler for quickly filtering the query results based on the existence of certain characters:
SELECT ProjectID, Code
FROM ProjectCodes
WHERE Code LIKE '%[^a-zA-Z0-9]%';
Result:
ProjectID Code
--------- ---------
2 BETA_456
3 GAMMA-789
4 OMEGA#001
5 DELTA!X!
7 THETA@DEV
8 ZETA PROD
This returns only rows that contain at least one special character. The query is basically saying “give me all the bad ones”. We can flip the results by using the NOT operator:
SELECT ProjectID, Code
FROM ProjectCodes
WHERE Code NOT LIKE '%[^a-zA-Z0-9]%';
Result:
ProjectID Code
--------- ---------
1 ALPHA123
6 SIGMA2025
Actually, it’s also possible to use PATINDEX() to filter the results in a similar manner:
SELECT ProjectID, Code
FROM ProjectCodes
WHERE PATINDEX('%[^a-zA-Z0-9]%', Code) <> 0;
Result:
ProjectID Code
--------- ---------
2 BETA_456
3 GAMMA-789
4 OMEGA#001
5 DELTA!X!
7 THETA@DEV
8 ZETA PROD
We filtered based on the output of PATINDEX() not being zero. We could switch the not equal to operator (<>) to the equal to operator (=) to return the clean results:
SELECT ProjectID, Code
FROM ProjectCodes
WHERE PATINDEX('%[^a-zA-Z0-9]%', Code) = 0;
Result:
ProjectID Code
--------- ---------
1 ALPHA123
6 SIGMA2025