Detecting and Filtering Special Characters Using PATINDEX() and LIKE in SQL Server

Working with real-world data often means dealing with messy strings. It’s common to find values that contain unexpected special characters. Sometimes this is due to user input, sometimes it’s from imports or third-party sources.

Either way, when we need to find and filter these special characters, SQL Server gives us some handy tools to work with. For starters, there’s the LIKE operator, which anyone who’s used SQL would be familiar with. But there’s also the PATINDEX() function, which performs a slightly different task.

While both of these can help locate unwanted characters, they work slightly differently. PATINDEX() returns the first position of a pattern within a string, while LIKE is more about filtering rows based on whether a pattern exists at all. They both make it possible to find rows containing special characters we want to flag, clean, or exclude.

Sample Data

Let’s set up a small sample dataset. Imagine a table of internal project codes. Normally these codes should be alphanumeric only, but due to sloppy imports, some codes contain odd symbols:

CREATE TABLE ProjectCodes (
    ProjectID INT PRIMARY KEY,
    Code NVARCHAR(50)
);

INSERT INTO ProjectCodes (ProjectID, Code)
VALUES
(1, 'ALPHA123'),
(2, 'BETA_456'),
(3, 'GAMMA-789'),
(4, 'OMEGA#001'),
(5, 'DELTA!X'),
(6, 'SIGMA2025'),
(7, 'THETA@DEV'),
(8, 'ZETA PROD');

Here’s the main takeaway from this dataset:

  • ALPHA123 and SIGMA2025 are clean (only alphanumeric).
  • The rest have underscores, dashes, hashtags, spaces, exclamation marks, or @.

Now let’s use a query to detect them.

Using PATINDEX() to Find Special Characters

PATINDEX() returns the position of the first match for a pattern. If there’s no match, it returns zero.

We can use the following pattern to detect non-alphanumric characters:

'%[^a-zA-Z0-9]%'

Here’s what each part does:

  • ^ inside the square brackets means “not”.
  • a-z and A-Z cover letters.
  • 0-9 covers digits.

So this means “any character that isn’t a letter or digit”. Or in other words “anything that’s not alphanumeric”.

Example query:

SELECT ProjectID, Code,
       PATINDEX('%[^a-zA-Z0-9]%', Code) AS FirstSpecialCharPosition
FROM ProjectCodes;

Result:

ProjectID  Code       FirstSpecialCharPosition
--------- --------- ------------------------
1 ALPHA123 0
2 BETA_456 5
3 GAMMA-789 6
4 OMEGA#001 6
5 DELTA!X! 6
6 SIGMA2025 0
7 THETA@DEV 6
8 ZETA PROD 5

This is useful if we want to see exactly where the problem starts in each string. Notice that it only returns the position of the first instance of the special character. We can see this with the fifth row. The code in that row contains two ! characters, but PATINDEX() only returns the position of the first one.

Using LIKE for Quick Filtering

The LIKE operator is simpler for quickly filtering the query results based on the existence of certain characters:

SELECT ProjectID, Code
FROM ProjectCodes
WHERE Code LIKE '%[^a-zA-Z0-9]%';

Result:

ProjectID  Code     
--------- ---------
2 BETA_456
3 GAMMA-789
4 OMEGA#001
5 DELTA!X!
7 THETA@DEV
8 ZETA PROD

This returns only rows that contain at least one special character. The query is basically saying “give me all the bad ones”. We can flip the results by using the NOT operator:

SELECT ProjectID, Code
FROM ProjectCodes
WHERE Code NOT LIKE '%[^a-zA-Z0-9]%';

Result:

ProjectID  Code     
--------- ---------
1 ALPHA123
6 SIGMA2025

Actually, it’s also possible to use PATINDEX() to filter the results in a similar manner:

SELECT ProjectID, Code
FROM ProjectCodes
WHERE PATINDEX('%[^a-zA-Z0-9]%', Code) <> 0;

Result:

ProjectID  Code     
--------- ---------
2 BETA_456
3 GAMMA-789
4 OMEGA#001
5 DELTA!X!
7 THETA@DEV
8 ZETA PROD

We filtered based on the output of PATINDEX() not being zero. We could switch the not equal to operator (<>) to the equal to operator (=) to return the clean results:

SELECT ProjectID, Code
FROM ProjectCodes
WHERE PATINDEX('%[^a-zA-Z0-9]%', Code) = 0;

Result:

ProjectID  Code     
--------- ---------
1 ALPHA123
6 SIGMA2025