LEN() vs DATALENGTH() in SQL Server

When using T-SQL in SQL Server (or Azure) the LEN() and DATALENGTH() functions will often return the same result, but not always. There are some cases where these functions will return completely different results for what appears to be the same data. This is because there’s an important difference between how the LEN() and DATALENGTH() functions work, as we’ll see here.

First up, here’s a quick definition of each:

LEN()
Returns the number of characters of the specified string expression, excluding trailing blanks.
DATALENGTH()
Returns the number of bytes used to represent any expression.

Note “characters” vs “bytes”. Also note that “excluding trailing blanks” only applies to one.

Here are some examples to demonstrate the differences between LEN() and DATALENGTH().

Trailing Blanks

One difference between the LEN() and DATALENGTH() functions is that the LEN() function excludes trailing blanks (trailing spaces, tabs, etc) whereas DATALENGTH() includes trailing blanks. Note, we’re only talking about blanks that come at the end of the string – not at the start or in the middle.

Here’s an example without trailing blanks:

SELECT 
    LEN('Lit') AS Len,
    DATALENGTH('Lit') AS DataLength;

Result:

Len  DataLength
---  ----------
3    3   

And here’s an example with trailing blanks:

SELECT 
    LEN('Lit ') AS Len,
    DATALENGTH('Lit ') AS DataLength;

Result:

Len  DataLength
---  ----------
3    4  

However, leading spaces are counted by both functions:

SELECT 
    LEN(' Lit') AS Len,
    DATALENGTH(' Lit') AS DataLength;

Result:

Len  DataLength
---  ----------
4    4         

Bytes vs Characters

Another key difference between LEN() and DATALENGTH() is that the LEN() function returns the number of characters in a string. On the other hand, DATALENGTH()  returns the number of bytes in an expression.

This is an important distinction because the number of bytes in an expression doesn’t necessarily match the number of characters in the string. When processing a unicode string, DATALENGTH() will return twice the number of characters. This is because a unicode string stores 2 bytes per character.

In the above example, we saw that both LEN() and DATALENGTH() returned the same result for the word Lit (3). But, once we start querying a database, the result will depend on how the data is stored. For example, if it’s stored as varchar, the results will be the same. If it’s stored as nvarchar the DATALENGTH()  function will return twice the number of characters. And if it’s stored as say, char(25)DATALENGTH() will return exactly 25 characters.

Examples

Let’s run the following query:

SELECT 
    ArtistName, 
    LEN(ArtistName) AS Len,
    DATALENGTH(ArtistName) AS DataLength
FROM Artists 
WHERE ArtistName = 'Lit';

The results of this query will depend on how the data is stored.

nvarchar(255)

If the ArtistName column stores data as nvarchar(255):

ArtistName  Len  DataLength
----------  ---  ----------
Lit         3    6         

varchar(255)

If we change that column to varchar(255), we get the following result:

ArtistName  Len  DataLength
----------  ---  ----------
Lit         3    3         

char(25)

If we change that column to char(25), we get the following result:

ArtistName                 Len  DataLength
-------------------------  ---  ----------
Lit                        3    25        

So perhaps a moral to all this is, if you find yourself getting weird results when trying to retrieve string lengths etc, check that you’re using the correct function.