MongoDB $strLenBytes

MongoDB, the $strLenBytes aggregation pipeline operator returns the number of UTF-8 encoded bytes in the specified string.

Each character in a string can contain contain a different number of bytes, depending on the character being used. The $strLenBytes operator can figure out how many bytes each character contains and return the correct result for the whole string.

Example

Suppose we have a collection called english with the following documents:

{ "_id" : 1, "data" : "Maimuang" }
{ "_id" : 2, "data" : "M" }
{ "_id" : 3, "data" : "a" }
{ "_id" : 4, "data" : "i" }
{ "_id" : 5, "data" : "m" }
{ "_id" : 6, "data" : "u" }
{ "_id" : 7, "data" : "a" }
{ "_id" : 8, "data" : "n" }
{ "_id" : 9, "data" : "g" }

We can apply $strLenBytes to the data field in those documents:

db.english.aggregate(
   [
     {
       $project:
          {
            _id: 0,
            data: 1,
            result: { $strLenBytes: "$data" }
          }
     }
   ]
)

Result:

{ "data" : "Maimuang", "result" : 8 }
{ "data" : "M", "result" : 1 }
{ "data" : "a", "result" : 1 }
{ "data" : "i", "result" : 1 }
{ "data" : "m", "result" : 1 }
{ "data" : "u", "result" : 1 }
{ "data" : "a", "result" : 1 }
{ "data" : "n", "result" : 1 }
{ "data" : "g", "result" : 1 }

We can see that the whole word is 8 bytes and each character is 1 byte each.

Thai Characters

Here’s an example that uses Thai characters, which are 3 bytes each.

We have a collection called thai with the following documents:

{ "_id" : 1, "data" : "ไม้เมือง" }
{ "_id" : 2, "data" : "ไ" }
{ "_id" : 3, "data" : "ม้" }
{ "_id" : 4, "data" : "เ" }
{ "_id" : 5, "data" : "มื" }
{ "_id" : 6, "data" : "อ" }
{ "_id" : 7, "data" : "ง" }

And here’s what happens when we apply $strLenBytes to those documents:

db.thai.aggregate(
   [
     {
       $project:
          {
            _id: 0,
            data: 1,
            result: { $strLenBytes: "$data" }
          }
     }
   ]
)

Result:

{ "data" : "ไม้เมือง", "result" : 24 }
{ "data" : "ไ", "result" : 3 }
{ "data" : "ม้", "result" : 6 }
{ "data" : "เ", "result" : 3 }
{ "data" : "มื", "result" : 6 }
{ "data" : "อ", "result" : 3 }
{ "data" : "ง", "result" : 3 }

Two of these characters have been modified using diacritics, which result in 6 bytes being returned.

Other Characters

Suppose we have a collection called other with the following documents:

{ "_id" : 1, "data" : "é" }
{ "_id" : 2, "data" : "©" }
{ "_id" : 3, "data" : "℘" }

And let’s apply $strLenBytes to those documents:

db.other.aggregate(
   [
     { $match: { _id: { $in: [ 1, 2, 3 ] } } },
     {
       $project:
          {
            _id: 0,
            data: 1,
            result: { $strLenBytes: "$data" }
          }
     }
   ]
)

Result:

{ "data" : "é", "result" : 2 }
{ "data" : "©", "result" : 2 }
{ "data" : "℘", "result" : 3 }

The first two characters are 2 bytes and the third is 3 bytes. The number of bytes depends on the character. Some characters can use 4 bytes.

The space character uses a byte. Two space characters therefore use 2 bytes, and so on.

Suppose we have the following documents:

{ "_id" : 4, "data" : " " }
{ "_id" : 5, "data" : "  " }

And we apply $strLenBytes to those documents:

db.other.aggregate(
   [
     { $match: { _id: { $in: [ 4, 5 ] } } },
     {
       $project:
          {
            _id: 0,
            data: 1,
            result: { $strLenBytes: "$data" }
          }
     }
   ]
)

Result:

{ "data" : " ", "result" : 1 }
{ "data" : "  ", "result" : 2 }

Empty Strings

Empty strings return 0.

Here’s a document with an empty string:

{ "_id" : 6, "data" : "" }

And here’s what happens when we apply $strLenBytes to that document:

db.other.aggregate(
   [
     { $match: { _id: { $in: [ 6 ] } } },
     {
       $project:
          {
            _id: 0,
            data: 1,
            result: { $strLenBytes: "$data" }
          }
     }
   ]
)

Result:

{ "data" : "", "result" : 0 }

Wrong Data Type

Passing the wrong data type results in an error.

Suppose we have the following document:

{ "_id" : 7, "data" : 123 }

The data field contains a number.

Let’s apply $strLenBytes to that document:

db.other.aggregate(
   [
     { $match: { _id: { $in: [ 7 ] } } },
     {
       $project:
          {
            _id: 0,
            data: 1,
            result: { $strLenBytes: "$data" }
          }
     }
   ]
)

Result:

Error: command failed: {
	"ok" : 0,
	"errmsg" : "$strLenBytes requires a string argument, found: double",
	"code" : 34473,
	"codeName" : "Location34473"
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:639:17
assert.commandWorked@src/mongo/shell/assert.js:729:16
DB.prototype._runAggregate@src/mongo/shell/db.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12
@(shell):1:1

Null Values

Providing null also results in an error.

Suppose we have the following document:

{ "_id" : 8, "data" : null }

The data field contains null.

Let’s apply $strLenBytes to that document:

db.other.aggregate(
   [
     { $match: { _id: { $in: [ 8 ] } } },
     {
       $project:
          {
            _id: 0,
            data: 1,
            result: { $strLenBytes: "$data" }
          }
     }
   ]
)

Result:

uncaught exception: Error: command failed: {
	"ok" : 0,
	"errmsg" : "$strLenBytes requires a string argument, found: null",
	"code" : 34473,
	"codeName" : "Location34473"
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:639:17
assert.commandWorked@src/mongo/shell/assert.js:729:16
DB.prototype._runAggregate@src/mongo/shell/db.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12
@(shell):1:1

Missing Fields

Continuing with the theme of producing errors, specifying a non-existent field also produces an error.

Document:

{ "_id" : 9 }

Apply $strLenBytes:

db.other.aggregate(
   [
     { $match: { _id: { $in: [ 9 ] } } },
     {
       $project:
          {
            _id: 0,
            data: 1,
            result: { $strLenBytes: "$data" }
          }
     }
   ]
)

Result:

Error: command failed: {
	"ok" : 0,
	"errmsg" : "$strLenBytes requires a string argument, found: missing",
	"code" : 34473,
	"codeName" : "Location34473"
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:639:17
assert.commandWorked@src/mongo/shell/assert.js:729:16
DB.prototype._runAggregate@src/mongo/shell/db.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12
@(shell):1:1