In MongoDB, the $strLenBytes aggregation pipeline operator returns the number of UTF-8 encoded bytes in the specified string.
Each character in a string can take up a different number of bytes, depending on the character being used. The $strLenBytes operator works out how many bytes each character takes and returns the correct total for the whole string.
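To get a feel for these byte counts outside the database, here is a minimal sketch that takes the same UTF-8 measurement in plain JavaScript. It assumes a runtime that provides the standard TextEncoder (Node.js or a modern browser). The helper name utf8ByteLength is made up for illustration and is not part of MongoDB.

```javascript
// Hypothetical helper (not a MongoDB API): counts the UTF-8 encoded bytes
// in a string, which is the quantity $strLenBytes reports.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength("a"));        // 1
console.log(utf8ByteLength("\u00E9"));   // 2 (é)
console.log(utf8ByteLength("\u2118"));   // 3 (℘)
console.log(utf8ByteLength("Maimuang")); // 8
```

This can be handy for checking what to expect from $strLenBytes before running a pipeline.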
Example
Suppose we have a collection called english with the following documents:
{ "_id" : 1, "data" : "Maimuang" }
{ "_id" : 2, "data" : "M" }
{ "_id" : 3, "data" : "a" }
{ "_id" : 4, "data" : "i" }
{ "_id" : 5, "data" : "m" }
{ "_id" : 6, "data" : "u" }
{ "_id" : 7, "data" : "a" }
{ "_id" : 8, "data" : "n" }
{ "_id" : 9, "data" : "g" }
We can apply $strLenBytes to the data field in those documents:
db.english.aggregate(
  [
    {
      $project:
        {
          _id: 0,
          data: 1,
          result: { $strLenBytes: "$data" }
        }
    }
  ]
)
Result:
{ "data" : "Maimuang", "result" : 8 }
{ "data" : "M", "result" : 1 }
{ "data" : "a", "result" : 1 }
{ "data" : "i", "result" : 1 }
{ "data" : "m", "result" : 1 }
{ "data" : "u", "result" : 1 }
{ "data" : "a", "result" : 1 }
{ "data" : "n", "result" : 1 }
{ "data" : "g", "result" : 1 }
We can see that the whole word is 8 bytes and each character is 1 byte.
Thai Characters
Here’s an example that uses Thai characters, which are 3 bytes each.
We have a collection called thai with the following documents:
{ "_id" : 1, "data" : "ไม้เมือง" }
{ "_id" : 2, "data" : "ไ" }
{ "_id" : 3, "data" : "ม้" }
{ "_id" : 4, "data" : "เ" }
{ "_id" : 5, "data" : "มื" }
{ "_id" : 6, "data" : "อ" }
{ "_id" : 7, "data" : "ง" }
And here’s what happens when we apply $strLenBytes to those documents:
db.thai.aggregate(
  [
    {
      $project:
        {
          _id: 0,
          data: 1,
          result: { $strLenBytes: "$data" }
        }
    }
  ]
)
Result:
{ "data" : "ไม้เมือง", "result" : 24 }
{ "data" : "ไ", "result" : 3 }
{ "data" : "ม้", "result" : 6 }
{ "data" : "เ", "result" : 3 }
{ "data" : "มื", "result" : 6 }
{ "data" : "อ", "result" : 3 }
{ "data" : "ง", "result" : 3 }
Two of these entries, ม้ and มื, combine a base consonant with a diacritic. Each is stored as two separate code points of 3 bytes each, which is why 6 bytes are returned.
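A quick way to see why a visually single character reports 6 bytes is to list its code points. The sketch below does this in plain JavaScript, assuming a runtime with the standard TextEncoder. The name describeCodePoints is a hypothetical helper for illustration, not a MongoDB function.

```javascript
// Illustration only: list the code points in a string and the number of
// UTF-8 bytes each one takes.
function describeCodePoints(str) {
  const encoder = new TextEncoder();
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(str, (ch) => ({
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    bytes: encoder.encode(ch).length,
  }));
}

// "\u0E21\u0E49" is ม้ : the consonant ม followed by the tone mark ้
console.log(describeCodePoints("\u0E21\u0E49"));
// Two entries: U+0E21 (3 bytes) and U+0E49 (3 bytes), 6 bytes in total.
```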
Other Characters
Suppose we have a collection called other with the following documents:
{ "_id" : 1, "data" : "é" }
{ "_id" : 2, "data" : "©" }
{ "_id" : 3, "data" : "℘" }
And let’s apply $strLenBytes to those documents:
db.other.aggregate(
  [
    { $match: { _id: { $in: [ 1, 2, 3 ] } } },
    {
      $project:
        {
          _id: 0,
          data: 1,
          result: { $strLenBytes: "$data" }
        }
    }
  ]
)
Result:
{ "data" : "é", "result" : 2 }
{ "data" : "©", "result" : 2 }
{ "data" : "℘", "result" : 3 }
The first two characters are 2 bytes and the third is 3 bytes. The number of bytes depends on the character. Some characters can use 4 bytes.
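The byte count for a character follows from where its code point falls in the UTF-8 encoding ranges. As a rough sketch of that rule (the function name utf8BytesForCodePoint is an assumption for illustration, and this is not how MongoDB itself is implemented):

```javascript
// Sketch of the standard UTF-8 length rule: bytes needed for one code point.
function utf8BytesForCodePoint(cp) {
  if (cp < 0x80) return 1;     // U+0000..U+007F
  if (cp < 0x800) return 2;    // U+0080..U+07FF
  if (cp < 0x10000) return 3;  // U+0800..U+FFFF
  return 4;                    // U+10000..U+10FFFF
}

console.log(utf8BytesForCodePoint(0x00E9));  // 2 (é)
console.log(utf8BytesForCodePoint(0x00A9));  // 2 (©)
console.log(utf8BytesForCodePoint(0x2118));  // 3 (℘)
console.log(utf8BytesForCodePoint(0x1F600)); // 4 (an emoji, U+1F600)
```

Characters outside the Basic Multilingual Plane, such as most emoji, are the ones that need 4 bytes.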
The space character uses a byte. Two space characters therefore use 2 bytes, and so on.
Suppose we have the following documents:
{ "_id" : 4, "data" : " " }
{ "_id" : 5, "data" : "  " }
And we apply $strLenBytes to those documents:
db.other.aggregate(
  [
    { $match: { _id: { $in: [ 4, 5 ] } } },
    {
      $project:
        {
          _id: 0,
          data: 1,
          result: { $strLenBytes: "$data" }
        }
    }
  ]
)
Result:
{ "data" : " ", "result" : 1 }
{ "data" : "  ", "result" : 2 }
Empty Strings
Empty strings return 0.
Here’s a document with an empty string:
{ "_id" : 6, "data" : "" }
And here’s what happens when we apply $strLenBytes to that document:
db.other.aggregate(
  [
    { $match: { _id: { $in: [ 6 ] } } },
    {
      $project:
        {
          _id: 0,
          data: 1,
          result: { $strLenBytes: "$data" }
        }
    }
  ]
)
Result:
{ "data" : "", "result" : 0 }
Wrong Data Type
Passing the wrong data type results in an error.
Suppose we have the following document:
{ "_id" : 7, "data" : 123 }
The data field contains a number.
Let’s apply $strLenBytes to that document:
db.other.aggregate(
  [
    { $match: { _id: { $in: [ 7 ] } } },
    {
      $project:
        {
          _id: 0,
          data: 1,
          result: { $strLenBytes: "$data" }
        }
    }
  ]
)
Result:
Error: command failed: { "ok" : 0, "errmsg" : "$strLenBytes requires a string argument, found: double", "code" : 34473, "codeName" : "Location34473" } : aggregate failed : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:639:17 assert.commandWorked@src/mongo/shell/assert.js:729:16 DB.prototype._runAggregate@src/mongo/shell/db.js:266:5 DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12 @(shell):1:1
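If a field might not always hold a string, one way to avoid this error is to check its type before calling $strLenBytes. The following is a sketch rather than a definitive fix: it uses the $cond and $type aggregation operators, the variable name guardedPipeline is made up for illustration, and returning null for non-strings is just one possible choice.

```javascript
// Sketch: only call $strLenBytes when the field actually holds a string;
// otherwise fall back to null instead of raising an error.
const guardedPipeline = [
  { $match: { _id: { $in: [ 7 ] } } },
  {
    $project: {
      _id: 0,
      data: 1,
      result: {
        $cond: [
          { $eq: [ { $type: "$data" }, "string" ] }, // is the value a string?
          { $strLenBytes: "$data" },                 // yes: measure it
          null                                       // no: assumed fallback
        ]
      }
    }
  }
];

// In the mongo shell, this would be run as:
// db.other.aggregate(guardedPipeline)
```

With this guard, the document containing 123 should come back with a null result rather than failing the whole aggregation.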
Null Values
Providing null also results in an error.
Suppose we have the following document:
{ "_id" : 8, "data" : null }
The data field contains null.
Let’s apply $strLenBytes to that document:
db.other.aggregate(
  [
    { $match: { _id: { $in: [ 8 ] } } },
    {
      $project:
        {
          _id: 0,
          data: 1,
          result: { $strLenBytes: "$data" }
        }
    }
  ]
)
Result:
uncaught exception: Error: command failed: { "ok" : 0, "errmsg" : "$strLenBytes requires a string argument, found: null", "code" : 34473, "codeName" : "Location34473" } : aggregate failed : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:639:17 assert.commandWorked@src/mongo/shell/assert.js:729:16 DB.prototype._runAggregate@src/mongo/shell/db.js:266:5 DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12 @(shell):1:1
Missing Fields
Continuing with the theme of producing errors, specifying a non-existent field also produces an error.
Document:
{ "_id" : 9 }
Apply $strLenBytes:
db.other.aggregate(
  [
    { $match: { _id: { $in: [ 9 ] } } },
    {
      $project:
        {
          _id: 0,
          data: 1,
          result: { $strLenBytes: "$data" }
        }
    }
  ]
)
Result:
Error: command failed: { "ok" : 0, "errmsg" : "$strLenBytes requires a string argument, found: missing", "code" : 34473, "codeName" : "Location34473" } : aggregate failed : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:639:17 assert.commandWorked@src/mongo/shell/assert.js:729:16 DB.prototype._runAggregate@src/mongo/shell/db.js:266:5 DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12 @(shell):1:1
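For null values and missing fields specifically, one possible workaround is to substitute an empty string with $ifNull before measuring. This is only a sketch: it assumes that treating null and missing as an empty string (and so reporting 0) is acceptable for your data, and the name defaultedPipeline is hypothetical.

```javascript
// Sketch: $ifNull swaps in "" when data is null or the field is missing,
// so $strLenBytes always receives a string for those two cases.
const defaultedPipeline = [
  { $match: { _id: { $in: [ 8, 9 ] } } },
  {
    $project: {
      _id: 0,
      result: { $strLenBytes: { $ifNull: [ "$data", "" ] } }
    }
  }
];

// In the mongo shell, this would be run as:
// db.other.aggregate(defaultedPipeline)
```

Note that this does not help with other non-string types such as numbers, which would still produce the error.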