In MongoDB, the $setDifference
aggregation pipeline operator accepts two sets and performs a relative complement of the second set relative to the first. It returns an array containing the elements that only exist in the first set.
$setDifference
accepts two arguments, both of which can be any valid expression as long as they each resolve to an array. $setDifference
treats the arrays as sets.
Example
Suppose we have a collection called data
with the following documents:
{ "_id" : 1, "a" : [ 1, 2, 3 ], "b" : [ 1, 2, 3 ] } { "_id" : 2, "a" : [ 1, 2, 3 ], "b" : [ 1, 2 ] } { "_id" : 3, "a" : [ 1, 2 ], "b" : [ 1, 2, 3 ] } { "_id" : 4, "a" : [ 1, 2, 3 ], "b" : [ 3, 4, 5 ] } { "_id" : 5, "a" : [ 1, 2, 3 ], "b" : [ 4, 5, 6 ] }
We can apply the $setDifference
operator against the a
and b
fields in those documents.
Example:
db.data.aggregate(
[
{ $match: { _id: { $in: [ 1, 2, 3, 4, 5 ] } } },
{
$project:
{
_id: 0,
a: 1,
b: 1,
result: { $setDifference: [ "$a", "$b" ] }
}
}
]
)
Result:
{ "a" : [ 1, 2, 3 ], "b" : [ 1, 2, 3 ], "result" : [ ] } { "a" : [ 1, 2, 3 ], "b" : [ 1, 2 ], "result" : [ 3 ] } { "a" : [ 1, 2 ], "b" : [ 1, 2, 3 ], "result" : [ ] } { "a" : [ 1, 2, 3 ], "b" : [ 3, 4, 5 ], "result" : [ 1, 2 ] } { "a" : [ 1, 2, 3 ], "b" : [ 4, 5, 6 ], "result" : [ 1, 2, 3 ] }
Nested Arrays
The $setDifference
operator does not descend into any nested arrays. It only evaluates top-level arrays.
Suppose our collection also contains the following documents:
{ "_id" : 6, "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2, 3 ] ] } { "_id" : 7, "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2 ], 3 ] }
And we apply $setDifference
to those two documents:
db.data.aggregate(
[
{ $match: { _id: { $in: [ 6, 7 ] } } },
{
$project:
{
_id: 0,
a: 1,
b: 1,
result: { $setDifference: [ "$a", "$b" ] }
}
}
]
)
Result:
{ "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2, 3 ] ], "result" : [ 1, 2, 3 ] } { "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2 ], 3 ], "result" : [ 1, 2 ] }
In the first document, the b
field contained an array that contained just one element – another array. In this case, The outer array was evaluated, and it was found to not contain the same values that were in the the array at a
.
However, if the a
field had contained a nested array itself, it might have been a different story.
Suppose we have the following documents:
{ "_id" : 8, "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2, 3 ] ] } { "_id" : 9, "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2 ], 3 ] }
And we apply $setDifference
to those documents:
db.data.aggregate(
[
{ $match: { _id: { $in: [ 8, 9 ] } } },
{
$project:
{
_id: 0,
a: 1,
b: 1,
result: { $setDifference: [ "$a", "$b" ] }
}
}
]
)
Result:
{ "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2, 3 ] ], "result" : [ ] } { "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2 ], 3 ], "result" : [ [ 1, 2, 3 ] ] }
In the first document, a
matches b
exactly, and so the result is an empty array.
In the second document, the nested array at a
is different to the nested array at b
, and so the whole nested array from a
is returned.
Missing Fields
Applying $setDifference
to a non-existent field results in null
.
Consider the following documents:
{ "_id" : 10, "a" : [ 1, 2, 3 ] } { "_id" : 11, "b" : [ 1, 2, 3 ] } { "_id" : 12 }
The first document doesn’t have a b
field, the second document doesn’t have an a
field, and the third document doesn’t have either.
Here’s what happens when we apply $setDifference
to the a
and b
fields:
db.data.aggregate(
[
{ $match: { _id: { $in: [ 10, 11, 12 ] } } },
{
$project:
{
_id: 0,
a: 1,
b: 1,
result: { $setDifference: [ "$a", "$b" ] }
}
}
]
)
Result:
{ "a" : [ 1, 2, 3 ], "result" : null } { "b" : [ 1, 2, 3 ], "result" : null } { "result" : null }
Wrong Data Type
Both operands of $setDifference
must be arrays. If they aren’t, an error is thrown.
Suppose our collection contains the following documents:
{ "_id" : 13, "a" : [ 1, 2, 3 ], "b" : 3 } { "_id" : 14, "a" : 3, "b" : [ 1, 2, 3 ] } { "_id" : 15, "a" : 2, "b" : 3 }
And we apply $setDifference
to those documents:
db.data.aggregate(
[
{ $match: { _id: { $in: [ 13, 14, 15 ] } } },
{
$project:
{
_id: 0,
a: 1,
b: 1,
result: { $setDifference: [ "$a", "$b" ] }
}
}
]
)
Result:
uncaught exception: Error: command failed: { "ok" : 0, "errmsg" : "both operands of $setDifference must be arrays. Second argument is of type: double", "code" : 17049, "codeName" : "Location17049" } : aggregate failed : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:639:17 assert.commandWorked@src/mongo/shell/assert.js:729:16 DB.prototype._runAggregate@src/mongo/shell/db.js:266:5 DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12 @(shell):1:1
Duplicate Values
The $setDifference
operator filters out duplicates in its result to output an array that contain only unique entries. Also, the order of the elements in the output array is unspecified.
Suppose we have the following documents:
{ "_id" : 16, "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2, 3 ] } { "_id" : 17, "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2 ] } { "_id" : 18, "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ ] } { "_id" : 19, "a" : [ 3, 2, 1, 2, 3, 1 ], "b" : [ 2, 3, 1 ] } { "_id" : 20, "a" : [ 1, 3, 2, 2, 3, 1 ], "b" : [ 2, 1 ] } { "_id" : 21, "a" : [ 2, 3, 1, 2, 3, 1 ], "b" : [ ] }
Then we apply the $setDifference
operator to them:
db.data.aggregate(
[
{ $match: { _id: { $in: [ 16, 17, 18, 19, 20, 21 ] } } },
{
$project:
{
_id: 0,
a: 1,
b: 1,
result: { $setDifference: [ "$a", "$b" ] }
}
}
]
)
Result:
{ "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2, 3 ], "result" : [ ] } { "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2 ], "result" : [ 3 ] } { "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ ], "result" : [ 1, 2, 3 ] } { "a" : [ 3, 2, 1, 2, 3, 1 ], "b" : [ 2, 3, 1 ], "result" : [ ] } { "a" : [ 1, 3, 2, 2, 3, 1 ], "b" : [ 2, 1 ], "result" : [ 3 ] } { "a" : [ 2, 3, 1, 2, 3, 1 ], "b" : [ ], "result" : [ 2, 3, 1 ] }