MongoDB $setDifference

In MongoDB, the $setDifference aggregation pipeline operator accepts two sets and performs a relative complement of the second set relative to the first. It returns an array containing the elements that only exist in the first set.

$setDifference accepts two arguments, both of which can be any valid expression as long as they each resolve to an array. $setDifference treats the arrays as sets.

Example

Suppose we have a collection called data with the following documents:

{ "_id" : 1, "a" : [ 1, 2, 3 ], "b" : [ 1, 2, 3 ] }
{ "_id" : 2, "a" : [ 1, 2, 3 ], "b" : [ 1, 2 ] }
{ "_id" : 3, "a" : [ 1, 2 ], "b" : [ 1, 2, 3 ] }
{ "_id" : 4, "a" : [ 1, 2, 3 ], "b" : [ 3, 4, 5 ] }
{ "_id" : 5, "a" : [ 1, 2, 3 ], "b" : [ 4, 5, 6 ] }

We can apply the $setDifference operator against the a and b fields in those documents.

Example:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 1, 2, 3, 4, 5 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setDifference: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ 1, 2, 3 ], "b" : [ 1, 2, 3 ], "result" : [ ] }
{ "a" : [ 1, 2, 3 ], "b" : [ 1, 2 ], "result" : [ 3 ] }
{ "a" : [ 1, 2 ], "b" : [ 1, 2, 3 ], "result" : [ ] }
{ "a" : [ 1, 2, 3 ], "b" : [ 3, 4, 5 ], "result" : [ 1, 2 ] }
{ "a" : [ 1, 2, 3 ], "b" : [ 4, 5, 6 ], "result" : [ 1, 2, 3 ] }

Nested Arrays

The $setDifference operator does not descend into any nested arrays. It only evaluates top-level arrays.

Suppose our collection also contains the following documents:

{ "_id" : 6, "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2, 3 ] ] }
{ "_id" : 7, "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2 ], 3 ] }

And we apply $setDifference to those two documents:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 6, 7 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setDifference: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2, 3 ] ], "result" : [ 1, 2, 3 ] }
{ "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2 ], 3 ], "result" : [ 1, 2 ] }

In the first document, the b field contained an array that contained just one element – another array. In this case, The outer array was evaluated, and it was found to not contain the same values that were in the the array at a.

However, if the a field had contained a nested array itself, it might have been a different story.

Suppose we have the following documents:

{ "_id" : 8, "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2, 3 ] ] }
{ "_id" : 9, "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2 ], 3 ] }

And we apply $setDifference to those documents:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 8, 9 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setDifference: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2, 3 ] ], "result" : [ ] }
{ "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2 ], 3 ], "result" : [ [ 1, 2, 3 ] ] }

In the first document, a matches b exactly, and so the result is an empty array.

In the second document, the nested array at a is different to the nested array at b, and so the whole nested array from a is returned.

Missing Fields

Applying $setDifference to a non-existent field results in null.

Consider the following documents:

{ "_id" : 10, "a" : [ 1, 2, 3 ] }
{ "_id" : 11, "b" : [ 1, 2, 3 ] }
{ "_id" : 12 }

The first document doesn’t have a b field, the second document doesn’t have an a field, and the third document doesn’t have either.

Here’s what happens when we apply $setDifference to the a and b fields:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 10, 11, 12 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setDifference: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ 1, 2, 3 ], "result" : null }
{ "b" : [ 1, 2, 3 ], "result" : null }
{ "result" : null }

Wrong Data Type

Both operands of $setDifference must be arrays. If they aren’t, an error is thrown.

Suppose our collection contains the following documents:

{ "_id" : 13, "a" : [ 1, 2, 3 ], "b" : 3 }
{ "_id" : 14, "a" : 3, "b" : [ 1, 2, 3 ] }
{ "_id" : 15, "a" : 2, "b" : 3 }

And we apply $setDifference to those documents:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 13, 14, 15 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setDifference: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

uncaught exception: Error: command failed: {
	"ok" : 0,
	"errmsg" : "both operands of $setDifference must be arrays. Second argument is of type: double",
	"code" : 17049,
	"codeName" : "Location17049"
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:639:17
assert.commandWorked@src/mongo/shell/assert.js:729:16
DB.prototype._runAggregate@src/mongo/shell/db.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12
@(shell):1:1

Duplicate Values

The $setDifference operator filters out duplicates in its result to output an array that contain only unique entries. Also, the order of the elements in the output array is unspecified.

Suppose we have the following documents:

{ "_id" : 16, "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2, 3 ] }
{ "_id" : 17, "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2 ] }
{ "_id" : 18, "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ ] }
{ "_id" : 19, "a" : [ 3, 2, 1, 2, 3, 1 ], "b" : [ 2, 3, 1 ] }
{ "_id" : 20, "a" : [ 1, 3, 2, 2, 3, 1 ], "b" : [ 2, 1 ] }
{ "_id" : 21, "a" : [ 2, 3, 1, 2, 3, 1 ], "b" : [ ] }

Then we apply the $setDifference operator to them:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 16, 17, 18, 19, 20, 21 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setDifference: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2, 3 ], "result" : [ ] }
{ "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2 ], "result" : [ 3 ] }
{ "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ ], "result" : [ 1, 2, 3 ] }
{ "a" : [ 3, 2, 1, 2, 3, 1 ], "b" : [ 2, 3, 1 ], "result" : [ ] }
{ "a" : [ 1, 3, 2, 2, 3, 1 ], "b" : [ 2, 1 ], "result" : [ 3 ] }
{ "a" : [ 2, 3, 1, 2, 3, 1 ], "b" : [ ], "result" : [ 2, 3, 1 ] }