MongoDB $setIntersection

In MongoDB, the $setIntersection aggregation pipeline operator accepts two or more arrays and returns an array that contains the elements that appear in all input arrays.

$setIntersection accepts two or more arguments, all of which can be any valid expression as long as they each resolve to an array. $setIntersection treats the arrays as sets.

Example

Suppose we have a collection called data with the following documents:

{ "_id" : 1, "a" : [ 1, 2, 3 ], "b" : [ 1, 2, 3 ] }
{ "_id" : 2, "a" : [ 1, 2, 3 ], "b" : [ 1, 2 ] }
{ "_id" : 3, "a" : [ 1, 2 ], "b" : [ 1, 2, 3 ] }
{ "_id" : 4, "a" : [ 1, 2, 3 ], "b" : [ 3, 4, 5 ] }
{ "_id" : 5, "a" : [ 1, 2, 3 ], "b" : [ 4, 5, 6 ] }

We can apply the $setIntersection operator against the a and b fields in those documents.

Example:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 1, 2, 3, 4, 5 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setIntersection: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ 1, 2, 3 ], "b" : [ 1, 2, 3 ], "result" : [ 1, 2, 3 ] }
{ "a" : [ 1, 2, 3 ], "b" : [ 1, 2 ], "result" : [ 1, 2 ] }
{ "a" : [ 1, 2 ], "b" : [ 1, 2, 3 ], "result" : [ 1, 2 ] }
{ "a" : [ 1, 2, 3 ], "b" : [ 3, 4, 5 ], "result" : [ 3 ] }
{ "a" : [ 1, 2, 3 ], "b" : [ 4, 5, 6 ], "result" : [ ] }

Nested Arrays

The $setIntersection operator does not descend into any nested arrays. It only evaluates top-level arrays.

Suppose our collection also contains the following documents:

{ "_id" : 6, "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2, 3 ] ] }
{ "_id" : 7, "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2 ], 3 ] }

And we apply $setIntersection to those two documents:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 6, 7 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setIntersection: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2, 3 ] ], "result" : [ ] }
{ "a" : [ 1, 2, 3 ], "b" : [ [ 1, 2 ], 3 ], "result" : [ 3 ] }

In the first document, the b field contained an array that contained just one element – another array. In this case, The outer array was evaluated, and it was found to not contain any of the values that were in the the array at a.

However, suppose we have the following documents:

{ "_id" : 8, "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2, 3 ] ] }
{ "_id" : 9, "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2 ], 3 ] }

Document 8 contains a nested array at both the a and b fields, and both arrays are identical.

Here’s what happens when we apply $setIntersection to those documents:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 8, 9 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setIntersection: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2, 3 ] ], "result" : [ [ 1, 2, 3 ] ] }
{ "a" : [ [ 1, 2, 3 ] ], "b" : [ [ 1, 2 ], 3 ], "result" : [ ] }

In the first document, a matches b exactly, and so the result is an identical array (an outer array containing the inner array).

In the second document, the contents of a does not match any of the contents of b, and so an empty array is returned.

Missing Fields

Applying $setIntersection to a non-existent field results in null.

Consider the following documents:

{ "_id" : 10, "a" : [ 1, 2, 3 ] }
{ "_id" : 11, "b" : [ 1, 2, 3 ] }
{ "_id" : 12 }

The first document doesn’t have a b field, the second document doesn’t have an a field, and the third document doesn’t have either.

Here’s what happens when we apply $setIntersection to the a and b fields:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 10, 11, 12 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setIntersection: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ 1, 2, 3 ], "result" : null }
{ "b" : [ 1, 2, 3 ], "result" : null }
{ "result" : null }

Wrong Data Type

All operands of $setIntersection must be arrays. If they aren’t, an error is thrown.

Suppose our collection contains the following documents:

{ "_id" : 13, "a" : [ 1, 2, 3 ], "b" : 3 }
{ "_id" : 14, "a" : 3, "b" : [ 1, 2, 3 ] }
{ "_id" : 15, "a" : 2, "b" : 3 }

And we apply $setIntersection to those documents:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 13, 14, 15 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setIntersection: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

Error: command failed: {
	"ok" : 0,
	"errmsg" : "All operands of $setIntersection must be arrays. One argument is of type: double",
	"code" : 17047,
	"codeName" : "Location17047"
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:639:17
assert.commandWorked@src/mongo/shell/assert.js:729:16
DB.prototype._runAggregate@src/mongo/shell/db.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12
@(shell):1:1

Duplicate Values

The $setIntersection operator filters out duplicates in its result to output an array that contain only unique entries. Also, the order of the elements in the output array is unspecified.

Suppose we have the following documents:

{ "_id" : 16, "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2, 3 ] }
{ "_id" : 17, "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2 ] }
{ "_id" : 18, "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ ] }
{ "_id" : 19, "a" : [ 3, 2, 1, 2, 3, 1 ], "b" : [ 2, 3, 1 ] }
{ "_id" : 20, "a" : [ 1, 3, 2, 2, 3, 1 ], "b" : [ 2, 1 ] }
{ "_id" : 21, "a" : [ 2, 3, 1, 2, 3, 1 ], "b" : [ ] }

Then we apply the $setIntersection operator to them:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 16, 17, 18, 19, 20, 21 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            result: { $setIntersection: [ "$a", "$b" ] }
          }
     }
   ]
)

Result:

{ "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2, 3 ], "result" : [ 1, 2, 3 ] }
{ "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ 1, 2 ], "result" : [ 1, 2 ] }
{ "a" : [ 1, 1, 2, 2, 3, 3 ], "b" : [ ], "result" : [ ] }
{ "a" : [ 3, 2, 1, 2, 3, 1 ], "b" : [ 2, 3, 1 ], "result" : [ 1, 2, 3 ] }
{ "a" : [ 1, 3, 2, 2, 3, 1 ], "b" : [ 2, 1 ], "result" : [ 1, 2 ] }
{ "a" : [ 2, 3, 1, 2, 3, 1 ], "b" : [ ], "result" : [ ] }

More than Two Arguments

As mentioned, $setIntersection accepts two or more arguments. All previous examples used two arguments. Here’s one that uses three arguments.

Suppose we have the following documents:

{ "_id" : 22, "a" : [ 1, 2 ], "b" : [ 1, 2 ], "c" : [ 1, 2 ] }
{ "_id" : 23, "a" : [ 1, 2 ], "b" : [ 1, 2 ], "c" : [ 1, 2, 3 ] }

These documents have an extra field – a c field.

Now let’s apply $setIntersection to those three fields:

db.data.aggregate(
   [
     { $match: { _id: { $in: [ 22, 23 ] } } },
     {
       $project:
          {
            _id: 0,
            a: 1,
            b: 1,
            c: 1,
            result: { $setIntersection: [ "$a", "$b", "$c" ] }
          }
     }
   ]
)

Result:

{ "a" : [ 1, 2 ], "b" : [ 1, 2 ], "c" : [ 1, 2 ], "result" : [ 1, 2 ] }
{ "a" : [ 1, 2 ], "b" : [ 1, 2 ], "c" : [ 1, 2, 3 ], "result" : [ 1, 2 ] }