In MongoDB, the $unionWith
aggregation pipeline stage performs a union of two collections, and it includes duplicates.
This behaves in a similar way to SQL’s UNION ALL
, which also includes duplicates. By contrast, using just UNION
(i.e. without the ALL
)in SQL removes duplicates.
In MongoDB, we don’t have the option of specifying $unionWith ALL
or similar, so we need to reduce duplicates in another way.
In MongoDB, we can remove duplicates by using the $group
stage.
Example
Suppose we insert the following documents into two collections; one called cats
and another called dogs
:
db.cats.insertMany([
{ _id: 1, name: "Fluffy", type: "Cat", weight: 5 },
{ _id: 2, name: "Scratch", type: "Cat", weight: 3 },
{ _id: 3, name: "Meow", type: "Cat", weight: 7 }
])
db.dogs.insertMany([
{ _id: 1, name: "Wag", type: "Dog", weight: 20 },
{ _id: 2, name: "Bark", type: "Dog", weight: 10 },
{ _id: 3, name: "Fluffy", type: "Dog", weight: 40 }
])
And suppose we run the following query to return all names from both collections:
db.cats.aggregate( [
{ $project: { name: 1, _id: 0 } },
{ $unionWith: { coll: "dogs", pipeline: [ { $project: { name: 1, _id: 0 } } ]} }
] )
Result:
{ "name" : "Fluffy" } { "name" : "Scratch" } { "name" : "Meow" } { "name" : "Wag" } { "name" : "Bark" } { "name" : "Fluffy" }
We can see that the name Fluffy appears twice. This is because there are two Fluffys in our collections – one in the cats
collection and one in the dogs
collection.
This is fine if we’re happy to have duplicate values. But what if we don’t? What if we only want a list of distinct names from both collections?
That’s where the $group
stage comes in.
We can add the $group
stage to the name
field, so that it looks like this:
db.cats.aggregate( [
{ $project: { name: 1, _id: 0 } },
{ $unionWith: { coll: "dogs", pipeline: [ { $project: { name: 1, _id: 0 } } ]} },
{ $group: { _id: "$name" } }
] )
Result:
{ "_id" : "Meow" } { "_id" : "Bark" } { "_id" : "Scratch" } { "_id" : "Wag" } { "_id" : "Fluffy" }
This time we only get 5 documents instead of 6, and there’s only one Fluffy.