Remove Duplicates when using $unionWith in MongoDB

In MongoDB, the $unionWith aggregation pipeline stage performs a union of two collections, and it includes duplicates.

This behaves in a similar way to SQL’s UNION ALL, which also includes duplicates. By contrast, using just UNION (i.e. without the ALL)in SQL removes duplicates.

In MongoDB, we don’t have the option of specifying $unionWith ALL or similar, so we need to reduce duplicates in another way.

In MongoDB, we can remove duplicates by using the $group stage.

Example

Suppose we insert the following documents into two collections; one called cats and another called dogs:

db.cats.insertMany([
    { _id: 1, name: "Fluffy", type: "Cat", weight: 5 },
    { _id: 2, name: "Scratch", type: "Cat", weight: 3 },
    { _id: 3, name: "Meow", type: "Cat", weight: 7 }
    ])

db.dogs.insertMany([
    { _id: 1, name: "Wag", type: "Dog", weight: 20 },
    { _id: 2, name: "Bark", type: "Dog", weight: 10 },
    { _id: 3, name: "Fluffy", type: "Dog", weight: 40 }
    ])

And suppose we run the following query to return all names from both collections:

db.cats.aggregate( [
   { $project: { name: 1, _id: 0 } },
   { $unionWith: { coll: "dogs", pipeline: [ { $project: { name: 1, _id: 0 } } ]} }
] )

Result:

{ "name" : "Fluffy" }
{ "name" : "Scratch" }
{ "name" : "Meow" }
{ "name" : "Wag" }
{ "name" : "Bark" }
{ "name" : "Fluffy" }

We can see that the name Fluffy appears twice. This is because there are two Fluffys in our collections – one in the cats collection and one in the dogs collection.

This is fine if we’re happy to have duplicate values. But what if we don’t? What if we only want a list of distinct names from both collections?

That’s where the $group stage comes in.

We can add the $group stage to the name field, so that it looks like this:

db.cats.aggregate( [
   { $project: { name: 1, _id: 0 } },
   { $unionWith: { coll: "dogs", pipeline: [ { $project: { name: 1, _id: 0 } } ]} },
   { $group: { _id: "$name" } }
] )

Result:

{ "_id" : "Meow" }
{ "_id" : "Bark" }
{ "_id" : "Scratch" }
{ "_id" : "Wag" }
{ "_id" : "Fluffy" }

This time we only get 5 documents instead of 6, and there’s only one Fluffy.