$unionWith – MongoDB’s Equivalent of UNION ALL

If you’re familiar with SQL, you might know about the UNION clause, which concatenates the results of two queries into a single result set. In particular, UNION ALL includes duplicates.

In MongoDB, we can use the $unionWith aggregation pipeline stage to achieve the same effect that UNION ALL produces. The $unionWith stage performs a union of two collections – it combines pipeline results from two collections into a single result set. And it includes duplicates.

Example

Suppose we create two collections; one called cats and another called dogs. And we insert the following documents into them:

db.cats.insertMany([
    { _id: 1, name: "Fluffy", type: "Cat", weight: 5 },
    { _id: 2, name: "Scratch", type: "Cat", weight: 3 },
    { _id: 3, name: "Meow", type: "Cat", weight: 7 }
    ])

db.dogs.insertMany([
    { _id: 1, name: "Wag", type: "Dog", weight: 20 },
    { _id: 2, name: "Bark", type: "Dog", weight: 10 },
    { _id: 3, name: "Fluffy", type: "Dog", weight: 40 }
    ])

We can now run a queries against those collections and use the $unionWith stage to combine the results of each query.

Example:

db.cats.aggregate( [
   { $set: { _id: "$_id" } },
   { $unionWith: { coll: "dogs", pipeline: [ { $set: { _id: "$_id" } } ] } },
   { $sort: { type: 1, weight: -1, name: 1 } }
] )

Result:

{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 1, "name" : "Fluffy", "type" : "Cat", "weight" : 5 }
{ "_id" : 2, "name" : "Scratch", "type" : "Cat", "weight" : 3 }
{ "_id" : 3, "name" : "Fluffy", "type" : "Dog", "weight" : 40 }
{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }

In this example, each document has a type field with either cat or dog and so it’s quite apparent which document comes from which collection.

But if the documents didn’t have the type field, then it would be harder to work out where one collection finishes and another starts. In this case, we can use a string literal at the $set stage to represent the collection name.

Example:

db.cats.aggregate( [
   { $set: { _id: "cat" } },
   { $unionWith: { coll: "dogs", pipeline: [ { $set: { _id: "dog" } } ] } },
   { $sort: { type: 1, weight: -1, name: 1 } }
] )

Result:

{ "_id" : "cat", "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : "cat", "name" : "Fluffy", "type" : "Cat", "weight" : 5 }
{ "_id" : "cat", "name" : "Scratch", "type" : "Cat", "weight" : 3 }
{ "_id" : "dog", "name" : "Fluffy", "type" : "Dog", "weight" : 40 }
{ "_id" : "dog", "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : "dog", "name" : "Bark", "type" : "Dog", "weight" : 10 }

Sorting Across Collections

In the previous examples, the cats and the dogs were sorted in a way that separated them into two distinct groups; cats first, then dogs. This happened mainly because we sorted on the type field first.

But we can sort it on any other field, which could result in the cats and dogs being combined.

Example:

db.cats.aggregate( [
   { $set: { _id: "cat" } },
   { $unionWith: { coll: "dogs", pipeline: [ { $set: { _id: "dog" } } ] } },
   { $sort: { name: 1 } }
] )

Result:

{ "_id" : "dog", "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : "cat", "name" : "Fluffy", "type" : "Cat", "weight" : 5 }
{ "_id" : "dog", "name" : "Fluffy", "type" : "Dog", "weight" : 40 }
{ "_id" : "cat", "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : "cat", "name" : "Scratch", "type" : "Cat", "weight" : 3 }
{ "_id" : "dog", "name" : "Wag", "type" : "Dog", "weight" : 20 }

Projections

You can use the $project stage to specify which fields are to be passed on to the next stage in the pipeline. For example, you can therefore reduce the number of fields returned by the query.

Example:

db.cats.aggregate( [
   { $project: { name: 1, _id: 0 } },
   { $unionWith: { coll: "dogs", pipeline: [ { $project: { name: 1, _id: 0 } } ]} }
] )

Result:

{ "name" : "Fluffy" }
{ "name" : "Scratch" }
{ "name" : "Meow" }
{ "name" : "Wag" }
{ "name" : "Bark" }
{ "name" : "Fluffy" }

Remove Duplicates

You can use the $group stage to eliminate redundant duplicates from the result.

For example, the previous query returned two pets called Fluffy. We can add a $group stage to that query to eliminate the redundant duplicate, so that just one Fluffy is returned.

db.cats.aggregate( [
   { $project: { name: 1, _id: 0 } },
   { $unionWith: { coll: "dogs", pipeline: [ { $project: { name: 1, _id: 0 } } ]} },
   { $group: { _id: "$name" } }
] )

Result:

{ "_id" : "Meow" }
{ "_id" : "Bark" }
{ "_id" : "Scratch" }
{ "_id" : "Wag" }
{ "_id" : "Fluffy" }

This time, only one Fluffy is returned.

Non-Matching Columns

One of the advantages that MongoDB’s $unionWith has over SQL’s UNION ALL is that it can be used with non-matching columns.

The SQL UNION clause requires that:

Both queries return the same number of columns
The columns in the same order
The matching columns must be of a compatible data type

The MongoDB $unionWith stage doesn’t impose these limitations.

Therefore, we could use $unionWith to do something like this:

db.cats.aggregate( [
   { $set: { _id: "$_id" } },
   { $unionWith: { coll: "employees", pipeline: [ { $set: { _id: "$_id" } } ] } },
   { $sort: { type: 1, salary: -1 } }
] )

Result:

{ "_id" : 2, "name" : "Sarah", "salary" : 128000 }
{ "_id" : 5, "name" : "Beck", "salary" : 82000 }
{ "_id" : 4, "name" : "Chris", "salary" : 45000 }
{ "_id" : 3, "name" : "Fritz", "salary" : 25000 }
{ "_id" : 1, "name" : "Fluffy", "type" : "Cat", "weight" : 5 }
{ "_id" : 2, "name" : "Scratch", "type" : "Cat", "weight" : 3 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }

In this case, we joined the cats collection with the employees collection. The employees collection didn’t have the same fields as the cats collection, but that’s fine – it still worked.