If you’re familiar with SQL, you might know about the UNION clause, which combines the results of two queries into a single result set. Plain UNION removes duplicate rows, whereas UNION ALL keeps them.
In MongoDB, we can use the $unionWith aggregation pipeline stage to achieve the same effect that UNION ALL produces. The $unionWith stage performs a union of two collections: it combines pipeline results from two collections into a single result set, duplicates included.
Example
Suppose we create two collections, one called cats and another called dogs, and insert the following documents into them:
db.cats.insertMany([
{ _id: 1, name: "Fluffy", type: "Cat", weight: 5 },
{ _id: 2, name: "Scratch", type: "Cat", weight: 3 },
{ _id: 3, name: "Meow", type: "Cat", weight: 7 }
])
db.dogs.insertMany([
{ _id: 1, name: "Wag", type: "Dog", weight: 20 },
{ _id: 2, name: "Bark", type: "Dog", weight: 10 },
{ _id: 3, name: "Fluffy", type: "Dog", weight: 40 }
])
We can now run queries against those collections and use the $unionWith stage to combine the results of each query.
Example:
db.cats.aggregate( [
{ $set: { _id: "$_id" } },
{ $unionWith: { coll: "dogs", pipeline: [ { $set: { _id: "$_id" } } ] } },
{ $sort: { type: 1, weight: -1, name: 1 } }
] )
Result:
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 1, "name" : "Fluffy", "type" : "Cat", "weight" : 5 }
{ "_id" : 2, "name" : "Scratch", "type" : "Cat", "weight" : 3 }
{ "_id" : 3, "name" : "Fluffy", "type" : "Dog", "weight" : 40 }
{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }
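The $set stages in the pipeline above assign _id its own value, so they leave the documents unchanged. When the second collection needs no processing of its own, $unionWith also accepts just the collection name as a string. Here is a minimal sketch of that shorthand, assuming the same cats and dogs collections. The pipeline is held in a variable (an illustrative name) so it can be inspected before it is run:

```javascript
// Shorthand form: pass only the collection name when the second
// collection needs no pipeline of its own.
const pipeline = [
  { $unionWith: "dogs" },
  { $sort: { type: 1, weight: -1, name: 1 } }
];

// In mongosh, run it with: db.cats.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```

Run against the cats collection, this should return the same six documents as the longer form.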
In these results, each document has a type field with a value of either Cat or Dog, so it’s apparent which collection each document came from.
But if the documents didn’t have the type field, it would be harder to work out where one collection finishes and the other starts. In that case, we can use a string literal in the $set stage to represent the collection name (here, by overwriting each document’s _id).
Example:
db.cats.aggregate( [
{ $set: { _id: "cat" } },
{ $unionWith: { coll: "dogs", pipeline: [ { $set: { _id: "dog" } } ] } },
{ $sort: { type: 1, weight: -1, name: 1 } }
] )
Result:
{ "_id" : "cat", "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : "cat", "name" : "Fluffy", "type" : "Cat", "weight" : 5 }
{ "_id" : "cat", "name" : "Scratch", "type" : "Cat", "weight" : 3 }
{ "_id" : "dog", "name" : "Fluffy", "type" : "Dog", "weight" : 40 }
{ "_id" : "dog", "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : "dog", "name" : "Bark", "type" : "Dog", "weight" : 10 }
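Overwriting _id with a label discards each document’s original identifier. If you want to keep it, one option is to put the label in a separate field. The sketch below assumes a field named source and a helper named taggedUnionPipeline. Both names are hypothetical and chosen only for illustration:

```javascript
// Hypothetical helper: builds a pipeline that labels every document with
// a `source` field instead of overwriting _id.
function taggedUnionPipeline(baseLabel, otherColl, otherLabel) {
  return [
    // Label documents from the collection the aggregation runs on.
    { $set: { source: baseLabel } },
    // Label documents from the other collection inside its own pipeline.
    { $unionWith: { coll: otherColl, pipeline: [ { $set: { source: otherLabel } } ] } }
  ];
}

const pipeline = taggedUnionPipeline("cats", "dogs", "dogs");

// In mongosh, run it with: db.cats.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Each returned document would then keep its numeric _id and gain a source field naming the collection it came from.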
Sorting Across Collections
In the previous examples, the cats and the dogs were sorted in a way that separated them into two distinct groups: cats first, then dogs. This happened because we sorted on the type field first.
But we can sort on any other field, which could result in the cats and dogs being interleaved.
Example:
db.cats.aggregate( [
{ $set: { _id: "cat" } },
{ $unionWith: { coll: "dogs", pipeline: [ { $set: { _id: "dog" } } ] } },
{ $sort: { name: 1 } }
] )
Result:
{ "_id" : "dog", "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : "cat", "name" : "Fluffy", "type" : "Cat", "weight" : 5 }
{ "_id" : "dog", "name" : "Fluffy", "type" : "Dog", "weight" : 40 }
{ "_id" : "cat", "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : "cat", "name" : "Scratch", "type" : "Cat", "weight" : 3 }
{ "_id" : "dog", "name" : "Wag", "type" : "Dog", "weight" : 20 }
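To see why the two groups interleave, it can help to model the pipeline in plain JavaScript. The following is not MongoDB code. It is an in-memory sketch in which arrays stand in for the collections, and the names tag, combined and byName are illustrative only:

```javascript
// In-memory stand-ins for the two collections (same data as above).
const cats = [
  { _id: 1, name: "Fluffy", type: "Cat", weight: 5 },
  { _id: 2, name: "Scratch", type: "Cat", weight: 3 },
  { _id: 3, name: "Meow", type: "Cat", weight: 7 }
];
const dogs = [
  { _id: 1, name: "Wag", type: "Dog", weight: 20 },
  { _id: 2, name: "Bark", type: "Dog", weight: 10 },
  { _id: 3, name: "Fluffy", type: "Dog", weight: 40 }
];

// Model of the $set stages: replace _id with a label.
const tag = (docs, label) => docs.map(doc => ({ ...doc, _id: label }));

// Model of $unionWith: concatenate both result sets, duplicates included.
const combined = [...tag(cats, "cat"), ...tag(dogs, "dog")];

// Model of { $sort: { name: 1 } }: ascending comparison on name only.
const byName = [...combined].sort((a, b) =>
  a.name < b.name ? -1 : a.name > b.name ? 1 : 0
);

console.log(byName.map(doc => `${doc.name} (${doc._id})`).join(", "));
// prints: Bark (dog), Fluffy (cat), Fluffy (dog), Meow (cat), Scratch (cat), Wag (dog)
```

The sort sees one combined list and has no notion of which collection a document came from. The two Fluffy documents tie on name. JavaScript’s sort is stable, so the model keeps them in input order, but in MongoDB the relative order of tied documents is not guaranteed unless you add a further sort key.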
Projections
You can use the $project stage to specify which fields are passed on to the next stage in the pipeline. For example, you can reduce the number of fields returned by the query.
Example:
db.cats.aggregate( [
{ $project: { name: 1, _id: 0 } },
{ $unionWith: { coll: "dogs", pipeline: [ { $project: { name: 1, _id: 0 } } ]} }
] )
Result:
{ "name" : "Fluffy" }
{ "name" : "Scratch" }
{ "name" : "Meow" }
{ "name" : "Wag" }
{ "name" : "Bark" }
{ "name" : "Fluffy" }
Remove Duplicates
You can use the $group stage to eliminate duplicates from the result.
For example, the previous query returned two pets called Fluffy. We can add a $group stage to that query so that just one Fluffy is returned.
db.cats.aggregate( [
{ $project: { name: 1, _id: 0 } },
{ $unionWith: { coll: "dogs", pipeline: [ { $project: { name: 1, _id: 0 } } ]} },
{ $group: { _id: "$name" } }
] )
Result:
{ "_id" : "Meow" }
{ "_id" : "Bark" }
{ "_id" : "Scratch" }
{ "_id" : "Wag" }
{ "_id" : "Fluffy" }
This time, only one Fluffy is returned.
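Grouping on name moves each value into the _id field, and $group does not guarantee any particular output order. If you would rather get the name field back and have a predictable order, one possibility is to follow the $group stage with $project and $sort stages. This is a sketch, assuming the same cats and dogs collections:

```javascript
const pipeline = [
  { $project: { name: 1, _id: 0 } },
  { $unionWith: { coll: "dogs", pipeline: [ { $project: { name: 1, _id: 0 } } ] } },
  // One output document per distinct name; the name becomes _id.
  { $group: { _id: "$name" } },
  // Copy the grouped value back into a name field and drop _id.
  { $project: { _id: 0, name: "$_id" } },
  // Sort so that the output order is deterministic.
  { $sort: { name: 1 } }
];

// In mongosh, run it with: db.cats.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

With the sample data, this should return one document per distinct name, in alphabetical order.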
Non-Matching Columns
One of the advantages that MongoDB’s $unionWith
has over SQL’s UNION ALL
is that it can be used with non-matching columns.
The SQL UNION clause requires that:
- Both queries return the same number of columns
- The columns are in the same order
- The matching columns are of a compatible data type
The MongoDB $unionWith stage doesn’t impose these limitations.
Therefore, we could use $unionWith to do something like this (assuming an employees collection whose documents have name and salary fields):
db.cats.aggregate( [
{ $set: { _id: "$_id" } },
{ $unionWith: { coll: "employees", pipeline: [ { $set: { _id: "$_id" } } ] } },
{ $sort: { type: 1, salary: -1 } }
] )
Result:
{ "_id" : 2, "name" : "Sarah", "salary" : 128000 }
{ "_id" : 5, "name" : "Beck", "salary" : 82000 }
{ "_id" : 4, "name" : "Chris", "salary" : 45000 }
{ "_id" : 3, "name" : "Fritz", "salary" : 25000 }
{ "_id" : 1, "name" : "Fluffy", "type" : "Cat", "weight" : 5 }
{ "_id" : 2, "name" : "Scratch", "type" : "Cat", "weight" : 3 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
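In this case, we combined the cats collection with the employees collection. The employees collection didn’t have the same fields as the cats collection, but that’s fine: it still worked. Documents that lack the type field sort before those that have it in an ascending sort, which is why the employees appear first.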