3 Ways to Return a Random Sample of Documents from a MongoDB Collection

If you need to return a small sample of random documents from a collection, here are three approaches you can try using the aggregation pipeline.

The $sample Stage

The $sample aggregation pipeline stage is designed specifically for randomly selecting a specified number of documents.

When you use $sample, you specify the number of documents you’d like to returned in a size field.

Suppose we have the following collection called pets:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 5, "name" : "Bruce", "type" : "Bat", "weight" : 3 }
{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }

We can use $sample to grab a random sample of those documents like this:

db.pets.aggregate(
   [
      { 
        $sample: { size: 3 } 
      }
   ]
)

Result:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 5, "name" : "Bruce", "type" : "Bat", "weight" : 3 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }

In this case I specified { size: 3 } which returned three documents.

Here it is again using a different sample size:

db.pets.aggregate(
   [
      { 
        $sample: { size: 5 } 
      }
   ]
)

Result:

{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 5, "name" : "Bruce", "type" : "Bat", "weight" : 3 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }

The $sample stage works in one of two ways, depending on how many documents are in the collection, the sample size relative to the number of documents in the collection, and its position in the pipeline. See MongoDB $sample for an explanation of how it works.

It’s also possible that the $sample stage could return the same document more than once in its result set.

The $rand Operator

The $rand operator was introduced in MongoDB 4.4.2, and its purpose is to return a random float between 0 and 1 each time it’s called.

Therefore, we can use it in the $match stage in conjunction with other operators, such as $expr and $lt to return a random sample of documents.

Example:

db.pets.aggregate(
   [
      { 
        $match: 
          { 
            $expr: 
              { 
                $lt: [ 0.5, { $rand: {} } ] 
              }
          } 
      }
   ]
)

Result:

{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }

The result set from this approach is different to the $sample approach, in that it doesn’t return a fixed number of documents. The number of documents returned with this approach can vary.

For example, here’s what happens when I run the same code several more times.

Result set 2:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }

Result set 3:

{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }

Result set 4:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }

Result set 5:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }

The $sampleRate Operator

Introduced in MongoDB 4.4.2, the $sampleRate operator provides a more concise way of doing the same as the previous example.

When you use $sampleRate, you provide a sample rate as a floating point number between 0 and 1. The selection process uses a uniform random distribution, and the sample rate you provide represents the probability that a given document will be selected as it passes through the pipeline.

Example:

db.pets.aggregate(
   [
      { 
        $match: { $sampleRate: 0.5 } 
      }
   ]
)

Result:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : 5, "name" : "Bruce", "type" : "Bat", "weight" : 3 }
{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }

And run it again:

{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }

And again:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }