MongoDB $stdDevPop

In MongoDB, the $stdDevPop aggregation pipeline operator calculates the population standard deviation of its input values. 

The input values can be from a group of documents (i.e. documents that are grouped by the same key), or they can be multiple fields within a single document.

Syntax

The $stdDevPop operator supports two syntaxes.

Syntax 1:

{ $stdDevPop: <expression> }

Syntax 2:

{ $stdDevPop: [ <expression1>, <expression2> ... ]  }

The first syntax accepts one argument and the second syntax accepts multiple arguments.

When used in the $group stage, you can only use the first syntax. In this case, $stdDevPop returns the population standard deviation of the specified expression for a group of documents that share the same group by key.

Examples of Syntax 1 (Single Argument)

Here are a couple of examples that use syntax 1.

Grouped Documents

This example uses $stdDevPop in conjunction with $group to return the standard deviation across a group of documents that are grouped by key.

Suppose we have a collection called stonks with the following documents:

{ "_id" : 1, "ticker" : "gme", "price" : 10 }
{ "_id" : 2, "ticker" : "gme", "price" : 40 }
{ "_id" : 3, "ticker" : "gme", "price" : 90 }
{ "_id" : 4, "ticker" : "gme", "price" : 180 }
{ "_id" : 5, "ticker" : "gme", "price" : 290 }
{ "_id" : 6, "ticker" : "gme", "price" : 390 }
{ "_id" : 7, "ticker" : "gme", "price" : 190 }
{ "_id" : 8, "ticker" : "gme", "price" : 90 }
{ "_id" : 9, "ticker" : "gme", "price" : 10 }
{ "_id" : 10, "ticker" : "jnj", "price" : 131 }
{ "_id" : 11, "ticker" : "jnj", "price" : 133 }
{ "_id" : 12, "ticker" : "jnj", "price" : 138 }
{ "_id" : 13, "ticker" : "jnj", "price" : 141 }
{ "_id" : 14, "ticker" : "jnj", "price" : 145 }
{ "_id" : 15, "ticker" : "jnj", "price" : 150 }
{ "_id" : 16, "ticker" : "jnj", "price" : 154 }
{ "_id" : 17, "ticker" : "jnj", "price" : 156 }
{ "_id" : 18, "ticker" : "jnj", "price" : 160 }

We can group these documents by their ticker field, and then use $stdDevPop to return the population standard deviation of the price field for each group:

db.stonks.aggregate(
   [
     {
       $group:
          {
            _id: "$ticker",
            standardDeviation: { $stdDevPop: "$price" }
          }
     }
   ]
)

Result:

{ "_id" : "gme", "standardDeviation" : 123.7380746218039 }
{ "_id" : "jnj", "standardDeviation" : 9.752492558885207 }

We can see that gme has a much higher standard deviation than jnj.

Arrays

This example applies $stdDevPop to a single document that contains a field with an array of values.

This option is only available when using the single argument syntax. Arrays are ignored when using the multi-argument syntax (more on this below).

Suppose we have a collection called players with the following documents:

{ "_id" : 1, "player" : "Homer", "scores" : [ 1, 7, 2, 3, 8, 7, 1 ] }
{ "_id" : 2, "player" : "Marge", "scores" : [ 0, 1, 8, 17, 18, 8 ] }
{ "_id" : 3, "player" : "Bart", "scores" : [ 15, 11, 8, 0, 1, 3 ] }
{ "_id" : 4, "player" : "Brian", "scores" : [ 7 ] }
{ "_id" : 5, "player" : "Farnsworth", "scores" : [ ] }
{ "_id" : 6, "player" : "Meg", "scores" : null }

We can apply $stdDevPop to the scores field in each document:

db.players.aggregate(
   [
     {
       $project:
          {
            standardDeviation: { $stdDevPop: "$scores" }
          }
     }
   ]
)

Result:

{ "_id" : 1, "standardDeviation" : 2.849991049037143 }
{ "_id" : 2, "standardDeviation" : 6.968181653455625 }
{ "_id" : 3, "standardDeviation" : 5.467073155618908 }
{ "_id" : 4, "standardDeviation" : 0 }
{ "_id" : 5, "standardDeviation" : null }
{ "_id" : 6, "standardDeviation" : null }

In this case, the first three documents returned the standard deviation for the various numbers that were in their respective arrays.

Document 4 resulted a standard deviation of 0. This is because we only provided one number in the array.

Document 5 returned null because we provided an empty array.

Document 6 returned null because we provided null as the argument.

Example of Syntax 2 (Multiple Arguments)

The second syntax involves providing $stdDevPop with more than one argument. $stdDevPop then calculates the standard deviation based on all of the supplied arguments.

Suppose we have a collection called data with the following documents:

{ "_id" : 1, "a" : 1, "b" : 2, "c" : 3, "d" : 4 }
{ "_id" : 2, "a" : 1, "b" : 2, "c" : 3, "d" : [ 4 ] }
{ "_id" : 3, "a" : 1, "b" : 2, "c" : 3, "d" : "Hey" }

We can use $stdDevPop to return the population standard deviation of the a, b, c, and d fields of each document:

db.data.aggregate(
   [
     {
       $project:
          {
            result: { $stdDevPop: [ "$a", "$b", "$c", "$d" ] }
          }
     }
   ]
)

Result:

{ "_id" : 1, "result" : 1.118033988749895 }
{ "_id" : 2, "result" : 0.816496580927726 }
{ "_id" : 3, "result" : 0.816496580927726 }

Document 1 returns the standard deviation based on its input values of 1, 2, 3, and 4.

However, the last two documents only returned the standard deviation for input values of 1, 2, and 3. The $stdDevPop operator ignored their d fields.

Why is this?

The way it works is that $stdDevPop ignores non-numeric values. So in this case it ignored "Hey" in document 3 and calculated the population standard deviation from the remaining (numeric) fields.

As for document 2, its d field contains an array. As mentioned, the $stdDevPop operator ignores arrays when using the multi-argument syntax. More precisely, it treats arrays as non-numerical values when used in this context. And as mentioned, $stdDevPop ignores non-numerical values.

If all values are non-numeric, then $stdDevPop returns null.

Missing Fields

When using the multi-argument syntax, $stdDevPop ignores any missing fields. That is, if you supply a field that doesn’t exist, it ignores it. If none of the fields exist, then it returns null.

Example:

db.data.aggregate(
   [
     {
       $project:
          {
            result: { $stdDevPop: [ "$a", "$b", "$c", "$d", "$e" ] }
          }
     }
   ]
)

Result:

{ "_id" : 1, "result" : 1.118033988749895 }
{ "_id" : 2, "result" : 0.816496580927726 }
{ "_id" : 3, "result" : 0.816496580927726 }

In this case I provided an extra field ($e) that doesn’t exist in the document. $stdDevPop calculated the standard deviation based on the remaining fields that do exist.

However, here’s what happens when none of the fields exist:

db.data.aggregate(
   [
     {
       $project:
          {
            result: { $stdDevPop: [ "$x", "$y", "$z" ] }
          }
     }
   ]
)

Result:

{ "_id" : 1, "result" : null }
{ "_id" : 2, "result" : null }
{ "_id" : 3, "result" : null }

The result is null for all documents.

When using the single-argument syntax, a missing field results in null.

Example:

db.stonks.aggregate(
   [
     {
       $group:
          {
            _id: "$ticker",
            standardDeviation: { $stdDevPop: "$oops!" }
          }
     }
   ]
)

Result:

{ "_id" : "gme", "standardDeviation" : null }
{ "_id" : "jnj", "standardDeviation" : null }

Available Stages

$stdDevPop is available in the in the following stages:

  • $group
  • $project
  • $addFields
  • $set
  • $replaceRoot
  • $replaceWith
  • $match stage that includes an $expr expression

Calculate the Sample Standard Deviation

See MongoDB $stdDevSamp if you need to get the sample standard deviation, as opposed to the population standard deviation. This operator is useful if your values encompass a sample of a population of data from which to generalise about the population.