MongoDB $stdDevSamp

In MongoDB, the $stdDevSamp aggregation pipeline operator calculates the sample standard deviation of the input values.

The input values can be from a group of documents (i.e. documents that are grouped by the same key), or they can be multiple fields within a single document.

$stdDevSamp is similar to $stdDevPop. The difference is that $stdDevSamp calculates the sample standard deviation, whereas $stdDevPop calculates the population standard deviation.

Therefore, use $stdDevSamp if your values encompass a sample of a population of data from which to generalise about the population. If the values represent the entire population of data or you do not wish to generalise about a larger population, use $stdDevPop instead.

Syntax

The $stdDevSamp operator supports two syntaxes.

Syntax 1:

{ $stdDevSamp: <expression> }

Syntax 2:

{ $stdDevSamp: [ <expression1>, <expression2> ... ]  }

The first syntax accepts one argument and the second syntax accepts multiple arguments.

When used in the $group stage, you can only use the first syntax. In this case, $stdDevSamp returns the sample standard deviation of the specified expression for a group of documents that share the same group by key.

Examples of Syntax 1 (Single Argument)

Here are a couple of examples that use syntax 1.

Grouped Documents

This example uses $stdDevSamp in conjunction with $group to return the sample standard deviation across a group of documents that are grouped by key.

Suppose we have a collection called stonks with the following documents:

{ "_id" : 1, "ticker" : "gme", "price" : 10 }
{ "_id" : 2, "ticker" : "gme", "price" : 40 }
{ "_id" : 3, "ticker" : "gme", "price" : 90 }
{ "_id" : 4, "ticker" : "gme", "price" : 180 }
{ "_id" : 5, "ticker" : "gme", "price" : 290 }
{ "_id" : 6, "ticker" : "gme", "price" : 390 }
{ "_id" : 7, "ticker" : "gme", "price" : 190 }
{ "_id" : 8, "ticker" : "gme", "price" : 90 }
{ "_id" : 9, "ticker" : "gme", "price" : 10 }
{ "_id" : 10, "ticker" : "jnj", "price" : 131 }
{ "_id" : 11, "ticker" : "jnj", "price" : 133 }
{ "_id" : 12, "ticker" : "jnj", "price" : 138 }
{ "_id" : 13, "ticker" : "jnj", "price" : 141 }
{ "_id" : 14, "ticker" : "jnj", "price" : 145 }
{ "_id" : 15, "ticker" : "jnj", "price" : 150 }
{ "_id" : 16, "ticker" : "jnj", "price" : 154 }
{ "_id" : 17, "ticker" : "jnj", "price" : 156 }
{ "_id" : 18, "ticker" : "jnj", "price" : 160 }

We can group these documents by their ticker field, and then use $stdDevSamp to return the sample standard deviation of the price field for each group:

db.stonks.aggregate(
   [
     {
       $group:
          {
            _id: "$ticker",
            result: { $stdDevSamp: "$price" }
          }
     }
   ]
)

Result:

{ "_id" : "gme", "result" : 131.24404748406687 }
{ "_id" : "jnj", "result" : 10.344080432788612 }

We can see that gme has a much higher sample standard deviation than jnj.

Arrays

This example applies $stdDevSamp to a single document that contains a field with an array of values.

This option is only available when using the single argument syntax. Arrays are ignored when using the multi-argument syntax (more on this below).

Suppose we have a collection called players with the following documents:

{ "_id" : 1, "player" : "Homer", "scores" : [ 1, 7, 2, 3, 8, 7, 1 ] }
{ "_id" : 2, "player" : "Marge", "scores" : [ 0, 1, 8, 17, 18, 8 ] }
{ "_id" : 3, "player" : "Bart", "scores" : [ 15, 11, 8, 0, 1, 3 ] }
{ "_id" : 4, "player" : "Brian", "scores" : [ 7 ] }
{ "_id" : 5, "player" : "Farnsworth", "scores" : [ ] }
{ "_id" : 6, "player" : "Meg", "scores" : null }

We can apply $stdDevSamp to the scores field in each document:

db.players.aggregate(
   [
     {
       $project:
          {
            result: { $stdDevSamp: "$scores" }
          }
     }
   ]
)

Result:

{ "_id" : 1, "result" : 3.0783421635988546 }
{ "_id" : 2, "result" : 7.633260552782583 }
{ "_id" : 3, "result" : 5.988878581726855 }
{ "_id" : 4, "result" : null }
{ "_id" : 5, "result" : null }
{ "_id" : 6, "result" : null }

In this case, the first three documents returned the sample standard deviation for the various numbers that were in their respective arrays.

Document 4 resulted a standard deviation of null. This is because we only provided one number in the array. If we’d used $stdDevPop, this would have returned 0.

Document 5 returned null because we provided an empty array.

Document 6 returned null because we provided null as the argument.

Example of Syntax 2 (Multiple Arguments)

The second syntax involves providing $stdDevSamp with more than one argument. $stdDevSamp then calculates the standard deviation based on all of the supplied arguments.

Suppose we have a collection called data with the following documents:

{ "_id" : 1, "a" : 1, "b" : 2, "c" : 3, "d" : 4 }
{ "_id" : 2, "a" : 1, "b" : 2, "c" : 3, "d" : [ 4 ] }
{ "_id" : 3, "a" : 1, "b" : 2, "c" : 3, "d" : "Hey" }

We can use $stdDevSamp to return the sample standard deviation of the a, b, c, and d fields of each document:

db.data.aggregate(
   [
     {
       $project:
          {
            result: { $stdDevSamp: [ "$a", "$b", "$c", "$d" ] }
          }
     }
   ]
)

Result:

{ "_id" : 1, "result" : 1.2909944487358056 }
{ "_id" : 2, "result" : 1 }
{ "_id" : 3, "result" : 1 }

The first document’s result is based on the input values of 1, 2, 3, and 4.

However, the last two documents resulted in only 1, 2, and 3 being evaluated. The $stdDevSamp operator ignored their d fields.

The $stdDevSamp ignores non-numeric values. So in this case it ignored "Hey" in document 3 and calculated the sample standard deviation from the remaining (numeric) fields.

As for document 2, its d field contains an array. As mentioned, the $stdDevSamp operator ignores arrays when using the multi-argument syntax. More precisely, it treats arrays as non-numerical values when used in this context. And as mentioned, $stdDevSamp ignores non-numerical values.

If all values are non-numeric, then $stdDevSamp returns null.

Missing Fields

When using the multi-argument syntax, $stdDevSamp ignores any missing fields. That is, if you supply a field that doesn’t exist, it ignores it. If none of the fields exist, then it returns null.

Example:

db.data.aggregate(
   [
     {
       $project:
          {
            result: { $stdDevSamp: [ "$a", "$b", "$c", "$d", "$e" ] }
          }
     }
   ]
)

Result:

{ "_id" : 1, "result" : 1.2909944487358056 }
{ "_id" : 2, "result" : 1 }
{ "_id" : 3, "result" : 1 }

In this case I provided an extra field ($e) that doesn’t exist in the document. $stdDevSamp calculated the sample standard deviation based on the remaining fields that do exist.

However, here’s what happens when none of the fields exist:

db.data.aggregate(
   [
     {
       $project:
          {
            result: { $stdDevSamp: [ "$x", "$y", "$z" ] }
          }
     }
   ]
)

Result:

{ "_id" : 1, "result" : null }
{ "_id" : 2, "result" : null }
{ "_id" : 3, "result" : null }

The result is null for all documents.

When using the single-argument syntax, a missing field results in null.

Example:

db.stonks.aggregate(
   [
     {
       $group:
          {
            _id: "$ticker",
            result: { $stdDevSamp: "$oops!" }
          }
     }
   ]
)

Result:

{ "_id" : "gme", "result" : null }
{ "_id" : "jnj", "result" : null }

Available Stages

$stdDevSamp is available in the in the following stages:

  • $group
  • $project
  • $addFields
  • $set
  • $replaceRoot
  • $replaceWith
  • $match stage that includes an $expr expression