In MongoDB, the $stdDevPop
aggregation pipeline operator calculates the population standard deviation of its input values.
The input values can be from a group of documents (i.e. documents that are grouped by the same key), or they can be multiple fields within a single document.
Syntax
The $stdDevPop
operator supports two syntaxes.
Syntax 1:
{ $stdDevPop: <expression> }
Syntax 2:
{ $stdDevPop: [ <expression1>, <expression2> ... ] }
The first syntax accepts one argument and the second syntax accepts multiple arguments.
When used in the $group
stage, you can only use the first syntax. In this case, $stdDevPop
returns the population standard deviation of the specified expression for a group of documents that share the same group by key.
Examples of Syntax 1 (Single Argument)
Here are a couple of examples that use syntax 1.
Grouped Documents
This example uses $stdDevPop
in conjunction with $group
to return the standard deviation across a group of documents that are grouped by key.
Suppose we have a collection called stonks
with the following documents:
{ "_id" : 1, "ticker" : "gme", "price" : 10 } { "_id" : 2, "ticker" : "gme", "price" : 40 } { "_id" : 3, "ticker" : "gme", "price" : 90 } { "_id" : 4, "ticker" : "gme", "price" : 180 } { "_id" : 5, "ticker" : "gme", "price" : 290 } { "_id" : 6, "ticker" : "gme", "price" : 390 } { "_id" : 7, "ticker" : "gme", "price" : 190 } { "_id" : 8, "ticker" : "gme", "price" : 90 } { "_id" : 9, "ticker" : "gme", "price" : 10 } { "_id" : 10, "ticker" : "jnj", "price" : 131 } { "_id" : 11, "ticker" : "jnj", "price" : 133 } { "_id" : 12, "ticker" : "jnj", "price" : 138 } { "_id" : 13, "ticker" : "jnj", "price" : 141 } { "_id" : 14, "ticker" : "jnj", "price" : 145 } { "_id" : 15, "ticker" : "jnj", "price" : 150 } { "_id" : 16, "ticker" : "jnj", "price" : 154 } { "_id" : 17, "ticker" : "jnj", "price" : 156 } { "_id" : 18, "ticker" : "jnj", "price" : 160 }
We can group these documents by their ticker
field, and then use $stdDevPop
to return the population standard deviation of the price
field for each group:
db.stonks.aggregate(
[
{
$group:
{
_id: "$ticker",
standardDeviation: { $stdDevPop: "$price" }
}
}
]
)
Result:
{ "_id" : "gme", "standardDeviation" : 123.7380746218039 } { "_id" : "jnj", "standardDeviation" : 9.752492558885207 }
We can see that gme
has a much higher standard deviation than jnj
.
Arrays
This example applies $stdDevPop
to a single document that contains a field with an array of values.
This option is only available when using the single argument syntax. Arrays are ignored when using the multi-argument syntax (more on this below).
Suppose we have a collection called players
with the following documents:
{ "_id" : 1, "player" : "Homer", "scores" : [ 1, 7, 2, 3, 8, 7, 1 ] } { "_id" : 2, "player" : "Marge", "scores" : [ 0, 1, 8, 17, 18, 8 ] } { "_id" : 3, "player" : "Bart", "scores" : [ 15, 11, 8, 0, 1, 3 ] } { "_id" : 4, "player" : "Brian", "scores" : [ 7 ] } { "_id" : 5, "player" : "Farnsworth", "scores" : [ ] } { "_id" : 6, "player" : "Meg", "scores" : null }
We can apply $stdDevPop
to the scores
field in each document:
db.players.aggregate(
[
{
$project:
{
standardDeviation: { $stdDevPop: "$scores" }
}
}
]
)
Result:
{ "_id" : 1, "standardDeviation" : 2.849991049037143 } { "_id" : 2, "standardDeviation" : 6.968181653455625 } { "_id" : 3, "standardDeviation" : 5.467073155618908 } { "_id" : 4, "standardDeviation" : 0 } { "_id" : 5, "standardDeviation" : null } { "_id" : 6, "standardDeviation" : null }
In this case, the first three documents returned the standard deviation for the various numbers that were in their respective arrays.
Document 4 resulted a standard deviation of 0
. This is because we only provided one number in the array.
Document 5 returned null
because we provided an empty array.
Document 6 returned null
because we provided null
as the argument.
Example of Syntax 2 (Multiple Arguments)
The second syntax involves providing $stdDevPop
with more than one argument. $stdDevPop
then calculates the standard deviation based on all of the supplied arguments.
Suppose we have a collection called data
with the following documents:
{ "_id" : 1, "a" : 1, "b" : 2, "c" : 3, "d" : 4 } { "_id" : 2, "a" : 1, "b" : 2, "c" : 3, "d" : [ 4 ] } { "_id" : 3, "a" : 1, "b" : 2, "c" : 3, "d" : "Hey" }
We can use $stdDevPop
to return the population standard deviation of the a
, b
, c
, and d
fields of each document:
db.data.aggregate(
[
{
$project:
{
result: { $stdDevPop: [ "$a", "$b", "$c", "$d" ] }
}
}
]
)
Result:
{ "_id" : 1, "result" : 1.118033988749895 } { "_id" : 2, "result" : 0.816496580927726 } { "_id" : 3, "result" : 0.816496580927726 }
Document 1 returns the standard deviation based on its input values of 1
, 2
, 3
, and 4
.
However, the last two documents only returned the standard deviation for input values of 1
, 2
, and 3
. The $stdDevPop
operator ignored their d
fields.
Why is this?
The way it works is that $stdDevPop
ignores non-numeric values. So in this case it ignored "Hey"
in document 3 and calculated the population standard deviation from the remaining (numeric) fields.
As for document 2, its d
field contains an array. As mentioned, the $stdDevPop
operator ignores arrays when using the multi-argument syntax. More precisely, it treats arrays as non-numerical values when used in this context. And as mentioned, $stdDevPop
ignores non-numerical values.
If all values are non-numeric, then $stdDevPop
returns null
.
Missing Fields
When using the multi-argument syntax, $stdDevPop
ignores any missing fields. That is, if you supply a field that doesn’t exist, it ignores it. If none of the fields exist, then it returns null
.
Example:
db.data.aggregate(
[
{
$project:
{
result: { $stdDevPop: [ "$a", "$b", "$c", "$d", "$e" ] }
}
}
]
)
Result:
{ "_id" : 1, "result" : 1.118033988749895 } { "_id" : 2, "result" : 0.816496580927726 } { "_id" : 3, "result" : 0.816496580927726 }
In this case I provided an extra field ($e
) that doesn’t exist in the document. $stdDevPop
calculated the standard deviation based on the remaining fields that do exist.
However, here’s what happens when none of the fields exist:
db.data.aggregate(
[
{
$project:
{
result: { $stdDevPop: [ "$x", "$y", "$z" ] }
}
}
]
)
Result:
{ "_id" : 1, "result" : null } { "_id" : 2, "result" : null } { "_id" : 3, "result" : null }
The result is null
for all documents.
When using the single-argument syntax, a missing field results in null
.
Example:
db.stonks.aggregate(
[
{
$group:
{
_id: "$ticker",
standardDeviation: { $stdDevPop: "$oops!" }
}
}
]
)
Result:
{ "_id" : "gme", "standardDeviation" : null } { "_id" : "jnj", "standardDeviation" : null }
Available Stages
$stdDevPop
is available in the in the following stages:
$group
$project
$addFields
$set
$replaceRoot
$replaceWith
$match
stage that includes an$expr
expression
Calculate the Sample Standard Deviation
See MongoDB $stdDevSamp
if you need to get the sample standard deviation, as opposed to the population standard deviation. This operator is useful if your values encompass a sample of a population of data from which to generalise about the population.