An Introduction to the array_sample() Function in PostgreSQL

In PostgreSQL array_sample() is a system function that returns an array of a specified number of items randomly selected from the given array.

The array_sample() function was introduced in PostgreSQL 16, which was released on September 14th 2023.

Syntax

The syntax goes like this:

array_sample ( array anyarray, n integer )

So we must pass an array to the function whenever we use it, and we must specify how large the sample should be. The array can be an array of any type (hence the anyarray part of the syntax).

Example

Here’s an example to demonstrate:

SELECT array_sample( ARRAY[1,2,3,4,5,6,7,8,9,10], 3 );

Result:

 array_sample 
--------------
{2,10,6}

In this example I specified that three elements should be randomly selected from the array provided at the first argument.

Let’s do another one:

SELECT array_sample( ARRAY[ 'Cat', 'Dog', 'Bird', 'Fox' ], 2 );

Result:

 array_sample 
--------------
{Fox,Cat}

Multi-dimensional Arrays

If the array is a multi-dimensional array, array_sample() applies to the first dimension. So our resulting array will itself be multi-dimensional, consisting of arrays chosen randomly from the source array:

SELECT array_sample( ARRAY[ [1,2,3],[4,5,6],[7,8,9],[10,11,12] ], 2 );

Result:

   array_sample    
-------------------
{{4,5,6},{7,8,9}}

Returning an Empty Array

If we provide a sample size of zero, an empty array is returned:

SELECT array_sample( ARRAY[ 'Cat', 'Dog', 'Bird', 'Fox' ], 0 );

Result:

 array_sample 
--------------
{}

Invalid Sample Size

The sample size can’t be larger than the array itself:

SELECT array_sample( ARRAY[ 'Cat', 'Dog', 'Bird', 'Fox' ], 6 );

Result:

ERROR:  sample size must be between 0 and 4

Here the error message tells me that the sample size must be between zero and four. I got this error message because I passed 6 as the second argument when there are only four elements in the array.

We get the same error when passing a negative value for the sample size:

SELECT array_sample( ARRAY[ 'Cat', 'Dog', 'Bird', 'Fox' ], -1 );

Result:

ERROR:  sample size must be between 0 and 4

Wrong Data Type

An error is returned when the first argument is not an array. The error may depend on the argument.

For example:

SELECT array_sample([ 'Cat', 'Dog', 'Bird', 'Fox' ], 2 );

Result:

ERROR:  syntax error at or near "["
LINE 1: SELECT array_sample([ 'Cat', 'Dog', 'Bird', 'Fox' ], 2 );
^

Here I omitted the ARRAY keyword for declaring the array.

Here’s another example that produces an error:

SELECT array_sample('Cat', 2 );

Result:

ERROR:  could not determine polymorphic type because input has type unknown

And:

SELECT array_sample( "Cat", 2 );

Result:

ERROR:  column "Cat" does not exist
LINE 1: SELECT array_sample( "Cat", 2 );
^

And another:

SELECT array_sample( 1, 2 );

Result:

ERROR:  function array_sample(integer, integer) does not exist
LINE 1: SELECT array_sample( 1, 2 );
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.