A Quick Overview of array_agg() in PostgreSQL

PostgreSQL has an aggregate function called array_agg() that allows us to generate an array from the result set of a query.

The function accepts an argument which is used to generate the array. This argument is typically the column that we want to use in order to populate the array.

Example

Suppose we have the following table:

SELECT * FROM Pets;

Result:

 petid | petname | pettype 
-------+---------+---------
1 | Fluffy | Cat
2 | Scratch | Cat
3 | Purr | Cat
4 | Bark | Dog
5 | Wag | Dog
6 | Fetch | Dog

We can use array_agg() to put all those pet names into an array:

SELECT array_agg(PetName) FROM Pets;

Result:

              array_agg               
--------------------------------------
{Fluffy,Scratch,Purr,Bark,Wag,Fetch}

We can filter the result set so that the array contains just a subset of the data:

SELECT array_agg(PetName) 
FROM Pets
WHERE PetType = 'Dog';

Result:

    array_agg     
------------------
{Bark,Wag,Fetch}

The function also accepts its own FILTER clause – more on that below.

The ORDER BY Clause

Aggregate functions in PostgreSQL support the use of an ORDER BY clause. So we can use this to sort the elements within the array:

SELECT array_agg(PetName ORDER BY PetName) FROM Pets;

Result:

              array_agg               
--------------------------------------
{Bark,Fetch,Fluffy,Purr,Scratch,Wag}

The DISTINCT Clause

We can use the DISTINCT clause to include only distinct values in the array:

SELECT array_agg(DISTINCT PetType) FROM Pets;

Result:

 array_agg 
-----------
{Cat,Dog}

The ALL Clause

We can use ALL to specify that all values are returned, even if there are duplicates:

SELECT array_agg(ALL PetType) FROM Pets;

Result:

         array_agg         
---------------------------
{Cat,Cat,Cat,Dog,Dog,Dog}

This is the default behaviour, so we can omit ALL to achieve the same result:

SELECT array_agg(PetType) FROM Pets;

Result:

         array_agg         
---------------------------
{Cat,Cat,Cat,Dog,Dog,Dog}

The FILTER Clause

We can use the FILTER clause to filter the results. In this case, only the rows that evaluate to true are provided to array_agg().

Here’s a basic example:

SELECT array_agg(PetName) FILTER (WHERE PetName LIKE 'F%') FROM Pets;

Result:

   array_agg    
----------------
{Fluffy,Fetch}

Here’s an example that filters without the ORDER BY clause:

\x
SELECT
    array_agg(PetName) AS "Unfiltered",
    array_agg(PetName) FILTER (WHERE i < 5) AS "Filtered"
FROM Pets AS p(i);

Result (using vertical output):

-[ RECORD 1 ]------------------------------------
Unfiltered | {Fluffy,Scratch,Purr,Bark,Wag,Fetch}
Filtered | {Fluffy,Scratch,Purr,Bark}

And here it is with the ORDER BY clause:

SELECT
    array_agg(PetName ORDER BY PetName ASC) AS "Unfiltered Ordered",
    array_agg(PetName ORDER BY PetName ASC) FILTER (WHERE i < 5) AS "Filtered Ordered"
FROM Pets AS p(i);

Result (using vertical output):

Unfiltered Ordered | {Bark,Fetch,Fluffy,Purr,Scratch,Wag}
Filtered Ordered | {Bark,Fluffy,Purr,Scratch}

Creating Multi-Dimensional Arrays

We can’t nest aggregate functions in PostgreSQL. So trying to stuff like the following results in an error:

SELECT array_agg(array_agg(PetName)) FROM Pets;

Output:

ERROR:  aggregate function calls cannot be nested
LINE 1: SELECT array_agg(array_agg(PetName)) FROM Pets;
^

The error message tells us that aggregate function calls cannot be nested.

But we can easily create multi-dimensional arrays with the ARRAY constructor. For example, we could do the following:

SELECT ARRAY[
        ARRAY(SELECT PetName FROM Pets WHERE PetType = 'Dog'),
        ARRAY(SELECT PetName FROM Pets WHERE PetType = 'Cat')
        ];

Output:

                  array                   
------------------------------------------
{{Bark,Wag,Fetch},{Fluffy,Scratch,Purr}}