PostgreSQL has an aggregate function called array_agg()
that allows us to generate an array from the result set of a query.
The function accepts an argument which is used to generate the array. This argument is typically the column that we want to use in order to populate the array.
Example
Suppose we have the following table:
SELECT * FROM Pets;
Result:
petid | petname | pettype
-------+---------+---------
1 | Fluffy | Cat
2 | Scratch | Cat
3 | Purr | Cat
4 | Bark | Dog
5 | Wag | Dog
6 | Fetch | Dog
We can use array_agg()
to put all those pet names into an array:
SELECT array_agg(PetName) FROM Pets;
Result:
array_agg
--------------------------------------
{Fluffy,Scratch,Purr,Bark,Wag,Fetch}
We can filter the result set so that the array contains just a subset of the data:
SELECT array_agg(PetName)
FROM Pets
WHERE PetType = 'Dog';
Result:
array_agg
------------------
{Bark,Wag,Fetch}
The function also accepts its own FILTER
clause – more on that below.
The ORDER BY
Clause
Aggregate functions in PostgreSQL support the use of an ORDER BY
clause. So we can use this to sort the elements within the array:
SELECT array_agg(PetName ORDER BY PetName) FROM Pets;
Result:
array_agg
--------------------------------------
{Bark,Fetch,Fluffy,Purr,Scratch,Wag}
The DISTINCT
Clause
We can use the DISTINCT
clause to include only distinct values in the array:
SELECT array_agg(DISTINCT PetType) FROM Pets;
Result:
array_agg
-----------
{Cat,Dog}
The ALL
Clause
We can use ALL
to specify that all values are returned, even if there are duplicates:
SELECT array_agg(ALL PetType) FROM Pets;
Result:
array_agg
---------------------------
{Cat,Cat,Cat,Dog,Dog,Dog}
This is the default behaviour, so we can omit ALL
to achieve the same result:
SELECT array_agg(PetType) FROM Pets;
Result:
array_agg
---------------------------
{Cat,Cat,Cat,Dog,Dog,Dog}
The FILTER
Clause
We can use the FILTER
clause to filter the results. In this case, only the rows that evaluate to true are provided to array_agg()
.
Here’s a basic example:
SELECT array_agg(PetName) FILTER (WHERE PetName LIKE 'F%') FROM Pets;
Result:
array_agg
----------------
{Fluffy,Fetch}
Here’s an example that filters without the ORDER BY
clause:
\x
SELECT
array_agg(PetName) AS "Unfiltered",
array_agg(PetName) FILTER (WHERE i < 5) AS "Filtered"
FROM Pets AS p(i);
Result (using vertical output):
-[ RECORD 1 ]------------------------------------
Unfiltered | {Fluffy,Scratch,Purr,Bark,Wag,Fetch}
Filtered | {Fluffy,Scratch,Purr,Bark}
And here it is with the ORDER BY
clause:
SELECT
array_agg(PetName ORDER BY PetName ASC) AS "Unfiltered Ordered",
array_agg(PetName ORDER BY PetName ASC) FILTER (WHERE i < 5) AS "Filtered Ordered"
FROM Pets AS p(i);
Result (using vertical output):
Unfiltered Ordered | {Bark,Fetch,Fluffy,Purr,Scratch,Wag}
Filtered Ordered | {Bark,Fluffy,Purr,Scratch}
Creating Multi-Dimensional Arrays
We can’t nest aggregate functions in PostgreSQL. So trying to stuff like the following results in an error:
SELECT array_agg(array_agg(PetName)) FROM Pets;
Output:
ERROR: aggregate function calls cannot be nested
LINE 1: SELECT array_agg(array_agg(PetName)) FROM Pets;
^
The error message tells us that aggregate function calls cannot be nested.
But we can easily create multi-dimensional arrays with the ARRAY
constructor. For example, we could do the following:
SELECT ARRAY[
ARRAY(SELECT PetName FROM Pets WHERE PetType = 'Dog'),
ARRAY(SELECT PetName FROM Pets WHERE PetType = 'Cat')
];
Output:
array
------------------------------------------
{{Bark,Wag,Fetch},{Fluffy,Scratch,Purr}}