In SQL, the GROUP BY clause can be used to divide the results of a query into groups of rows.
This is usually done in order to perform one or more aggregations on each group.
Example 1
Here’s an example to demonstrate the GROUP BY clause.
Take the following table:
SELECT * FROM Products;
Result:
+-------------+------------+---------------------------------+----------------+-----------------------------------------+
| ProductId   | VendorId   | ProductName                     | ProductPrice   | ProductDescription                      |
|-------------+------------+---------------------------------+----------------+-----------------------------------------|
| 1           | 1001       | Left handed screwdriver         | 25.99          | Purple. Includes left handed carry box. |
| 2           | 1001       | Long Weight (blue)              | 14.75          | Includes a long wait.                   |
| 3           | 1001       | Long Weight (green)             | 11.99          | Approximate 30 minute waiting period.   |
| 4           | 1002       | Sledge Hammer                   | 33.49          | Wooden handle. Free wine glasses.       |
| 5           | 1003       | Chainsaw                        | 245.00         | Orange. Includes spare fingers.         |
| 6           | 1003       | Straw Dog Box                   | 55.99          | Tied with vines. Very chewable.         |
| 7           | 1004       | Bottomless Coffee Mugs (4 Pack) | 9.99           | Brown ceramic with solid handle.        |
+-------------+------------+---------------------------------+----------------+-----------------------------------------+
We could run the following query against that table.
SELECT
VendorId,
COUNT(VendorId) AS Count
FROM Products
GROUP BY VendorId;
Result:
+------------+---------+
| VendorId   | Count   |
|------------+---------|
| 1001       | 3       |
| 1002       | 1       |
| 1003       | 2       |
| 1004       | 1       |
+------------+---------+
Here, the GROUP BY clause groups the rows by VendorId, and the COUNT() aggregate function returns the number of rows in each group.
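A single GROUP BY query can return more than one aggregate per group. The following is a minimal sketch rather than part of the original example: it builds a scratch table named ProductsSample (a hypothetical name, with assumed column types and the description column left out), loads the seven products shown above, and returns the product count plus the lowest and highest price for each vendor.

```sql
-- Scratch copy of the Products data (table name and column types are assumed).
CREATE TABLE ProductsSample (
    ProductId    INTEGER PRIMARY KEY,
    VendorId     INTEGER      NOT NULL,
    ProductName  VARCHAR(100) NOT NULL,
    ProductPrice DECIMAL(8,2) NOT NULL
);

-- Multi-row VALUES works in MySQL, PostgreSQL, SQL Server, and SQLite;
-- other systems may need one INSERT statement per row.
INSERT INTO ProductsSample (ProductId, VendorId, ProductName, ProductPrice) VALUES
    (1, 1001, 'Left handed screwdriver',         25.99),
    (2, 1001, 'Long Weight (blue)',              14.75),
    (3, 1001, 'Long Weight (green)',             11.99),
    (4, 1002, 'Sledge Hammer',                   33.49),
    (5, 1003, 'Chainsaw',                        245.00),
    (6, 1003, 'Straw Dog Box',                   55.99),
    (7, 1004, 'Bottomless Coffee Mugs (4 Pack)', 9.99);

-- One output row per vendor, with three aggregates computed for each group.
SELECT
    VendorId,
    COUNT(*)          AS ProductCount,
    MIN(ProductPrice) AS LowestPrice,
    MAX(ProductPrice) AS HighestPrice
FROM ProductsSample
GROUP BY VendorId
ORDER BY VendorId;
```

With this data, vendor 1001 comes back with 3 products priced from 11.99 to 25.99, and vendor 1003 with 2 products priced from 55.99 to 245.00. How decimals are displayed varies between database systems.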
Example 2
In this example we use the SUM() aggregate function to return the aggregate population of all the cities within a district, then the GROUP BY clause to group the results.
Imagine we have a table called City that stores city names and their population, as well as their respective country codes and districts (in their own separate columns).
Like this:
SELECT * FROM city
WHERE CountryCode IN ('AGO', 'ARE', 'AUS');
Result:
+------+---------------+---------------+-----------------+--------------+
| ID   | Name          | CountryCode   | District        | Population   |
|------+---------------+---------------+-----------------+--------------|
| 56   | Luanda        | AGO           | Luanda          | 2022000      |
| 57   | Huambo        | AGO           | Huambo          | 163100       |
| 58   | Lobito        | AGO           | Benguela        | 130000       |
| 59   | Benguela      | AGO           | Benguela        | 128300       |
| 60   | Namibe        | AGO           | Namibe          | 118200       |
| 64   | Dubai         | ARE           | Dubai           | 669181       |
| 65   | Abu Dhabi     | ARE           | Abu Dhabi       | 398695       |
| 66   | Sharja        | ARE           | Sharja          | 320095       |
| 67   | al-Ayn        | ARE           | Abu Dhabi       | 225970       |
| 68   | Ajman         | ARE           | Ajman           | 114395       |
| 130  | Sydney        | AUS           | New South Wales | 3276207      |
| 131  | Melbourne     | AUS           | Victoria        | 2865329      |
| 132  | Brisbane      | AUS           | Queensland      | 1291117      |
| 133  | Perth         | AUS           | West Australia  | 1096829      |
| 134  | Adelaide      | AUS           | South Australia | 978100       |
| 135  | Canberra      | AUS           | Capital Region  | 322723       |
| 136  | Gold Coast    | AUS           | Queensland      | 311932       |
| 137  | Newcastle     | AUS           | New South Wales | 270324       |
| 138  | Central Coast | AUS           | New South Wales | 227657       |
| 139  | Wollongong    | AUS           | New South Wales | 219761       |
| 140  | Hobart        | AUS           | Tasmania        | 126118       |
| 141  | Geelong       | AUS           | Victoria        | 125382       |
| 142  | Townsville    | AUS           | Queensland      | 109914       |
| 143  | Cairns        | AUS           | Queensland      | 92273        |
+------+---------------+---------------+-----------------+--------------+
I reduced the results to just three countries; otherwise, the list would be far too long for this article.
Now, suppose we wanted to get the population of each district, and we wanted to list each district, along with its population and country code.
We could do this.
SELECT
CountryCode,
District,
SUM(Population) AS Population
FROM City
WHERE CountryCode IN ('AGO', 'ARE', 'AUS')
GROUP BY CountryCode, District
ORDER BY CountryCode;
Result:
+---------------+-----------------+--------------+
| CountryCode   | District        | Population   |
|---------------+-----------------+--------------|
| AGO           | Benguela        | 258300       |
| AGO           | Huambo          | 163100       |
| AGO           | Luanda          | 2022000      |
| AGO           | Namibe          | 118200       |
| ARE           | Abu Dhabi       | 624665       |
| ARE           | Ajman           | 114395       |
| ARE           | Dubai           | 669181       |
| ARE           | Sharja          | 320095       |
| AUS           | Capital Region  | 322723       |
| AUS           | New South Wales | 3993949      |
| AUS           | Queensland      | 1805236      |
| AUS           | South Australia | 978100       |
| AUS           | Tasmania        | 126118       |
| AUS           | Victoria        | 2990711      |
| AUS           | West Australia  | 1096829      |
+---------------+-----------------+--------------+
We can see that our results are grouped as specified, and we now get the full population for each district (as opposed to the population of the individual cities, which is how they’re stored in the underlying table).
Note that the GROUP BY clause must come after any WHERE clause and before any ORDER BY clause.
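To make the clause order concrete, here is a rough sketch using a scratch table called CityOrderDemo (a hypothetical name with assumed column types) that holds the ten Angolan and Emirati cities from the listing above. The numbered comments mark the order in which the clauses must be written.

```sql
-- Scratch copy of ten rows from the city listing (name and types are assumed).
CREATE TABLE CityOrderDemo (
    ID          INTEGER PRIMARY KEY,
    Name        VARCHAR(50) NOT NULL,
    CountryCode CHAR(3)     NOT NULL,
    District    VARCHAR(50) NOT NULL,
    Population  INTEGER     NOT NULL
);

INSERT INTO CityOrderDemo (ID, Name, CountryCode, District, Population) VALUES
    (56, 'Luanda',    'AGO', 'Luanda',    2022000),
    (57, 'Huambo',    'AGO', 'Huambo',    163100),
    (58, 'Lobito',    'AGO', 'Benguela',  130000),
    (59, 'Benguela',  'AGO', 'Benguela',  128300),
    (60, 'Namibe',    'AGO', 'Namibe',    118200),
    (64, 'Dubai',     'ARE', 'Dubai',     669181),
    (65, 'Abu Dhabi', 'ARE', 'Abu Dhabi', 398695),
    (66, 'Sharja',    'ARE', 'Sharja',    320095),
    (67, 'al-Ayn',    'ARE', 'Abu Dhabi', 225970),
    (68, 'Ajman',     'ARE', 'Ajman',     114395);

SELECT
    CountryCode,
    SUM(Population) AS TotalPopulation
FROM CityOrderDemo
WHERE Population >= 150000        -- 1. WHERE: filter individual rows
GROUP BY CountryCode              -- 2. GROUP BY: group the remaining rows
ORDER BY TotalPopulation DESC;    -- 3. ORDER BY: sort the grouped result
```

Only cities with at least 150,000 people are summed, so this should return AGO with 2185100 followed by ARE with 1613941. Moving GROUP BY ahead of WHERE, or ORDER BY ahead of GROUP BY, results in a syntax error.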
If we want the population of each country rather than each district, we can group by CountryCode alone, which makes the query even more compact.
SELECT
CountryCode,
SUM(Population) AS Population
FROM City
WHERE CountryCode IN ('AGO', 'ARE', 'AUS')
GROUP BY CountryCode
ORDER BY CountryCode;
Result:
+---------------+--------------+
| CountryCode   | Population   |
|---------------+--------------|
| AGO           | 2561600      |
| ARE           | 1728336      |
| AUS           | 11313666     |
+---------------+--------------+
Bear in mind that this particular sample database is very out of date, and its population numbers don’t reflect current reality.
Example 3 – The HAVING Clause
You can include the HAVING clause with your GROUP BY clause to filter the groups.
Example:
SELECT
CountryCode,
District,
SUM(Population) AS Population
FROM City
WHERE CountryCode IN ('AGO', 'ARE', 'AUS')
GROUP BY CountryCode, District
HAVING SUM(Population) > 1000000
ORDER BY CountryCode;
Result:
+---------------+-----------------+--------------+
| CountryCode   | District        | Population   |
|---------------+-----------------+--------------|
| AGO           | Luanda          | 2022000      |
| AUS           | New South Wales | 3993949      |
| AUS           | Queensland      | 1805236      |
| AUS           | Victoria        | 2990711      |
| AUS           | West Australia  | 1096829      |
+---------------+-----------------+--------------+
The HAVING clause is similar to the WHERE clause, except that WHERE filters individual rows, whereas HAVING filters groups.
Also, the WHERE clause filters data before it is grouped, whereas HAVING filters data after it is grouped.
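The difference in timing is easiest to see by running the same HAVING condition with and without a WHERE clause. This is an illustrative sketch only: ProductsFilterDemo is a hypothetical scratch table (assumed column types) holding the vendor IDs and prices from the Products table in Example 1.

```sql
-- Scratch copy of the vendor and price columns (name and types are assumed).
CREATE TABLE ProductsFilterDemo (
    ProductId    INTEGER PRIMARY KEY,
    VendorId     INTEGER      NOT NULL,
    ProductPrice DECIMAL(8,2) NOT NULL
);

INSERT INTO ProductsFilterDemo (ProductId, VendorId, ProductPrice) VALUES
    (1, 1001, 25.99),
    (2, 1001, 14.75),
    (3, 1001, 11.99),
    (4, 1002, 33.49),
    (5, 1003, 245.00),
    (6, 1003, 55.99),
    (7, 1004, 9.99);

-- HAVING only: all seven rows are grouped, then groups with fewer
-- than two rows are discarded. Returns vendors 1001 (3) and 1003 (2).
SELECT VendorId, COUNT(*) AS ProductCount
FROM ProductsFilterDemo
GROUP BY VendorId
HAVING COUNT(*) >= 2
ORDER BY VendorId;

-- WHERE plus HAVING: rows priced at 20 or less are removed first, so
-- vendor 1001 has only one row left when the groups are formed and no
-- longer passes the HAVING test. Returns vendor 1003 (2) only.
SELECT VendorId, COUNT(*) AS ProductCount
FROM ProductsFilterDemo
WHERE ProductPrice > 20
GROUP BY VendorId
HAVING COUNT(*) >= 2
ORDER BY VendorId;
```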
The HAVING clause accepts the same operators that you can use with the WHERE clause (such as =, >, >=, IN, LIKE, etc.).
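As a quick sketch of two of those operators in a HAVING clause, the queries below use = against an aggregate and LIKE against a grouped column. CityHavingDemo is a hypothetical scratch table (assumed column types) containing the ten Angolan and Emirati cities listed in Example 2.

```sql
-- Scratch copy of ten rows from the city listing (name and types are assumed).
CREATE TABLE CityHavingDemo (
    Name        VARCHAR(50) NOT NULL,
    CountryCode CHAR(3)     NOT NULL,
    District    VARCHAR(50) NOT NULL,
    Population  INTEGER     NOT NULL
);

INSERT INTO CityHavingDemo (Name, CountryCode, District, Population) VALUES
    ('Luanda',    'AGO', 'Luanda',    2022000),
    ('Huambo',    'AGO', 'Huambo',    163100),
    ('Lobito',    'AGO', 'Benguela',  130000),
    ('Benguela',  'AGO', 'Benguela',  128300),
    ('Namibe',    'AGO', 'Namibe',    118200),
    ('Dubai',     'ARE', 'Dubai',     669181),
    ('Abu Dhabi', 'ARE', 'Abu Dhabi', 398695),
    ('Sharja',    'ARE', 'Sharja',    320095),
    ('al-Ayn',    'ARE', 'Abu Dhabi', 225970),
    ('Ajman',     'ARE', 'Ajman',     114395);

-- "=" on an aggregate: districts that contain exactly two cities.
-- Returns AGO/Benguela (258300) and ARE/Abu Dhabi (624665).
SELECT CountryCode, District, SUM(Population) AS Population
FROM CityHavingDemo
GROUP BY CountryCode, District
HAVING COUNT(*) = 2
ORDER BY CountryCode, District;

-- LIKE on a grouped column: districts whose name starts with "A".
-- Returns ARE/Abu Dhabi (624665) and ARE/Ajman (114395).
SELECT CountryCode, District, SUM(Population) AS Population
FROM CityHavingDemo
GROUP BY CountryCode, District
HAVING District LIKE 'A%'
ORDER BY CountryCode, District;
```

A condition on a grouped column, like the LIKE test here, could equally be written in the WHERE clause. A condition on an aggregate such as COUNT(*) has to go in HAVING.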