SQL GROUP BY Clause for Beginners

In SQL, the GROUP BY clause can be used to divide the results of a query into groups of rows.

This is usually done in order to perform one or more aggregations on each group.

Example 1

Here’s an example to demonstrate the GROUP BY clause.

Take the following table:

SELECT * FROM Products;

Result:

+-------------+------------+---------------------------------+----------------+-----------------------------------------+
| ProductId   | VendorId   | ProductName                     | ProductPrice   | ProductDescription                      |
|-------------+------------+---------------------------------+----------------+-----------------------------------------|
| 1           | 1001       | Left handed screwdriver         | 25.99          | Purple. Includes left handed carry box. |
| 2           | 1001       | Long Weight (blue)              | 14.75          | Includes a long wait.                   |
| 3           | 1001       | Long Weight (green)             | 11.99          | Approximate 30 minute waiting period.   |
| 4           | 1002       | Sledge Hammer                   | 33.49          | Wooden handle. Free wine glasses.       |
| 5           | 1003       | Chainsaw                        | 245.00         | Orange. Includes spare fingers.         |
| 6           | 1003       | Straw Dog Box                   | 55.99          | Tied with vines. Very chewable.         |
| 7           | 1004       | Bottomless Coffee Mugs (4 Pack) | 9.99           | Brown ceramic with solid handle.        |
+-------------+------------+---------------------------------+----------------+-----------------------------------------+

We could run the following query against that table.

SELECT 
    VendorId,
    COUNT(VendorId) AS Count
FROM Products
GROUP BY VendorId;

Result:

+------------+---------+
| VendorId   | Count   |
|------------+---------|
| 1001       | 3       |
| 1002       | 1       |
| 1003       | 2       |
| 1004       | 1       |
+------------+---------+

Here, we use the COUNT() aggregate function to return the number of rows for each VendorId, then the GROUP BY clause to group the results.

Example 2

In this example we use the SUM() aggregate function to return the aggregate population of all the cities within a district, then the GROUP BY clause to group the results.

Imagine we have a table called City that stores city names and their population, as well as their respective country codes and districts (in their own separate columns).

Like this:

SELECT * FROM city
WHERE CountryCode IN ('AGO', 'ARE', 'AUS');

Result:

+------+---------------+---------------+-----------------+--------------+
| ID   | Name          | CountryCode   | District        | Population   |
|------+---------------+---------------+-----------------+--------------|
| 56   | Luanda        | AGO           | Luanda          | 2022000      |
| 57   | Huambo        | AGO           | Huambo          | 163100       |
| 58   | Lobito        | AGO           | Benguela        | 130000       |
| 59   | Benguela      | AGO           | Benguela        | 128300       |
| 60   | Namibe        | AGO           | Namibe          | 118200       |
| 64   | Dubai         | ARE           | Dubai           | 669181       |
| 65   | Abu Dhabi     | ARE           | Abu Dhabi       | 398695       |
| 66   | Sharja        | ARE           | Sharja          | 320095       |
| 67   | al-Ayn        | ARE           | Abu Dhabi       | 225970       |
| 68   | Ajman         | ARE           | Ajman           | 114395       |
| 130  | Sydney        | AUS           | New South Wales | 3276207      |
| 131  | Melbourne     | AUS           | Victoria        | 2865329      |
| 132  | Brisbane      | AUS           | Queensland      | 1291117      |
| 133  | Perth         | AUS           | West Australia  | 1096829      |
| 134  | Adelaide      | AUS           | South Australia | 978100       |
| 135  | Canberra      | AUS           | Capital Region  | 322723       |
| 136  | Gold Coast    | AUS           | Queensland      | 311932       |
| 137  | Newcastle     | AUS           | New South Wales | 270324       |
| 138  | Central Coast | AUS           | New South Wales | 227657       |
| 139  | Wollongong    | AUS           | New South Wales | 219761       |
| 140  | Hobart        | AUS           | Tasmania        | 126118       |
| 141  | Geelong       | AUS           | Victoria        | 125382       |
| 142  | Townsville    | AUS           | Queensland      | 109914       |
| 143  | Cairns        | AUS           | Queensland      | 92273        |
+------+---------------+---------------+-----------------+--------------+

I reduced the results to just three countries, otherwise the list would be way too long for this article.

Now, suppose we wanted to get the population of each district, and we wanted to list each district, along with its population and country code.

We could do this.

SELECT
    CountryCode,
    District,
    SUM(Population) AS Population
FROM City
WHERE CountryCode IN ('AGO', 'ARE', 'AUS')
GROUP BY CountryCode, District
ORDER BY CountryCode;

Result:

+---------------+-----------------+--------------+
| CountryCode   | District        | Population   |
|---------------+-----------------+--------------|
| AGO           | Benguela        | 258300       |
| AGO           | Huambo          | 163100       |
| AGO           | Luanda          | 2022000      |
| AGO           | Namibe          | 118200       |
| ARE           | Abu Dhabi       | 624665       |
| ARE           | Ajman           | 114395       |
| ARE           | Dubai           | 669181       |
| ARE           | Sharja          | 320095       |
| AUS           | Capital Region  | 322723       |
| AUS           | New South Wales | 3993949      |
| AUS           | Queensland      | 1805236      |
| AUS           | South Australia | 978100       |
| AUS           | Tasmania        | 126118       |
| AUS           | Victoria        | 2990711      |
| AUS           | West Australia  | 1096829      |
+---------------+-----------------+--------------+

We can see that our results are grouped as specified, and we now get the full population for each district (as opposed to the population of the individual cities, which is how they’re stored in the underlying table).

Note that the GROUP BY clause must come after any WHERE clause and before any ORDER BY clause.

If we wanted to get the population of each country instead of the district, our query becomes even more compact.

SELECT
    CountryCode,
    SUM(Population) AS Population
FROM City
WHERE CountryCode IN ('AGO', 'ARE', 'AUS')
GROUP BY CountryCode
ORDER BY CountryCode;

Result:

+---------------+--------------+
| CountryCode   | Population   |
|---------------+--------------|
| AGO           | 2561600      |
| ARE           | 1728336      |
| AUS           | 11313666     |
+---------------+--------------+

Bear in mind that this particular sample database is very out of date, and its population numbers don’t reflect current reality.

Example 3 – The HAVING Clause

You can include the HAVING clause with your GROUP BY clause to filter the groups.

Example:

SELECT
    CountryCode,
    District,
    SUM(Population) AS Population
FROM City
WHERE CountryCode IN ('AGO', 'ARE', 'AUS')
GROUP BY CountryCode, District
HAVING SUM(Population) > 1000000
ORDER BY CountryCode;

Result:

+---------------+-----------------+--------------+
| CountryCode   | District        | Population   |
|---------------+-----------------+--------------|
| AGO           | Luanda          | 2022000      |
| AUS           | New South Wales | 3993949      |
| AUS           | Queensland      | 1805236      |
| AUS           | Victoria        | 2990711      |
| AUS           | West Australia  | 1096829      |
+---------------+-----------------+--------------+

The HAVING clause is similar to the WHERE clause, except that WHERE filters individual rows, whereas HAVING filters groups.

Also, the WHERE clause filters data before it is grouped, whereas HAVING filters data after it is grouped.

The HAVING clause accepts the same operators that you can use with the WHERE clause (such as =, >, >=, IN, LIKE, etc).