Aggregating Data in SQL with the GROUP BY and HAVING Clauses

When writing queries in SQL, we sometimes need to summarize data based on specific criteria. Fortunately SQL makes this easy to achieve with the GROUP BY and HAVING clauses. These two clauses allow us to aggregate data and filter the results based on group conditions.

More specifically, GROUP BY allows us to group the results, and HAVING is an optional clause that we can use to filter the results based on certain conditions.

Below is an example of where we can use these two clauses to aggregate data and filter it based on certain criteria.

Sample Data

First, let’s set up our sample music database:

-- Create artists table
CREATE TABLE artists (
    artist_id INT PRIMARY KEY,
    artist_name VARCHAR(100) NOT NULL
);

-- Create albums table
CREATE TABLE albums (
    album_id INT PRIMARY KEY,
    album_name VARCHAR(100) NOT NULL,
    artist_id INT,
    release_year INT,
    FOREIGN KEY (artist_id) REFERENCES artists(artist_id)
);

-- Create tracks table
CREATE TABLE tracks (
    track_id INT PRIMARY KEY,
    track_name VARCHAR(100) NOT NULL,
    album_id INT,
    duration_seconds INT,
    FOREIGN KEY (album_id) REFERENCES albums(album_id)
);

-- Insert sample data
INSERT INTO artists (artist_id, artist_name) VALUES
(1, 'The Melodic Mavericks'),
(2, 'Quantum Quavers'),
(3, 'Rhythmic Rebels');

INSERT INTO albums (album_id, album_name, artist_id, release_year) VALUES
(1, 'Harmonic Horizons', 1, 2020),
(2, 'Sonic Spectrum', 2, 2021),
(3, 'Beats Unbound', 3, 2022),
(4, 'Acoustic Alchemy', 1, 2023);

INSERT INTO tracks (track_id, track_name, album_id, duration_seconds) VALUES
(1, 'Celestial Serenade', 1, 871),
(2, 'Quantum Lullaby', 2, 980),
(3, 'Rhythmic Revolution', 3, 810),
(4, 'Melodic Mirage', 1, 995),
(5, 'Harmonic Haven', 4, 925),
(6, 'Sonic Sunrise', 2, 1055),
(7, 'Beats in Motion', 3, 900),
(8, 'Acoustic Dreams', 4, 930),
(9, 'Gentle Surf', 2, 970);

Now that we have our database set up, let’s explore how we can use the GROUP BY and HAVING clauses against its data.

GROUP BY Clause

We’ll start by using the GROUP BY clause without the HAVING clause. The GROUP BY clause allows us to group rows that have the same values in specified columns.

This is particularly useful when we want to perform aggregate functions on groups of data.

Example: Let’s find the total duration of tracks for each album:

SELECT 
    a.album_name, 
    SUM(t.duration_seconds) AS total_duration
FROM 
    albums a
JOIN 
    tracks t ON a.album_id = t.album_id
GROUP BY 
    a.album_name;

Result:

album_name         total_duration
----------------- --------------
Acoustic Alchemy 1855
Beats Unbound 1710
Harmonic Horizons 1866
Sonic Spectrum 3005

Here, we used the GROUP BY clause to group the results by the album names, and we used the SUM() function to add up the total duration of all tracks in each group (i.e. each album).

We had to do it this way because duration is applied at the track level – not at the album level. So we needed to tally up the duration of each track for each album.

HAVING Clause

While the WHERE clause filters rows before grouping, the HAVING clause filters groups after the GROUP BY clause has been applied. This allows us to filter based on the results of aggregate functions.

Example: Let’s find albums with a total duration exceeding a certain length:

SELECT 
    a.album_name, 
    SUM(t.duration_seconds) AS total_duration
FROM 
    albums a
JOIN 
    tracks t ON a.album_id = t.album_id
GROUP BY 
    a.album_name
HAVING 
    SUM(t.duration_seconds) > 1860;

Result:

album_name         total_duration
----------------- --------------
Harmonic Horizons 1866
Sonic Spectrum 3005

This query first groups the tracks by album, calculates the total duration, and then filters to show only the albums where the total duration is greater than 1860 seconds.

Combining GROUP BY and HAVING with other clauses

We can combine the GROUP BY and HAVING clauses with others like ORDER BY to further refine our results.

Example: Let’s find the number of tracks per artist, but only for artists with more than two tracks, sorted by the track count in descending order:

SELECT 
    a.artist_name, 
    COUNT(t.track_id) AS track_count
FROM 
    artists a
JOIN 
    albums al ON a.artist_id = al.artist_id
JOIN 
    tracks t ON al.album_id = t.album_id
GROUP BY 
    a.artist_name
HAVING 
    COUNT(t.track_id) > 2
ORDER BY 
    track_count DESC;

Result:

artist_name            track_count
--------------------- -----------
The Melodic Mavericks 4
Quantum Quavers 3

In this case, two rows match our criteria.

In conclusion, the above examples demonstrate that the GROUP BY and HAVING clauses allow us to aggregate data and filter based on group conditions, enabling us to write queries that return more meaningful results that are grouped in an easy to understand fashion.

More GROUP BY Queries

See my article called SQL GROUP BY Clause for Beginners that walks through more examples that use the GROUP BY clause.

And here are 5 Ways to Select Rows with the Maximum Value for their Group in SQL, which includes some interesting tricks to get data to group in different ways.