When writing queries in SQL, we sometimes need to summarize data based on specific criteria. Fortunately SQL makes this easy to achieve with the GROUP BY
and HAVING
clauses. These two clauses allow us to aggregate data and filter the results based on group conditions.
More specifically, GROUP BY
allows us to group the results, and HAVING
is an optional clause that we can use to filter the results based on certain conditions.
Below is an example of where we can use these two clauses to aggregate data and filter it based on certain criteria.
Sample Data
First, let’s set up our sample music database:
-- Create artists table
CREATE TABLE artists (
artist_id INT PRIMARY KEY,
artist_name VARCHAR(100) NOT NULL
);
-- Create albums table
CREATE TABLE albums (
album_id INT PRIMARY KEY,
album_name VARCHAR(100) NOT NULL,
artist_id INT,
release_year INT,
FOREIGN KEY (artist_id) REFERENCES artists(artist_id)
);
-- Create tracks table
CREATE TABLE tracks (
track_id INT PRIMARY KEY,
track_name VARCHAR(100) NOT NULL,
album_id INT,
duration_seconds INT,
FOREIGN KEY (album_id) REFERENCES albums(album_id)
);
-- Insert sample data
INSERT INTO artists (artist_id, artist_name) VALUES
(1, 'The Melodic Mavericks'),
(2, 'Quantum Quavers'),
(3, 'Rhythmic Rebels');
INSERT INTO albums (album_id, album_name, artist_id, release_year) VALUES
(1, 'Harmonic Horizons', 1, 2020),
(2, 'Sonic Spectrum', 2, 2021),
(3, 'Beats Unbound', 3, 2022),
(4, 'Acoustic Alchemy', 1, 2023);
INSERT INTO tracks (track_id, track_name, album_id, duration_seconds) VALUES
(1, 'Celestial Serenade', 1, 871),
(2, 'Quantum Lullaby', 2, 980),
(3, 'Rhythmic Revolution', 3, 810),
(4, 'Melodic Mirage', 1, 995),
(5, 'Harmonic Haven', 4, 925),
(6, 'Sonic Sunrise', 2, 1055),
(7, 'Beats in Motion', 3, 900),
(8, 'Acoustic Dreams', 4, 930),
(9, 'Gentle Surf', 2, 970);
Now that we have our database set up, let’s explore how we can use the GROUP BY
and HAVING
clauses against its data.
GROUP BY
Clause
We’ll start by using the GROUP BY
clause without the HAVING
clause. The GROUP BY
clause allows us to group rows that have the same values in specified columns.
This is particularly useful when we want to perform aggregate functions on groups of data.
Example: Let’s find the total duration of tracks for each album:
SELECT
a.album_name,
SUM(t.duration_seconds) AS total_duration
FROM
albums a
JOIN
tracks t ON a.album_id = t.album_id
GROUP BY
a.album_name;
Result:
album_name total_duration
----------------- --------------
Acoustic Alchemy 1855
Beats Unbound 1710
Harmonic Horizons 1866
Sonic Spectrum 3005
Here, we used the GROUP BY
clause to group the results by the album names, and we used the SUM()
function to add up the total duration of all tracks in each group (i.e. each album).
We had to do it this way because duration is applied at the track level – not at the album level. So we needed to tally up the duration of each track for each album.
HAVING
Clause
While the WHERE
clause filters rows before grouping, the HAVING
clause filters groups after the GROUP BY
clause has been applied. This allows us to filter based on the results of aggregate functions.
Example: Let’s find albums with a total duration exceeding a certain length:
SELECT
a.album_name,
SUM(t.duration_seconds) AS total_duration
FROM
albums a
JOIN
tracks t ON a.album_id = t.album_id
GROUP BY
a.album_name
HAVING
SUM(t.duration_seconds) > 1860;
Result:
album_name total_duration
----------------- --------------
Harmonic Horizons 1866
Sonic Spectrum 3005
This query first groups the tracks by album, calculates the total duration, and then filters to show only the albums where the total duration is greater than 1860 seconds.
Combining GROUP BY
and HAVING
with other clauses
We can combine the GROUP BY
and HAVING
clauses with others like ORDER BY
to further refine our results.
Example: Let’s find the number of tracks per artist, but only for artists with more than two tracks, sorted by the track count in descending order:
SELECT
a.artist_name,
COUNT(t.track_id) AS track_count
FROM
artists a
JOIN
albums al ON a.artist_id = al.artist_id
JOIN
tracks t ON al.album_id = t.album_id
GROUP BY
a.artist_name
HAVING
COUNT(t.track_id) > 2
ORDER BY
track_count DESC;
Result:
artist_name track_count
--------------------- -----------
The Melodic Mavericks 4
Quantum Quavers 3
In this case, two rows match our criteria.
In conclusion, the above examples demonstrate that the GROUP BY
and HAVING
clauses allow us to aggregate data and filter based on group conditions, enabling us to write queries that return more meaningful results that are grouped in an easy to understand fashion.
More GROUP BY
Queries
See my article called SQL GROUP BY
Clause for Beginners that walks through more examples that use the GROUP BY
clause.
And here are 5 Ways to Select Rows with the Maximum Value for their Group in SQL, which includes some interesting tricks to get data to group in different ways.