Understanding the Correlated Subquery in SQL

When writing SQL queries, we sometimes find ourselves including a subquery in the query in order to get the results we want. Sometimes the subquery we include could be taken out of that query and run by itself to return its own independent result set. But not the correlated subquery.

A correlated subquery relies on the outer query for its data. We can’t simply move it outside of the outer query and expect it to work.

Let’s take a look the correlated subquery in SQL and observe it in action with a few examples.

What is a Correlated Subquery?

A correlated subquery is a subquery that depends on the outer query for its values. As with any other subquery, it’s a query nested inside another query. But the difference is that a correlated subquery references the outer query (whereas a non-correlated subquery does not).

In contrast to the correlated subquery, the non-correlated subquery is a subquery that can be executed independently of the outer query. In a non-correlated subquery, the subquery does not depend on the outer query for its results.

A correlated subquery is executed once for each row processed by the outer query.

The Anatomy of a Correlated Subquery

A correlated subquery typically follows this pattern:

SELECT column1, column2, ...
FROM table1 outer
WHERE column1 operator
    (SELECT column1
     FROM table2 inner
     WHERE inner.column = outer.column);

Notice the outer part in the subquery. This is a reference to the table in the outer query. The subquery compares a column from the inner query with a column from the outer query.

The main point here is that the inner query references a column from the outer query. This is what makes it “correlated” – it’s dependent on the outer query for its execution.

Setting the Stage: Our Music Database

Before we jump into some correlated subquery examples, let’s set up a sample database to play with. We’ll create tables for artists, albums, and tracks.

Here’s the SQL to get us started:

-- Create artists table
CREATE TABLE artists (
    artist_id INT PRIMARY KEY,
    artist_name VARCHAR(100) NOT NULL,
    country VARCHAR(50)
);

-- Create albums table
CREATE TABLE albums (
    album_id INT PRIMARY KEY,
    album_name VARCHAR(100) NOT NULL,
    artist_id INT,
    release_year INT,
    FOREIGN KEY (artist_id) REFERENCES artists(artist_id)
);

-- Create tracks table
CREATE TABLE tracks (
    track_id INT PRIMARY KEY,
    track_name VARCHAR(100) NOT NULL,
    album_id INT,
    duration_seconds INT,
    FOREIGN KEY (album_id) REFERENCES albums(album_id)
);

-- Populate artists table
INSERT INTO artists (artist_id, artist_name, country) VALUES
(1, 'The Beatles', 'UK'),
(2, 'Pink Floyd', 'UK'),
(3, 'Led Zeppelin', 'UK'),
(4, 'Queen', 'UK'),
(5, 'David Bowie', 'UK');

-- Populate albums table
INSERT INTO albums (album_id, album_name, artist_id, release_year) VALUES
(1, 'Abbey Road', 1, 1969),
(2, 'The Dark Side of the Moon', 2, 1973),
(3, 'Led Zeppelin IV', 3, 1971),
(4, 'A Night at the Opera', 4, 1975),
(5, 'The Rise and Fall of Ziggy Stardust', 5, 1972),
(6, 'Let It Be', 1, 1970),
(7, 'The Wall', 2, 1979);

-- Populate tracks table
INSERT INTO tracks (track_id, track_name, album_id, duration_seconds) VALUES
(1, 'Come Together', 1, 259),
(2, 'Something', 1, 183),
(3, 'Breathe', 2, 163),
(4, 'Time', 2, 421),
(5, 'Black Dog', 3, 296),
(6, 'Rock and Roll', 3, 220),
(7, 'Bohemian Rhapsody', 4, 354),
(8, 'Love of My Life', 4, 219),
(9, 'Starman', 5, 253),
(10, 'Ziggy Stardust', 5, 194),
(11, 'Let It Be', 6, 243),
(12, 'Get Back', 6, 187),
(13, 'Another Brick in the Wall', 7, 239),
(14, 'Comfortably Numb', 7, 383);

OK, now that we’ve got our tables set up and populated, let’s run some correlated subqueries.

Finding Albums with Longer-than-Average Tracks

Suppose we want to find albums where the average track duration is longer than the overall average track duration across all albums. Here’s how we’d do that with a correlated subquery:

SELECT a.album_name, ar.artist_name
FROM albums a
JOIN artists ar ON a.artist_id = ar.artist_id
WHERE (
    SELECT AVG(duration_seconds)
    FROM tracks t
    WHERE t.album_id = a.album_id
) > (
    SELECT AVG(duration_seconds)
    FROM tracks
);

Result:

album_name                 artist_name
------------------------- -----------
The Dark Side of the Moon Pink Floyd
A Night at the Opera Queen
The Wall Pink Floyd

In this query, the correlated subquery calculates the average track duration for each album and compares it to the overall average. The outer query then returns only those albums where this condition is true.

Finding Artists with More than One Album

Let’s find artists who have released more than one album:

SELECT artist_name
FROM artists a
WHERE 1 < (
    SELECT COUNT(*)
    FROM albums al
    WHERE al.artist_id = a.artist_id
);

Result:

artist_name
-----------
The Beatles
Pink Floyd

Here, the correlated subquery counts the number of albums for each artist. The outer query then filters for artists where this count is greater than 1.

Identifying the Longest Track on Each Album

Now, let’s find the longest track on each album:

SELECT a.album_name, t.track_name, t.duration_seconds
FROM albums a
JOIN tracks t ON a.album_id = t.album_id
WHERE t.duration_seconds = (
    SELECT MAX(duration_seconds)
    FROM tracks t2
    WHERE t2.album_id = a.album_id
);

Result:

album_name                           track_name         duration_seconds
----------------------------------- ----------------- ----------------
The Wall Comfortably Numb 383
Let It Be Let It Be 243
The Rise and Fall of Ziggy Stardust Starman 253
A Night at the Opera Bohemian Rhapsody 354
Led Zeppelin IV Black Dog 296
The Dark Side of the Moon Time 421
Abbey Road Come Together 259

In this example, the correlated subquery finds the maximum duration for each album, and the outer query matches this with the actual track details.

More Subqueries

See Understanding the SQL Subquery for more examples of SQL subqueries.