Understanding the SQL Subquery

One of the things I love about SQL is just how easy it is to write a simple query that returns a meaningful result set, without having to a whole bunch of complex code. And I imagine most beginners are relieved when they discover this too.

However, while basic queries can go a long way, there’s a whole world of advanced techniques that can take our SQL development to another level. Somewhere in that world, would be the SQL subquery.

In this article, we look at the SQL subquery. We’ll explore what subqueries are, how they work, and when to use them. We’ll also look at some simple examples to demonstrate their use.

What is a SQL Subquery?

Simply put, a subquery is a query nested inside another query. Or even more simply, a query within a query. Subqueries can be used in various parts of SQL statements. They can be used in the SELECT list, the FROM clause, the WHERE clause, and even the HAVING clause.

Subqueries are pretty versatile, and they can help us perform complex operations that might be difficult or even impossible to do with a single query.

Setting Up Our Sample Database

Before we dive into SQL subquery examples, let’s set up a sample database to work with. We’ll create three tables: Books, Authors, and Genres.

Here’s the SQL script to create and populate these tables:

-- Create the tables
CREATE TABLE Authors (
    AuthorID INT PRIMARY KEY,
    AuthorName VARCHAR(100),
    BirthYear INT
);

CREATE TABLE Genres (
    GenreID INT PRIMARY KEY,
    GenreName VARCHAR(50)
);

CREATE TABLE Books (
    BookID INT PRIMARY KEY,
    Title VARCHAR(200),
    AuthorID INT,
    GenreID INT,
    PublicationYear INT,
    Price DECIMAL(10, 2),
    FOREIGN KEY (AuthorID) REFERENCES Authors(AuthorID),
    FOREIGN KEY (GenreID) REFERENCES Genres(GenreID)
);

-- Populate the tables with sample data
INSERT INTO Authors (AuthorID, AuthorName, BirthYear) VALUES
(1, 'Jane Austen', 1775),
(2, 'George Orwell', 1903),
(3, 'J.K. Rowling', 1965),
(4, 'Agatha Christie', 1890),
(5, 'Terry Pratchett', 1948),
(6, 'Virginia Woolf', 1882),
(7, 'Franz Kafka', 1883);

INSERT INTO Genres (GenreID, GenreName) VALUES
(1, 'Classic Literature'),
(2, 'Dystopian Fiction'),
(3, 'Fantasy'),
(4, 'Mystery'),
(5, 'Science Fiction');

INSERT INTO Books (BookID, Title, AuthorID, GenreID, PublicationYear, Price) VALUES
(1, 'Pride and Prejudice', 1, 1, 1813, 12.99),
(2, '1984', 2, 2, 1949, 14.99),
(3, 'Harry Potter and the Philosopher''s Stone', 3, 3, 1997, 19.99),
(4, 'Murder on the Orient Express', 4, 4, 1934, 11.99),
(5, 'The Colour of Magic', 5, 3, 1983, 9.99),
(6, 'Emma', 1, 1, 1815, 13.99),
(7, 'Animal Farm', 2, 2, 1945, 10.99),
(8, 'Harry Potter and the Chamber of Secrets', 3, 3, 1998, 21.99),
(9, 'Death on the Nile', 4, 4, 1937, 12.99),
(10, 'Good Omens', 5, 5, 1990, 15.99);

Now that we have our database set up, let’s explore some SQL subquery examples.

Subquery in the WHERE Clause

One of the most common uses of subqueries is in the WHERE clause. This allows us to filter results based on the outcome of another query.

For example, let’s find all books that are more expensive than the average book price:

SELECT Title, Price
FROM Books
WHERE Price > (SELECT AVG(Price) FROM Books);

Result:

                  title                   | price 
------------------------------------------+-------
1984 | 14.99
Harry Potter and the Philosopher's Stone | 19.99
Harry Potter and the Chamber of Secrets | 21.99
Good Omens | 15.99

In this query, the subquery (SELECT AVG(Price) FROM Books) calculates the average price of all books, and the main query then selects only the books with a price higher than this average.

Subquery in the SELECT List

Subqueries can also be used in the SELECT list to create calculated columns. For instance, let’s display each book’s title along with how much its price differs from the average:

SELECT Title,
       Price,
       Price - (SELECT AVG(Price) FROM Books) AS PriceDifference
FROM Books;

Result:

                  title                   | price |   pricedifference   
------------------------------------------+-------+---------------------
Pride and Prejudice | 12.99 | -1.6000000000000000
1984 | 14.99 | 0.4000000000000000
Harry Potter and the Philosopher's Stone | 19.99 | 5.4000000000000000
Murder on the Orient Express | 11.99 | -2.6000000000000000
The Colour of Magic | 9.99 | -4.6000000000000000
Emma | 13.99 | -0.6000000000000000
Animal Farm | 10.99 | -3.6000000000000000
Harry Potter and the Chamber of Secrets | 21.99 | 7.4000000000000000
Death on the Nile | 12.99 | -1.6000000000000000
Good Omens | 15.99 | 1.4000000000000000

Subquery in the FROM Clause

When used in the FROM clause, subqueries create a temporary table that can be queried like any other table. These queries are sometimes called derived tables or table expressions because the outer query uses the results of the subquery as a data source.

Let’s find the authors who have written books in multiple genres:

SELECT AuthorName, GenreCount
FROM (
    SELECT Authors.AuthorID, Authors.AuthorName, COUNT(DISTINCT Books.GenreID) AS GenreCount
    FROM Authors
    JOIN Books ON Authors.AuthorID = Books.AuthorID
    GROUP BY Authors.AuthorID, Authors.AuthorName
) AS AuthorGenres
WHERE GenreCount > 1;

Result:

   authorname    | genrecount 
-----------------+------------
Terry Pratchett | 2

Here, the inner query creates a temporary table with each author’s name and the count of distinct genres they’ve written in, then the outer query takes that data and filters it for authors with more than one genre.

Correlated Subqueries

A correlated subquery is a subquery that references a table in the outer query. In other words, it depends on the outer query for at least one of its values.

Here’s an example that finds books that are the most expensive in their genre:

SELECT b1.Title, b1.Price, g.GenreName
FROM Books b1
JOIN Genres g ON b1.GenreID = g.GenreID
WHERE b1.Price = (
    SELECT MAX(b2.Price)
    FROM Books b2
    WHERE b2.GenreID = b1.GenreID
);

Result:

                  title                  | price |     genrename      
-----------------------------------------+-------+--------------------
1984 | 14.99 | Dystopian Fiction
Emma | 13.99 | Classic Literature
Harry Potter and the Chamber of Secrets | 21.99 | Fantasy
Death on the Nile | 12.99 | Mystery
Good Omens | 15.99 | Science Fiction

This query compares each book’s price with the maximum price in its genre, selecting only those that match.

Notice that the subquery contains a reference to b1, even though the subquery doesn’t include that table in its FROM clause. But the outer query does include it, and so it goes ahead and uses that table as its source.

EXISTS and NOT EXISTS

The EXISTS and NOT EXISTS operators can be used with subqueries to check for the presence or absence of matching records. Let’s find authors who haven’t written any books in our database:

SELECT AuthorName
FROM Authors a
WHERE NOT EXISTS (
    SELECT 1
    FROM Books b
    WHERE b.AuthorID = a.AuthorID
);

Result:

   authorname   
----------------
Virginia Woolf
Franz Kafka

This query returns authors who don’t have any books in the Books table.

When to Use Subqueries

Subqueries are incredibly useful, but they’re not always the best solution. Here are some scenarios where subqueries shine:

  • When you need to perform calculations on an entire dataset before filtering or selecting results.
  • For complex conditions that may be difficult to express (or visualise) with joins.
  • When you need to compare values against aggregated results.
  • For creating more readable queries by breaking complex logic into smaller, more manageable pieces.

However, be cautious with subqueries, especially correlated subqueries, as they can impact performance on large datasets. In some cases, joins or other SQL constructs might be more efficient. That said, it may also depend on your RDBMS and how its optimiser works. Some RDBMSs may even implement certain subqueries as joins anyway.

SQL Joins

As alluded to, many subqueries can easily be implemented as a join (and vice-versa). We will often have the choice of using a SQL subquery or a SQL join, and our decision may depend on performance, or perhaps simply our own preference of one over another.

Either way, it pays to be knowledgeable about both subqueries and joins, so that we can make an informed decision. To learn more about joins, see my SQL Joins Tutorial, which explains how joins work, the different types of joins, as well as simple examples to demonstrate these.