Understanding Common Table Expressions (CTEs): A Beginner’s Guide

Common table expressions (CTEs) are a feature that we can use in SQL to help simplify complex queries and enhance readability. They can help with things like code readability, performance, recursive queries, and more.

This article explores what CTEs are, their syntax, types, use cases, and best practices, along with examples to help illustrate their usage.

What is a Common Table Expression (CTE)?

A common table expression (CTE) is a temporary result set defined within a SQL query that can be referenced by the main query or even recursively within itself. CTEs provide a way to structure queries for improved readability, making them a go-to tool for handling complex queries.

Some of the main characteristics of CTEs include:

  • Temporary: CTEs exist only during the execution of the query.
  • Readable: CTEs help break down long, complicated queries into more manageable pieces.
  • Reusable: A CTE can be referenced multiple times within a query, reducing code repetition.
  • Recursive capability: CTEs can call themselves, enabling operations like hierarchical data traversing.

CTEs are not stored in the database as views or tables are, but they act as temporary result sets.

Why Use CTEs?

CTEs can make queries more readable and maintainable by breaking down complex SQL statements into smaller, reusable sections. Here are some reasons you might consider using CTEs:

  • Improved code readability: CTEs allow you to split a query into logical steps, improving clarity. This is especially useful when joining multiple tables or performing several subqueries.
  • Reusability: When you need to use the same result set multiple times in a query, a CTE eliminates the need for duplicating code.
  • Recursive queries: Recursive CTEs allow you to efficiently handle hierarchical or self-referencing data, such as organizational structures, file systems, project dependencies, graph-based data, etc.
  • Performance optimization: By reducing redundancy, CTEs can sometimes improve query performance, though this depends on the SQL engine’s optimization.

CTE Syntax: How to Write a CTE

The syntax for writing a CTE is relatively simple. Here’s the basic structure:

WITH cte_name AS (
    SELECT column1, column2, ...
    FROM table_name
    WHERE condition
)
SELECT *
FROM cte_name;

Explanation:

  • WITH keyword: This indicates the start of a CTE definition.
  • cte_name: The name given to the temporary result set.
  • AS ( ... ): The subquery inside the CTE. This query produces the temporary result set that we referred to. We can use this by referring to the cte_name.
  • The SELECT that follows uses the CTE as if it were a table or view. It references the CTE by its name (cte_name).

Example 1: A Simple CTE

Let’s start with a simple example to see how a CTE works in practice.

Suppose we have an Employees table:

EmployeeID  FirstName  LastName  DepartmentID  Salary  ManagerID
---------- --------- -------- ------------ ------ ---------
1 Aisha Khan 1 50000 null
2 Carlos Gomez 2 60000 1
3 Chun Li 1 55000 1
4 Amara Singh 3 70000 2
5 Omar Hassan 2 45000 2

Now, if we want to calculate the average salary for employees in each department, we can use a CTE as follows:

WITH DepartmentSalary AS (
    SELECT 
        DepartmentID, 
        AVG(Salary) AS AvgDeptSalary
    FROM Employees
    GROUP BY DepartmentID
)
SELECT 
    e.EmployeeID, 
    e.FirstName, 
    e.LastName,
    e.Salary, 
    ds.AvgDeptSalary
FROM Employees e
JOIN DepartmentSalary ds
ON e.DepartmentID = ds.DepartmentID;

Result:

EmployeeID  FirstName  LastName  Salary  AvgDeptSalary
---------- --------- -------- ------ -------------
1 Aisha Khan 50000 52500
3 Chun Li 55000 52500
2 Carlos Gomez 60000 52500
5 Omar Hassan 45000 52500
4 Amara Singh 70000 70000

Explanation:

  • The DepartmentSalary CTE calculates the average salary for each department.
  • The main query then joins the Employees table with the CTE to display each employee’s name and salary, along with the average salary of their department.

Example 2: Using a CTE for Filtering

CTEs can also be helpful in filtering results. Imagine you want to find all employees who earn more than the average salary in their department:

WITH DepartmentSalary AS (
    SELECT 
        DepartmentID, 
        AVG(Salary) AS AvgDeptSalary
    FROM Employees
    GROUP BY DepartmentID
)
SELECT 
    e.EmployeeID, 
    e.FirstName, 
    e.LastName, 
    e.Salary
FROM Employees e
JOIN DepartmentSalary ds
ON e.DepartmentID = ds.DepartmentID
WHERE e.Salary > ds.AvgDeptSalary;

Result:

EmployeeID  FirstName  LastName  Salary
---------- --------- -------- ------
3 Chun Li 55000
2 Carlos Gomez 60000

This query retrieves all employees whose salary exceeds the average salary in their respective departments.

Recursive CTEs in SQL

One of the most powerful features of CTEs is recursion. Recursive CTEs are used to solve problems involving hierarchical or tree-like data structures, such as an organizational chart or a bill of materials.

A recursive CTE consists of two parts:

  1. Anchor Member: This defines the starting point of the recursion.
  2. Recursive Member: This part references the CTE itself and defines how to iterate through the data.

Example 3: Recursive CTE for Hierarchical Data

As a refresher, here’s what our table looks like:

EmployeeID  FirstName  LastName  DepartmentID  Salary  ManagerID
---------- --------- -------- ------------ ------ ---------
1 Aisha Khan 1 50000 null
2 Carlos Gomez 2 60000 1
3 Chun Li 1 55000 1
4 Amara Singh 3 70000 2
5 Omar Hassan 2 45000 2

We can see that there’s a ManagerID column that contains each employee’s manager ID.

In this table, EmployeeID 1 is the CEO (no manager), EmployeeID 2 and EmployeeID 3 report to the CEO, and EmployeeID 4 and EmployeeID 5 report to Employee 2.

Let’s say we want to get the hierarchical structure of employees (e.g., “level 1”, “level 2” etc). We can use a recursive CTE to do this:

WITH EmployeeHierarchyCTE AS (
    -- Anchor member: Start with employees who have no manager (the CEO)
    SELECT EmployeeID, FirstName, LastName, ManagerID, 1 AS Level
    FROM Employees
    WHERE ManagerID IS NULL
    
    UNION ALL
    
    -- Recursive member: Join employees with their managers
    SELECT e.EmployeeID, e.FirstName, e.LastName, e.ManagerID, eh.Level + 1 AS Level
    FROM Employees e
    JOIN EmployeeHierarchyCTE eh
    ON e.ManagerID = eh.EmployeeID
)
SELECT *
FROM EmployeeHierarchyCTE
ORDER BY Level;

Result:

EmployeeID  FirstName  LastName  ManagerID  Level
---------- --------- -------- --------- -----
1 Aisha Khan null 1
2 Carlos Gomez 1 2
3 Chun Li 1 2
4 Amara Singh 2 3
5 Omar Hassan 2 3

Explanation:

  • The anchor member selects the CEO, who has no manager (ManagerID is NULL).
  • The recursive member joins the employees with their managers, incrementing the level by 1 each time.
  • The query recursively traverses the hierarchy to display each employee’s level.

This is just one of many possible scenarios that we might want to use a recursive CTE. Other examples could include file systems, category trees, project dependencies, graph-based data, and more.

Differences and Considerations Across DBMSs

Common table expressions generally have widespread support amongst the major RDBMSs. However, there are differences in how they’re implemented. For example:

  • Recursive limits: Some DBMSs, such as SQL Server and MySQL, impose a default recursion limit (typically 100 or 1000). This can be adjusted based on the DBMS but is important to consider when writing recursive CTEs.
  • Optimization behavior: Different DBMSs have different ways of optimizing queries, and some provide us with ways to influence this. For example, in SQLite and PostgreSQL we can use the AS MATERIALIZED or AS NOT MATERIALIZED query hints to influence how the CTE should be implemented by the query planner.
  • Older versions: For older versions of some DBMSs (e.g., MySQL pre-8.0), CTEs are not supported, and you would need to use subqueries, temporary tables, or derived tables instead.
  • Performance considerations: Depending on the DBMS and the complexity of the query, materializing CTEs (storing the result temporarily) can either help or hurt performance. This is why it’s important to test the performance of CTEs in your specific DBMS and dataset.

Best Practices for Using CTEs

To maximize the efficiency and maintainability of CTEs, consider the following best practices:

  • Use for readability: Leverage CTEs to make your queries more understandable and maintainable. Avoid using overly complex CTE chains, as they can become hard to debug.
  • Limit recursive CTEs: Recursive CTEs are powerful but can lead to performance issues if not used carefully. Always include a termination condition to prevent infinite recursion.
  • Optimize performance: While CTEs improve code readability, they might not always improve performance. Ensure that the SQL engine handles the CTE efficiently by testing your query with large datasets.
  • Name CTEs descriptively: Use meaningful names for your CTEs to clarify the purpose of each section.
  • Use with moderation: Avoid overusing CTEs when simple joins or subqueries suffice, as too many CTEs can sometimes hinder performance.

Conclusion

Common table expressions (CTEs) are a powerful and flexible feature of SQL, enabling us to simplify complex queries, handle hierarchical data, and improve code readability. Whether we’re dealing with nested subqueries or recursive data structures, CTEs can significantly enhance the efficiency and clarity of our SQL scripts.