Probably the most common way to join tables in SQL is with the ON
clause. But that’s not the only way.
We can also join with the USING
clause, which can be more concise, while providing the same or similar results. And there’s also the concept of a natural join, which is more concise again.
Let’s take a look at these three join options and compare them side by side.
The JOIN...ON
Option
The ON
clause is arguably the most flexible and widely used method for joining tables in SQL. It allows us to explicitly define the columns on which the join should occur, making it suitable for both simple and complex queries.
Syntax
The basic syntax goes like this:
SELECT columns
FROM table1
JOIN table2 ON table1.column = table2.column;
So we explicitly specify which columns join the tables together. Not only do we specify the columns, but we also qualify them with the respective table names. In many cases, the columns might share the same name across both tables, and by prefixing the column name with the table name, there is no doubt about which table we’re referring to.
We can also use the ON
clause when the column names are different. For example, ...ON authors.id = books.author_id
. In this case, the authors
table has an id
column but the related column in the books
table is called author_id
.
Advantages of JOIN...ON
- Control over column matching: We can specify exactly which columns should be used to join the tables.
- Non-equijoins: We can join tables using conditions other than equality (
=
) such as greater than (>
) or less than (<
). - Widespread DBMS support: Most major RDBMSs support joins with the
ON
clause.
Example
Here’s an example that represents a common scenario that requires the ON
clause:
SELECT employees.name, departments.name
FROM employees
JOIN departments ON employees.department_id = departments.id;
Here, the ON
clause specifies that the employees
and departments
tables should be joined using the department_id
and id
columns.
The USING
Clause
The USING
clause is a shorthand that simplifies the join process when the columns being joined have the same name in both tables. Unlike the ON
clause, USING
automatically assumes that the common column(s) will be used for the join.
Syntax
The syntax goes like this:
SELECT columns
FROM table1
JOIN table2 USING (common_column);
Advantages of USING
- Simplifies the syntax: We don’t need to specify the table name before the column if the column exists in both tables.
- Cleaner code: Reduces repetition when dealing with identical column names in multiple tables.
- Suppresses redundant columns: Produces only one of the two matching columns from the join, effectively eliminating duplicate columns. For example, instead of returning two
department_id
columns (one for each table), it returns just one. This is only evident when using the asterisk wildcard (*
) to select all columns from both tables.
Disadvantages/Limitations of USING
Despite its advantages, we need to be mindful of the following:
- Column names must match: You can only use
USING
if the columns have the same name in both tables. - Limited flexibility: You cannot use operators other than equality (
=
), nor can you join on differently named columns. - Limited DBMS support: Not all RDBMSs support the
USING
clause. Even if your DBMS supports this clause, and you decide to use it, this could become an issue if you ever need to port the database to a different system.
Example
SELECT employees.name, departments.name
FROM employees
JOIN departments USING (department_id);
In this case, the department_id
column exists in both tables, so USING
simplifies the query by eliminating the need for specifying the column with the table name.
The NATURAL JOIN
A NATURAL JOIN
automatically joins two tables based on all columns with the same name and data type in both tables. It’s convenient when you want to join tables on all matching columns without specifying them manually.
Syntax
This join has the simplest syntax of them all:
SELECT columns
FROM table1
NATURAL JOIN table2;
Advantages of NATURAL JOIN
- Automatic column matching: SQL automatically detects and joins the tables on all columns with the same name.
- Convenience: You don’t need to manually list the join conditions.
- Suppresses redundant columns: As with the
USING
clause, natural joins produce only one of the two matching columns from the join, effectively eliminating duplicate columns.
Disadvantages/Limitations of NATURAL JOIN
Despite its ease of use, this join is the riskiest. Here are some of the main disadvantages in a nutshell:
- Lack of control: The query automatically joins based on all matching column names, which may lead to unexpected results.
- Ambiguity: If multiple columns share the same name, but shouldn’t be used in the join, you could end up with incorrect data or duplicates.
- Difficult to debug: Debugging can be harder because the join conditions aren’t explicitly stated.
- Limited DBMS support: Not all RDBMSs support natural joins.
Example
SELECT employees.name, departments.name
FROM employees
NATURAL JOIN departments;
If both tables have a column named department_id
, SQL will automatically join them using that column. We didn’t need to specify the column name, as the database engine will automatically look for columns with matching names and column types.
Key Differences at a Glance
The following table outlines the main differences between each option:
Feature | ON | USING | NATURAL JOIN |
---|---|---|---|
Control | Full control over join columns | Limited to same-named columns | No control; all matching columns are used |
Flexibility | Can join on any condition | Only equality on same-named columns | No flexibility |
Column Ambiguity | No ambiguity | Requires same-named columns | Possible ambiguity with multiple matching columns |
Syntax Complexity | Most complex syntax | Simpler syntax | Simplest but least flexible |
DBMS Support | Widespread support | Limited support | Limited support |
Which Should You Use?
Here are some possible reasons why you might choose one clause or join type over another:
- You might choose
ON
when you need full control over the join logic (it could easily be argued that you would always want full control over the join logic) or are working with complex queries involving multiple conditions. - You might choose
USING
for simplifying queries where the column names are identical and you want clean, readable SQL code. NATURAL JOIN
should be used with caution, as it can lead to unintended results if you’re not careful about the column structure. It’s best suited for simpler databases with well-defined matching columns.
With that said, in most cases you should probably use the JOIN ... ON
option. This provides the most control, eliminates the potential mishaps associated with the other options, and it has the most support amongst DBMSs.