In SQL, the UNION
clause concatenates the results of two queries into a single result set.
You can use the UNION
clause with or without the ALL
argument:
UNION ALL
– Includes duplicates.UNION
– Excludes duplicates.
Some RDBMSs also accept UNION DISTINCT
, which is the equivalent to UNION
. That is, it excludes duplicates.
Below are some basic examples to demonstrate how it works.
Sample Tables
Suppose we have the following two tables:
Cats: +---------+-----------+ | CatId | CatName | |---------+-----------| | 1 | Meow | | 2 | Fluffy | | 3 | Scratch | +---------+-----------+ Dogs: +---------+-----------+ | DogId | DogName | |---------+-----------| | 1 | Fetch | | 2 | Fluffy | | 3 | Wag | | 1002 | Fetch | +---------+-----------+
We can use a SELECT
statement with a UNION
clause to combine the results from both tables into one result set.
Example using UNION ALL
First, let’s use UNION ALL
so that it includes duplicates.
SELECT DogName AS PetName
FROM Dogs
UNION ALL
SELECT CatName
FROM Cats;
Result:
+-----------+ | PetName | |-----------| | Fetch | | Fluffy | | Wag | | Fetch | | Meow | | Fluffy | | Scratch | +-----------+ (7 rows affected)
In this case, seven rows are returned. We can see that “Fetch” is returned twice. This is because there are two dogs named Fetch.
There’s also a cat and a dog with the same name: Fluffy.
Note that I used a column alias to name the field returned by the operation. If I hadn’t done that, the result would have used the column names from the first query. In that case, the column header would have been called DogName
instead of PetName
.
SELECT DogName
FROM Dogs
UNION ALL
SELECT CatName
FROM Cats;
Result:
+-----------+ | DogName | |-----------| | Fetch | | Fluffy | | Wag | | Fetch | | Meow | | Fluffy | | Scratch | +-----------+ (7 rows affected)
This may or may not be acceptable, depending on the data that you’re returning in your query. In our case, it’s not appropriate, because not all results are dogs.
Example using UNION
Let’s see what happens when we remove the ALL
argument.
SELECT DogName AS PetName
FROM Dogs
UNION
SELECT CatName
FROM Cats;
Result:
+-----------+ | PetName | |-----------| | Fetch | | Fluffy | | Meow | | Scratch | | Wag | +-----------+ (5 rows affected)
This time only five rows are returned. Both duplicates are removed.
As mentioned, some RDBMSs accept UNION DISTINCT
, which allows us to explicitly remove duplicates (even though they’re implicitly removed with UNION
anyway).
UNION
vs DISTINCT
Note that this is different to applying DISTINCT
to each individual SELECT
statement. If we’d done that, Fluffy would have been returned twice, because the ALL
would only apply for the SELECT
statement that it’s being applied against (not to the concatenated results).
Here’s an example to illustrate what I mean.
SELECT DISTINCT DogName AS PetName
FROM Dogs
UNION ALL
SELECT DISTINCT CatName
FROM Cats;
Result:
+-----------+ | PetName | |-----------| | Fetch | | Fluffy | | Wag | | Fluffy | | Meow | | Scratch | +-----------+ (6 rows affected)
All Queries must Return the Same Number of Columns
When you use the UNION
clause, each query must have the same number of columns, and they must be in the same order.
If not, you’ll get an error.
SELECT CatName FROM Cats
UNION ALL
SELECT DogId, DogName FROM Dogs;
Result:
Msg 205, Level 16, State 1, Line 1 All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
That’s the error that SQL Server returns when using an unequal number of columns. This particular error indicates that the same restriction also applies to the INTERSECT
and EXCEPT
operators. The error message you receive may be different, depending on your DBMS.
Data Types must be Compatible
In addition to requiring the same number of columns, those columns must have a compatible data type.
They don’t necessarily need to be the same data type, but they will need to be compatible. That is, they must be compatible through implicit conversion. If the data types don’t match, the DBMS must be able to do an implicit conversion so that they do match.
If not, you’ll get an error.
SELECT CatName FROM Cats
UNION ALL
SELECT DogId FROM Dogs;
Result:
Msg 245, Level 16, State 1, Line 1 Conversion failed when converting the varchar value 'Meow' to data type int.
Ordering the Results
If you want to sort the results with the ORDER BY
clause, you’ll need to put it on the last query. You can’t put a separate ORDER BY
clause on each query, or for that matter, any query that isn’t the last one.
Here’s the error I get when trying to do that in SQL Server:
SELECT DogName AS PetName
FROM Dogs
ORDER BY DogName
UNION ALL
SELECT CatName
FROM Cats;
Result:
Msg 156, Level 15, State 1, Line 4 Incorrect syntax near the keyword 'UNION'.
Therefore, if we want to order the results, we’ll need to do something like this:
SELECT DogName AS PetName
FROM Dogs
UNION ALL
SELECT CatName
FROM Cats
ORDER BY PetName;
Applying UNION
to the More than two Queries
The previous examples combined results from two different queries, but there’s nothing to stop you adding more. You can use it to combine the results of many queries if required.
For example, if we also had a Birds
table, we could do this:
SELECT DogName AS PetName
FROM Dogs
UNION ALL
SELECT CatName
FROM Cats
UNION ALL
SELECT BirdName
FROM Birds;
Normalization
The examples on this page put cats and dogs into two separate tables. The reason i did this is because it’s a clear and concise way of illustrating how UNION
works.
In practice, you might have these in the same table called, say Pets
, then have a separate PetTypes
table (or similar). This is known as normalization, and is the way relational databases are usually designed.
You could then run a join on these tables to return data as required.