When working with lists in DuckDB, we sometimes need to check whether a list contains specific elements. The list_has_all()
function is a handy tool that allows us to verify if all elements of one list exist within another. This function is particularly useful in filtering queries, data validation, and advanced list-based operations.
In this article, we’ll explore how list_has_all()
works in DuckDB.
Syntax
The basic syntax for DuckDB’s list_has_all()
function goes like this:
list_has_all(list, sub-list)
The function returns a Boolean; true
if all elements of sub-list
exist in list
, false
otherwise.
Example
Here’s a simple example to demonstrate:
SELECT list_has_all(
['Cat', 'Dog', 'Bird', 'Zebra'],
['Zebra', 'Cat']
);
Result:
true
It returned true because the first list contains all elements in the second list.
While it’s true that the first list contains more items than the second list, that’s not relevant. What’s relevant is that all of the items in the second list are also in the first list.
Notice that the order didn’t matter; In the first list Cat
came before Zebra
, but in the second list Zebra
came before Cat
. So it’s not looking for an exact copy of the second list being in the first list. Rather, it’s simply looking for all elements that are in the second list, regardless of order.
Duplicate Elements
Duplicate elements in the second list don’t cause any issues. What I mean is, if the second list contains duplicate elements, we can still get a true result, as long as the first list contains at least one matching element.
Example:
SELECT list_has_all(
['Cat', 'Dog', 'Bird', 'Zebra'],
['Zebra', 'Cat', 'Cat']
);
Result:
true
In this case the second list contains Cat
twice, but the first list contains Cat
only once. We still got a match.
No Match
Here’s an example of no match:
SELECT list_has_all(
['Cat', 'Dog', 'Bird', 'Zebra'],
['Red', 'Green']
);
Result:
false
That’s an obvious one, because the none of the items in the second list were in the first list.
Here’s another one:
SELECT list_has_all(
['Cat', 'Dog', 'Bird', 'Zebra'],
['Zebra', 'Cat', 'Horse']
);
Result:
false
This time some of the elements in the second list were in the first list, but not all. The fact that Horse
wasn’t in the first list was enough for list_has_all()
to return false
.
Type Mismatch
Both lists must be of the same type. Here’s what happens if we try to compare lists of different types:
SELECT list_has_all(
['Cat', 'Dog', 'Bird', 'Zebra'],
[1,2]
);
Result:
Binder Error: 'list_has_all' cannot compare lists of different types: 'VARCHAR' and 'INTEGER'
In this case I tried to compare a VARCHAR
list with an INTEGER
list, which resulted in an error.
However, the following is fine:
SELECT list_has_all(
[ 1, 2, 3.5 ],
[ 1, 2 ]
);
Result:
true
Nested Lists
Nested lists can provide a similar challenge, depending on the lists. For example, the following returns an error:
SELECT list_has_all(
['Cat', 'Dog', 'Bird', 'Zebra'],
[['Zebra', 'Cat', 'Cat']]
);
Result:
Binder Error: 'list_has_all' cannot compare lists of different types: 'VARCHAR' and 'VARCHAR[]'
In this case the second list contains a nested list, but the first list doesn’t.
If both lists have nested lists, then it’s a different story:
SELECT list_has_all(
[['Cat', 'Dog', 'Bird', 'Zebra']],
[['Zebra', 'Cat', 'Cat']]
);
Result:
false
But this now means that we’re testing whether or not the second nested list is within the first. In this case it wasn’t, and so false
was returned.
Here’s an example of where it returns true
:
SELECT list_has_all(
[['Dog', 'Bird'],['Zebra', 'Cat']],
[['Zebra', 'Cat']]
);
Result:
true
In this case, both arguments contain a nested list with Zebra
, followed by Cat
.
But order matters:
SELECT list_has_all(
[['Dog', 'Bird'],['Cat', 'Zebra']],
[['Zebra', 'Cat']]
);
Result:
false
I switched the order in one of the sub-lists, and the result is false
.
We can also use functions like unnest()
and flatten()
in order to match individual elements within the lists:
SET VARIABLE list1 = [['Cat', 'Dog', 'Bird', 'Zebra']];
SET VARIABLE list2 = [['Zebra', 'Cat']];
SELECT
list_has_all(
getvariable('list1'),
getvariable('list2')
) AS original_lists,
list_has_all(
unnest( getvariable('list1') ),
unnest( getvariable('list2') )
) AS unnested,
list_has_all(
flatten( getvariable('list1') ),
flatten( getvariable('list2') )
) AS flattened;
Result:
+----------------+----------+-----------+
| original_lists | unnested | flattened |
+----------------+----------+-----------+
| false | true | true |
+----------------+----------+-----------+
An Alias: array_has_all()
DuckDB also has an array_has_all()
function, which is an alias for list_has_all()
. So we can use either one interchangeably:
SELECT
list_has_all(
['Cat', 'Dog', 'Bird', 'Zebra'],
['Zebra', 'Cat']
) AS list_has_all,
array_has_all(
['Cat', 'Dog', 'Bird', 'Zebra'],
['Zebra', 'Cat']
) AS array_has_all;
Result:
+--------------+---------------+
| list_has_all | array_has_all |
+--------------+---------------+
| true | true |
+--------------+---------------+
Checking for ANY Element
DuckDB also has a list_has_any()
function that returns true if any elements exist in both lists. It can be used in the same way that list_has_all()
is used, but it will often return a different result, depending on the lists.