DuckDB provides us with the list_has_any()
function that enables us to find any overlapping elements across two lists. The way it works is that we pass two lists, and it returns true
or false
, depending on whether any element exists in both lists.
Example
Here’s a simple example to demonstrate how it works:
SELECT list_has_any(
['Cat', 'Dog', 'Bird', 'Zebra'],
['Zebra', 'Horse', 'Frog']
);
Result:
true
It returns true
because Zebra
is in both lists.
Here’s what happens if we change Zebra
to something else:
SELECT list_has_any(
['Cat', 'Dog', 'Bird', 'Zebra'],
['Fish', 'Horse', 'Frog']
);
Result:
false
This time it returned false
. That’s because none of the elements are in both lists.
NULL Values
Here’s an example where both lists contain NULL values:
SELECT list_has_any(
['Cat', 'Dog', NULL, 'Zebra'],
['Fish', NULL, 'Frog']
);
Result:
false
We get false
, even though NULL
is in both lists. NULL values are basically unknown values. They represent the absence of a value.
NULL Lists
Let’s pass a NULL list:
.nullvalue 'null'
SELECT list_has_any(
['Cat', 'Dog', 'Zebra'],
NULL
);
Result:
null
Passing a NULL list to the list_has_any()
function returns NULL. I specified .nullvalue 'null'
here in order to display NULL values as null
in my DuckDB CLI. The default is an empty string.
Type Mismatch
Both lists must be of the same type. Here’s what happens if we try to compare lists of different types:
SELECT list_has_any(
['Cat', 'Dog', 'Bird', 'Zebra'],
[1,7,8]
);
Result:
Binder Error: 'list_has_any' cannot compare lists of different types: 'VARCHAR' and 'INTEGER'
In this case I tried to compare a list of strings with a list of numbers, which resulted in an error.
However, the following is fine:
SELECT list_has_any(
[ 1, 2, 3.5 ],
[ 1, 7, 8 ]
);
Result:
true
Nested Lists
Nested lists can provide a similar challenge, depending on the lists. For example, the following returns an error:
SELECT list_has_any(
['Cat', 'Dog', 'Zebra'],
[['Zebra', 'Bull']]
);
Result:
Binder Error: 'list_has_any' cannot compare lists of different types: 'VARCHAR' and 'VARCHAR[]'
In this case the second list contains a nested list, but the first list doesn’t.
If both lists have nested lists, then it’s fine:
SELECT list_has_any(
[['Cat', 'Dog', 'Zebra']],
[['Zebra', 'Bull']]
);
Result:
false
But this now means that we’re testing whether or not the second nested list is within the first. In this case it wasn’t, and so false
was returned.
Here’s an example of where it returns true
:
SELECT list_has_any(
[['Cat', 'Dog', 'Zebra'], ['Zebra', 'Bull']],
[['Zebra', 'Bull']]
);
Result:
true
In this case, both arguments contain a nested list with Zebra
, followed by Bull
.
But order matters:
SELECT list_has_any(
[['Cat', 'Dog', 'Zebra'], ['Bull', 'Zebra']],
[['Zebra', 'Bull']]
);
Result:
false
I switched the order in one of the sub-lists, and the result is false
.
We can also use functions like unnest()
and flatten()
in order to match individual elements within the lists:
SET VARIABLE list1 = [['Cat', 'Dog', 'Bird', 'Zebra']];
SET VARIABLE list2 = [['Zebra', 'Cat']];
SELECT
list_has_any(
getvariable('list1'),
getvariable('list2')
) AS original_lists,
list_has_any(
unnest( getvariable('list1') ),
unnest( getvariable('list2') )
) AS unnested,
list_has_any(
flatten( getvariable('list1') ),
flatten( getvariable('list2') )
) AS flattened;
Result:
+----------------+----------+-----------+
| original_lists | unnested | flattened |
+----------------+----------+-----------+
| false | true | true |
+----------------+----------+-----------+
An Alias: array_has_any()
DuckDB also has an array_has_any()
function, which is an alias for list_has_any()
. So we can use either one interchangeably:
SELECT
list_has_any(
['apple', 'banana', 'cherry'],
['banana', 'cherry', 'date']
) AS list_has_any,
array_has_any(
['apple', 'banana', 'cherry'],
['banana', 'cherry', 'date']
) AS array_has_any;
Result:
+--------------+---------------+
| list_has_any | array_has_any |
+--------------+---------------+
| true | true |
+--------------+---------------+
Checking For All Elements: The list_has_all()
Function
DuckDB also has a list_has_all()
function that checks whether all elements in the second list are in the first. It goes something like this:
SELECT list_has_all(
['Cat', 'Dog', 'Bird', 'Zebra'],
['Zebra', 'Horse', 'Frog']
);
Result:
false
It returned false
because not all items in the second list are in the first list. These are the same two lists that returned true when we ran them through the list_has_any()
function near the start of this article.