Using LIST_INTERSECT() to Extract Overlapping Elements from Two Lists in DuckDB

The list_intersect() can be quite a handy function when working with lists in DuckDB. It accepts two lists as arguments, and it returns a list of elements that are common to both lists. It essentially performs a set intersection operation while preserving the list data type.

Duplicate values are ignored, so only one value is returned even if there are multiple occurrences in one or both lists.

Syntax

The syntax goes like this:

list_intersect(list1, list2)

The function returns a list of elements that appear in both list1 and list2 (without duplicates).

The function also has an alias, called array_intersect(). So it can also be used like this:

array_intersect(list1, list2)

Example

Here’s a simple example to demonstrate:

SELECT list_intersect(
    ['apple', 'banana', 'cherry'],
    ['banana', 'cherry', 'date']
    );

Result:

[cherry, banana]

The reason cherry and banana were returned is because they were in both lists.

Duplicate Values

Duplicate values are ignored:

SELECT list_intersect(
    ['apple', 'banana', 'cherry', 'cherry'],
    ['banana', 'cherry', 'cherry']
    );

Result:

[cherry, banana]

So in this case, cherry is returned once even though it appears twice in each list.

Only One Match

If there’s only one matching element, it’s still returned in a list:

SELECT list_intersect(
    ['apple', 'banana', 'cherry'],
    ['banana', 'kiwi', 'date']
    );

Result:

[banana]

No Match

If there’s no matching element, an empty list is returned:

SELECT list_intersect(
    ['apple', 'banana', 'cherry'],
    ['persimmon', 'kiwi', 'date']
    );

Result:

[]

NULL Values

NULL values aren’t counted as matches:

SELECT list_intersect(
    ['apple', 'banana', NULL, 'cherry'],
    ['persimmon', NULL, 'date']
    );

Result:

[]

NULL Lists

If both lists are NULL then NULL is returned.

SELECT list_intersect( NULL, NULL );

Result:

null

Interestingly, if the first list is NULL, then NULL is returned, but if the second list is NULL, an empty list is returned:

SELECT 
    list_intersect( NULL, ['banana', 'cherry'] ) AS first_list_null,
    list_intersect( ['apple', 'cherry'], NULL ) AS second_list_null;

Result:

+-----------------+------------------+
| first_list_null | second_list_null |
+-----------------+------------------+
| null | [] |
+-----------------+------------------+

Nested Lists

If one of the lists contains a nested list but the other doesn’t, then an error is returned:

SELECT list_intersect(
    ['apple', 'banana', 'cherry'],
    [['banana', 'cherry', 'date']]
    );

Result:

Binder Error: list_contains: Cannot match element of type 'VARCHAR' in a list of type 'VARCHAR[][]' - an explicit cast is required

But if both lists contain nested lists, then that’s fine:

SELECT list_intersect(
    [['apple', 'banana', 'cherry']],
    [['banana', 'cherry', 'date']]
    );

Result:

[]

However, in this case, it will be comparing the nested lists with each other. That’s why this example returned an empty list.

Here’s an example of where there’s a match:

SELECT list_intersect(
    [['apple', 'banana'], ['cherry', 'kiwi']],
    [['banana', 'apple'], ['cherry', 'kiwi']]
    );

Result:

[[cherry, kiwi]]

This results in the matching nested list being returned as a nested list. This is in line with the other examples – the list_intersect() function returns a list. So if the matching elements are lists themselves, then they will be nested lists in the result.

Flattening Nested Lists

It’s also possible to flatten lists that contain other lists:

SELECT 
    list_intersect(
        [['apple', 'banana', 'cherry']],
        [['banana', 'cherry', 'date']]
        ) AS not_flattened,
    list_intersect(
        flatten([['apple', 'banana', 'cherry']]),
        flatten([['banana', 'cherry', 'date']])
        ) AS flattened;

Result:

+---------------+------------------+
| not_flattened | flattened |
+---------------+------------------+
| [] | [cherry, banana] |
+---------------+------------------+

While we’re at it, let’s flatten the list from the other example:

SELECT 
    list_intersect(
        [['apple', 'banana'], ['cherry', 'kiwi']],
        [['banana', 'apple'], ['cherry', 'kiwi']]
        ) AS not_flattened,
    list_intersect(
        flatten([['apple', 'banana'], ['cherry', 'kiwi']]),
        flatten([['banana', 'apple'], ['cherry', 'kiwi']])
        ) AS flattened;

Result:

+------------------+-------------------------------+
| not_flattened | flattened |
+------------------+-------------------------------+
| [[cherry, kiwi]] | [kiwi, cherry, banana, apple] |
+------------------+-------------------------------+

An Alias: array_intersect()

DuckDB also has an array_intersect() function, which is an alias for list_intersect(). So we can use either one interchangeably:

SELECT 
    list_intersect(
        ['apple', 'banana', 'cherry'],
        ['banana', 'cherry', 'date']
        ) AS list_intersect,
    array_intersect(
        ['apple', 'banana', 'cherry'],
        ['banana', 'cherry', 'date']
        ) AS array_intersect;

Result:

+------------------+------------------+
| list_intersect | array_intersect |
+------------------+------------------+
| [cherry, banana] | [cherry, banana] |
+------------------+------------------+