Understanding LIST_ZIP() in DuckDB

DuckDB has a list_zip() function that zips one or more lists to a new list with a length of the longest list provided. The new list contains unnamed structs of the elements from the original lists. We pass the lists to the function when we call it. We can also pass a Boolean value to specify whether or not to truncate all lists to the smallest list.

Below are some basic examples that demonstrate how the list_zip() function works.

Three Lists, Two Elements

Here’s a quick and simple example to start:

SELECT list_zip([1, 2], ['Cat', 'Dog'], ['Meow', 'Woof']);

Result:

[(1, Cat, Meow), (2, Dog, Woof)]

I passed three lists, each containing two elements. The function zipped them into a list of two unnamed structs, each containing three values.

Three Lists, Three Elements

This example has three lists, each containing three elements:

SELECT list_zip([1, 2, 3], ['Cat', 'Dog', 'Bird'], ['Meow', 'Woof', 'Tweet']);

Result:

[(1, Cat, Meow), (2, Dog, Woof), (3, Bird, Tweet)]

Different Sized Lists

We can pass lists of different sizes:

SELECT list_zip([1, 2, 3], ['Cat', 'Dog', 'Bird'], ['Meow', 'Woof']);

Result:

[(1, Cat, Meow), (2, Dog, Woof), (3, Bird, NULL)] 

When we do this, any missing elements are replaced with NULL. We can see that the third struct has NULL in its third position here. That’s because the third list we passed only had two elements, while the others had three.

Let’s make all three lists different sizes:

SELECT list_zip([1, 2, 3, 4], ['Cat', 'Dog', 'Bird'], ['Meow', 'Woof']);

Result:

[(1, Cat, Meow), (2, Dog, Woof), (3, Bird, NULL), (4, NULL, NULL)]

We got four structs this time, due to the first list having four elements. As expected, NULL fills in the gaps.

Let’s reduce the first list to two elements:

SELECT list_zip([1, 2], ['Cat', 'Dog', 'Bird'], ['Meow', 'Woof']);

Result:

[(1, Cat, Meow), (2, Dog, Woof), (NULL, Bird, NULL)]

Passing NULL Values

The lists that we pass to the function can also contain NULL values in their own right:

SELECT list_zip([1, 2, 3], [NULL, 'Dog', 'Bird'], ['Meow', 'Woof']);

Result:

[(1, NULL, Meow), (2, Dog, Woof), (3, Bird, NULL)]

Truncating to the Shortest List

We can pass a Boolean value to the function in order to specify whether or not the results are truncated to the shortest list provided. To do this, we can pass true after all the lists. The default value is false.

Here’s an example:

SELECT list_zip([1, 2, 3], ['Cat', 'Dog', 'Bird'], ['Meow', 'Woof'], true);

Result:

[(1, Cat, Meow), (2, Dog, Woof)]

Here, I passed true in order to truncate it to the length of the shortest list, which contained just two elements.

Here it is again along with false and also the default (which is also false):

.mode line
SELECT 
    list_zip([1, 2, 3], ['Cat', 'Dog', 'Bird'], ['Meow', 'Woof']) AS default,
    list_zip([1, 2, 3], ['Cat', 'Dog', 'Bird'], ['Meow', 'Woof'], false) AS false,
    list_zip([1, 2, 3], ['Cat', 'Dog', 'Bird'], ['Meow', 'Woof'], true) AS true;

Result:

default = [(1, Cat, Meow), (2, Dog, Woof), (3, Bird, NULL)]
false = [(1, Cat, Meow), (2, Dog, Woof), (3, Bird, NULL)]
true = [(1, Cat, Meow), (2, Dog, Woof)]

We can see that the default and false options return the same result, while the true option truncates the result.

In this example I also switched to line mode in order to output the results vertically in my DuckDB CLI.

Nested Lists

When working with nested lists, we can get different results depending on how we treat the nested lists. For example, we can unnest the lists before they’re fed to list_zip(). Or we can pass the nested lists to list_zip() and then unnest the result of list_zip(). We could also flatten the lists before feeding them to list_zip().

Here’s an example of passing nested lists to list_zip():

SELECT list_zip(
    [[1, 2], [3, 4]], 
    [['Cat', 'Dog'], ['Bird', 'Cow']], 
    [['Meow', 'Woof'], ['Tweet', 'Moo']]
    );

Output:

[([1, 2], [Cat, Dog], [Meow, Woof]), ([3, 4], [Bird, Cow], [Tweet, Moo])]

In this case we didn’t unnest or flatten anything.

Let’s unnest the output from list_zip():

SELECT unnest(
    list_zip(
        [[1, 2], [3, 4]], 
        [['Cat', 'Dog'], ['Bird', 'Cow']], 
        [['Meow', 'Woof'], ['Tweet', 'Moo']]
    )) AS unnested;

Output:

+-------------------------------------+
| unnested |
+-------------------------------------+
| ([1, 2], [Cat, Dog], [Meow, Woof]) |
| ([3, 4], [Bird, Cow], [Tweet, Moo]) |
+-------------------------------------+

This time the result is separated across two rows. The outer list has been removed, due to the unnesting operation.

Another option is to unnest each list before it’s passed to list_zip():

SELECT list_zip(
    unnest([[1, 2], [3, 4]]), 
    unnest([['Cat', 'Dog'], ['Bird', 'Cow']]), 
    unnest([['Meow', 'Woof'], ['Tweet', 'Moo']])
    ) AS unnested;

Output:

+-----------------------------------+
| unnested |
+-----------------------------------+
| [(1, Cat, Meow), (2, Dog, Woof)] |
| [(3, Bird, Tweet), (4, Cow, Moo)] |
+-----------------------------------+

Another option is to flatten each list:

SELECT list_zip(
    flatten([[1, 2], [3, 4]]), 
    flatten([['Cat', 'Dog'], ['Bird', 'Cow']]), 
    flatten([['Meow', 'Woof'], ['Tweet', 'Moo']])
    ) AS flattened;

Output:

+-------------------------------------------------------------------+
| flattened |
+-------------------------------------------------------------------+
| [(1, Cat, Meow), (2, Dog, Woof), (3, Bird, Tweet), (4, Cow, Moo)] |
+-------------------------------------------------------------------+

Here it is unflattened:

SELECT list_zip(
    [[1, 2], [3, 4]], 
    [['Cat', 'Dog'], ['Bird', 'Cow']], 
    [['Meow', 'Woof'], ['Tweet', 'Moo']]
    ) AS unflattened;

Output:

+---------------------------------------------------------------------------+
| unflattened |
+---------------------------------------------------------------------------+
| [([1, 2], [Cat, Dog], [Meow, Woof]), ([3, 4], [Bird, Cow], [Tweet, Moo])] |
+---------------------------------------------------------------------------+