DuckDB’s list_select()
function allows us to extract specific elements from list columns based on their positions. This can be useful when we need to access or manipulate elements within arrays or lists.
Syntax
The basic syntax goes something like this:
list_select(value_list, index_list)
A slightly more specific syntax looks like this:
list_select(ANY[], BIGINT[]) -> ANY[]
The first argument is the source list and the second argument is a list that contains the positions of each element that we want to extract from the source list.
Example
Here’s a simple example to demonstrate:
SELECT list_select(['Sky', 'Box', 'Run', 'Ant', 'Cam'], [2, 4]);
Result:
[Box, Ant]
The function returns only the list items specified. It doesn’t return the whole range.
So it’s a bit different to the list_slice()
function, which returns all list elements within a given range.
list_select()
vs list_slice()
To continue that last point, here’s a comparison of list_select()
and list_slice()
:
SELECT
list_select(['Sky', 'Box', 'Run', 'Ant', 'Cam'], [2, 4]) AS list_select,
list_slice(['Sky', 'Box', 'Run', 'Ant', 'Cam'], 2, 4) AS list_slice;
Output:
+-------------+-----------------+
| list_select | list_slice |
+-------------+-----------------+
| [Box, Ant] | [Box, Run, Ant] |
+-------------+-----------------+
We can see that list_slice()
returned all elements from the specified range, whereas list_select()
returns just the specified list items themselves.
However, list_slice()
also allows us to specify a step to use, so we could modify our query to achieve the same result from both functions:
SELECT
list_select(['Sky', 'Box', 'Run', 'Ant', 'Cam'], [2, 4]) AS list_select,
list_slice(['Sky', 'Box', 'Run', 'Ant', 'Cam'], 2, 4, 2) AS list_slice;
Output:
+-------------+------------+
| list_select | list_slice |
+-------------+------------+
| [Box, Ant] | [Box, Ant] |
+-------------+------------+
And on the other hand, we could make list_select()
return the same result as list_slice()
by explicitly specifying all list items within the range:
SELECT
list_select(['Sky', 'Box', 'Run', 'Ant', 'Cam'], [2, 3, 4]) AS list_select,
list_slice(['Sky', 'Box', 'Run', 'Ant', 'Cam'], 2, 4) AS list_slice;
Output:
+-----------------+-----------------+
| list_select | list_slice |
+-----------------+-----------------+
| [Box, Run, Ant] | [Box, Run, Ant] |
+-----------------+-----------------+
That said, each function has its own purpose. The list_select()
function can be useful in cases where we need to pick arbitrary elements from non-contiguous positions that don’t necessarily have a placement pattern. For example:
SELECT list_select(
['Sky', 'Box', 'Run', 'Ant', 'Cam', 'Sit', 'Zip', 'Cat', 'Dog', 'Lit'],
[1, 4, 5, 10]
);
Result:
[Sky, Ant, Cam, Lit]
Selecting Non-Existent Elements
If the second argument contains list indexes that don’t exist in the list (i.e. the list contains less items than the specified index), then NULL
is returned for those elements:
SELECT list_select(['Sky', 'Box', 'Run'], [1, 4]);
Output:
[Sky, NULL]
In this case I tried to get list item 4, but the list only contains 3 elements.
The same is true when none of the specified indexes appear in the list:
SELECT list_select(['Sky', 'Box', 'Run'], [4, 5]);
Output:
[NULL, NULL]
Passing NULL Arguments
Passing a NULL argument yields different results depending on which part of the argument is NULL. In most cases we’ll get NULL. Generally, we’ll get a NULL list item or a NULL result:
SELECT
list_select(['Sky', 'Box', 'Run'], NULL) AS null_indexes,
list_select([NULL, 'Box', 'Run'], [1, 3]) AS null_list_item,
list_select([NULL], [1, 3]) AS null_list_items,
list_select(NULL, [1, 3]) AS null_list;
Output:
+--------------+----------------+-----------------+-----------+
| null_indexes | null_list_item | null_list_items | null_list |
+--------------+----------------+-----------------+-----------+
| NULL | [NULL, Run] | [NULL, NULL] | NULL |
+--------------+----------------+-----------------+-----------+
But we can also get an error. For example, if we specify a NULL index in the second argument:
SELECT list_select(['Sky', 'Box', 'Run'], [NULL]) AS null_index;
Output:
Invalid Input Error:
NULLs are not allowed as list elements in the second input parameter.
This is true even if we also specify a non-NULL element:
SELECT list_select(['Sky', 'Box', 'Run'], [1, NULL]) AS null_index;
Output:
Invalid Input Error:
NULLs are not allowed as list elements in the second input parameter.
An Alias: The array_select()
Function
DuckDB also provides us with an array_select()
function, which is an alias for list_select()
. In other words, they both do the same thing:
SELECT
list_select(['Sky', 'Box', 'Run', 'Ant', 'Cam'], [2, 4]) AS list_select,
array_select(['Sky', 'Box', 'Run', 'Ant', 'Cam'], [2, 4]) AS array_select;
Result:
+-------------+--------------+
| list_select | array_select |
+-------------+--------------+
| [Box, Ant] | [Box, Ant] |
+-------------+--------------+