Looking at the GENERATE_SUBSCRIPTS() Function in DuckDB

The generate_subscripts() function in DuckDB creates a series of index values for accessing elements in an array. It can help us iterate through array indices when we need to work with array elements.

Here’s a quick look at DuckDB’s generate_subscripts() function, along with examples.

What is generate_subscripts()?

generate_subscripts() is a table-generating function that produces a series of integer values representing valid indices for a specific array dimension.

The basic syntax is straightforward:

generate_subscripts(array, dimension)

Where:

  • array is any array expression
  • dimension specifies which dimension of the array to generate indices for (1 for one-dimensional arrays).

Example

Here’s a basic example to demonstrate how it works:

SELECT generate_subscripts(
    array_value('Sun', 'Sky', 'Sea'), 
    1
    );

Result:

+----------------------------------------------------------+
| generate_subscripts(array_value('Sun', 'Sky', 'Sea'), 1) |
+----------------------------------------------------------+
| 1 |
| 2 |
| 3 |
+----------------------------------------------------------+

So we can see that one row is returned for each array element, and the value of each row increments with each element.

Specifying Another Dimension

Here’s an example that uses a two-dimensional array. First, let’s get the subscripts for the first dimension:

SELECT generate_subscripts(
    array_value(
        array_value('Sun', 'Sky', 'Sea'),
        array_value('Tag', 'Rub', 'Tub')
        ), 
    1
    ) AS first_dimension;

Result:

+-----------------+
| first_dimension |
+-----------------+
| 1 |
| 2 |
+-----------------+

Now let’s do the second dimension. To do this, all we need to do is change 1 to 2:

SELECT generate_subscripts(
    array_value(
        array_value('Sun', 'Sky', 'Sea'),
        array_value('Tag', 'Rub', 'Tub')
        ), 
    2
    ) AS second_dimension;

Result:

+------------------+
| second_dimension |
+------------------+
| 1 |
| 2 |
| 3 |
+------------------+

Extracting the Array’s Elements

We can modify our query in order to extract the array’s elements at each corresponding subscript:

SELECT 
    generate_subscripts(a, 1) AS subscript, 
    a[generate_subscripts(a, 1)] AS value
FROM 
    (SELECT array_value('Sky','Sun','Sea','Tag') AS a) t;

Result:

+-----------+-------+
| subscript | value |
+-----------+-------+
| 1 | Sky |
| 2 | Sun |
| 3 | Sea |
| 4 | Tag |
+-----------+-------+

Common Errors

Non-Existent Dimensions

If we specify a dimension that doesn’t exist in the array, an error is returned:

SELECT generate_subscripts(
    array_value('Sun', 'Sky', 'Sea'), 
    2
    );

Result:

Out of Range Error: array_length dimension '2' out of range (min: '1', max: '1')

The same error occurs when we pass a dimension of zero:

SELECT generate_subscripts(
    array_value('Sun', 'Sky', 'Sea'), 
    0
    );

Result:

Out of Range Error: array_length dimension '0' out of range (min: '1', max: '1')

Omitting the Dimension

Omitting the dimension results in an error:

SELECT generate_subscripts(
    array_value('Sun', 'Sky', 'Sea')
    );

Result:

Binder Error: Macro function generate_subscripts(arr, dim) requires 2 positional arguments, but a single positional argument was provided.
LINE 1: SELECT generate_subscripts(
^

Passing a Non-Array

If the first argument is not an array, then an error occurs:

SELECT generate_subscripts(
    'Sun', 
    1
    );

Result:

Binder Error: No function matches the given name and argument types 'array_length(STRING_LITERAL, INTEGER_LITERAL)'. You might need to add explicit type casts.
Candidate functions:
array_length(ANY[]) -> BIGINT
array_length(ANY[], BIGINT) -> BIGINT

LINE 2: 'Sun',
^