Extract All Values From a JSON Document With DuckDB’s JSON_TRANSFORM() Function

The json_transform() function in DuckDB is a handy tool for converting JSON strings into structured data types like STRUCT, MAP, and LIST. This allows you to directly query and manipulate nested JSON data using standard SQL, making it much easier to work with complex JSON objects and arrays.

Think of it as a way to cast your JSON data into a more usable, typed format within your database.

Syntax

The basic syntax is as follows:

json_transform(json, structure)

Where json is the JSON document and structure is the structure you want it transformed to.

A more detailed depiction might look something like this:

json_transform(json_string, '{"key1": "type1", "key2": "type2", ...}')
  • json_string: This is the JSON data you want to transform. It can be a column containing JSON strings or a literal JSON string.
  • '{"key1": "type1", ...}': This is a string that defines the structure of the output. The keys in this structure definition must match the keys in your JSON data, and the values specify the desired SQL data type for each key.

You can also use its alias, from_json(), which works identically.

Examples

Let’s look at a few examples to see how it works in practice.

Example 1: Transforming a Simple JSON Object

Imagine we have a table called users with a data column containing JSON objects like this:

{"name": "Jake Sandilands", "age": 30, "is_active": true}

We can use json_transform() to convert this JSON into a STRUCT with typed columns:

CREATE TABLE users (data JSON);
INSERT INTO users VALUES ('{"name": "Jake Sandilands", "age": 30, "is_active": true}');

SELECT
    json_transform(data, '{"name": "VARCHAR", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
FROM users;

Result:

+---------------------------------------------------------+
| user_data |
+---------------------------------------------------------+
| {'name': Jake Sandilands, 'age': 30, 'is_active': true} |
+---------------------------------------------------------+

The output is a STRUCT where we can access each field by its key.

Let’s use the typeof() function to confirm the return data type:

SELECT
    typeof(json_transform(data, '{"name": "VARCHAR", "age": "INTEGER", "is_active": "BOOLEAN"}')) AS return_type
FROM users;

Result:

+--------------------------------------------------------+
| return_type |
+--------------------------------------------------------+
| STRUCT("name" VARCHAR, age INTEGER, is_active BOOLEAN) |
+--------------------------------------------------------+

It’s a STRUCT as expected.

So we can then query the individual fields of the STRUCT as if they were regular columns:

SELECT
    user_data.name,
    user_data.age
FROM (
    SELECT
        json_transform(data, '{"name": "VARCHAR", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
    FROM users
);

Result:

+-----------------+-----+
| name | age |
+-----------------+-----+
| Jake Sandilands | 30 |
+-----------------+-----+

Example 2: Working with Nested JSON and Arrays

json_transform() can also handle more complex structures, including nested objects and arrays.

Consider this JSON object with a nested address object and an array of hobbies:

{
  "name": "Katie Melon",
  "address": {
    "city": "New York",
    "zip": "10001"
  },
  "hobbies": ["reading", "hiking", "coding"]
}

We can transform this using the following query:

SET VARIABLE json_doc = '{
  "name": "Katie Melon",
  "address": {
    "city": "New York",
    "zip": "10001"
  },
  "hobbies": ["reading", "hiking", "coding"]
}';

SELECT
    json_transform(
        getvariable('json_doc'),
        '{"name": "VARCHAR", "address": "STRUCT(city VARCHAR, zip VARCHAR)", "hobbies": "VARCHAR[]"}'
    ) AS user_profile;

Result:

+----------------------------------------------------------------------------------------------------------+
| user_profile |
+----------------------------------------------------------------------------------------------------------+
| {'name': Katie Melon, 'address': {'city': New York, 'zip': 10001}, 'hobbies': [reading, hiking, coding]} |
+----------------------------------------------------------------------------------------------------------+

This produced a STRUCT with a nested STRUCT for the address and a LIST for the hobbies. We can use typeof() to verify:

SET VARIABLE json_doc = '{
  "name": "Katie Melon",
  "address": {
    "city": "New York",
    "zip": "10001"
  },
  "hobbies": ["reading", "hiking", "coding"]
}';

SELECT
    typeof(
        json_transform(
            getvariable('json_doc'),
            '{"name": "VARCHAR", "address": "STRUCT(city VARCHAR, zip VARCHAR)", "hobbies": "VARCHAR[]"}'
        )
    ) AS return_type;

Output:

+--------------------------------------------------------------------------------------+
| return_type |
+--------------------------------------------------------------------------------------+
| STRUCT("name" VARCHAR, address STRUCT(city VARCHAR, zip VARCHAR), hobbies VARCHAR[]) |
+--------------------------------------------------------------------------------------+

By transforming our JSON data, we unlock the ability to use DuckDB’s set of functions for STRUCTs and LISTs, potentially making our JSON data easier to query.

An Alias: The from_json() Function

As mentioned, from_json() is an alias for json_transform(), so we can use either one to get the same result. Here’s an example:

SELECT
    user_data.name,
    user_data.age
FROM (
    SELECT
        from_json(data, '{"name": "VARCHAR", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
    FROM users
);

Result:

+-----------------+-----+
| name | age |
+-----------------+-----+
| Jake Sandilands | 30 |
+-----------------+-----+

This is the same example from earlier, except that I replaced json_transform() with from_json(). It returned the same result, as expected.

More Options: json_transform_strict() and from_json_strict()

DuckDB also has a json_transform_strict() function and its alias from_json_strict(). These functions do the same thing as json_transform(), except that they throw an error when type casting fails.

To demonstrate this, here’s what json_transform() does when type casting fails:

SELECT
    json_transform(data, '{"name": "INTEGER", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
FROM users;

Output:

+----------------------------------------------+
| user_data |
+----------------------------------------------+
| {'name': NULL, 'age': 30, 'is_active': true} |
+----------------------------------------------+

Here, I specified INTEGER for the name field. This is a VARCHAR field and so it can’t be converted to INTEGER. Therefore, json_transform() set the value to NULL.

And here’s what json_transform_strict() does:

SELECT
json_transform_strict(data, '{"name": "INTEGER", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
FROM users;

Output:

Invalid Input Error:
Failed to cast value to numerical: "Jake Sandilands"

It threw an error instead. This function allows us to perform transformations without fear of inadvertently setting fields to NULL due to incorrect casting.

Getting the Structure

DuckDB provides a json_structure() function that enables us to get the structure of a JSON document. We can use this function to work out which structure to pass to json_transform(). For example, we can do this:

SELECT
    json_structure(data)
FROM users;

Output:

+----------------------------------------------------------+
| json_structure("data") |
+----------------------------------------------------------+
| {"name":"VARCHAR","age":"UBIGINT","is_active":"BOOLEAN"} |
+----------------------------------------------------------+

We can use this as our second argument to json_transform() if we want.

You may assume that we can pass json_structure() directly to json_transform(), but that causes an error:

SELECT
    json_transform(data, json_structure(data)) AS user_data
FROM users;

Output:

Binder Error:
JSON structure must be a constant!