The json_transform()
function in DuckDB is a handy tool for converting JSON strings into structured data types like STRUCT
, MAP
, and LIST
. This allows you to directly query and manipulate nested JSON data using standard SQL, making it much easier to work with complex JSON objects and arrays.
Think of it as a way to cast your JSON data into a more usable, typed format within your database.
Syntax
The basic syntax is as follows:
json_transform(json, structure)
Where json
is the JSON document and structure
is the structure you want it transformed to.
A more detailed depiction might look something like this:
json_transform(json_string, '{"key1": "type1", "key2": "type2", ...}')
json_string
: This is the JSON data you want to transform. It can be a column containing JSON strings or a literal JSON string.'{"key1": "type1", ...}'
: This is a string that defines the structure of the output. The keys in this structure definition must match the keys in your JSON data, and the values specify the desired SQL data type for each key.
You can also use its alias, from_json()
, which works identically.
Examples
Let’s look at a few examples to see how it works in practice.
Example 1: Transforming a Simple JSON Object
Imagine we have a table called users
with a data
column containing JSON objects like this:
{"name": "Jake Sandilands", "age": 30, "is_active": true}
We can use json_transform()
to convert this JSON into a STRUCT
with typed columns:
CREATE TABLE users (data JSON);
INSERT INTO users VALUES ('{"name": "Jake Sandilands", "age": 30, "is_active": true}');
SELECT
json_transform(data, '{"name": "VARCHAR", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
FROM users;
Result:
+---------------------------------------------------------+
| user_data |
+---------------------------------------------------------+
| {'name': Jake Sandilands, 'age': 30, 'is_active': true} |
+---------------------------------------------------------+
The output is a STRUCT
where we can access each field by its key.
Let’s use the typeof()
function to confirm the return data type:
SELECT
typeof(json_transform(data, '{"name": "VARCHAR", "age": "INTEGER", "is_active": "BOOLEAN"}')) AS return_type
FROM users;
Result:
+--------------------------------------------------------+
| return_type |
+--------------------------------------------------------+
| STRUCT("name" VARCHAR, age INTEGER, is_active BOOLEAN) |
+--------------------------------------------------------+
It’s a STRUCT
as expected.
So we can then query the individual fields of the STRUCT
as if they were regular columns:
SELECT
user_data.name,
user_data.age
FROM (
SELECT
json_transform(data, '{"name": "VARCHAR", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
FROM users
);
Result:
+-----------------+-----+
| name | age |
+-----------------+-----+
| Jake Sandilands | 30 |
+-----------------+-----+
Example 2: Working with Nested JSON and Arrays
json_transform()
can also handle more complex structures, including nested objects and arrays.
Consider this JSON object with a nested address
object and an array of hobbies
:
{
"name": "Katie Melon",
"address": {
"city": "New York",
"zip": "10001"
},
"hobbies": ["reading", "hiking", "coding"]
}
We can transform this using the following query:
SET VARIABLE json_doc = '{
"name": "Katie Melon",
"address": {
"city": "New York",
"zip": "10001"
},
"hobbies": ["reading", "hiking", "coding"]
}';
SELECT
json_transform(
getvariable('json_doc'),
'{"name": "VARCHAR", "address": "STRUCT(city VARCHAR, zip VARCHAR)", "hobbies": "VARCHAR[]"}'
) AS user_profile;
Result:
+----------------------------------------------------------------------------------------------------------+
| user_profile |
+----------------------------------------------------------------------------------------------------------+
| {'name': Katie Melon, 'address': {'city': New York, 'zip': 10001}, 'hobbies': [reading, hiking, coding]} |
+----------------------------------------------------------------------------------------------------------+
This produced a STRUCT
with a nested STRUCT
for the address and a LIST
for the hobbies. We can use typeof()
to verify:
SET VARIABLE json_doc = '{
"name": "Katie Melon",
"address": {
"city": "New York",
"zip": "10001"
},
"hobbies": ["reading", "hiking", "coding"]
}';
SELECT
typeof(
json_transform(
getvariable('json_doc'),
'{"name": "VARCHAR", "address": "STRUCT(city VARCHAR, zip VARCHAR)", "hobbies": "VARCHAR[]"}'
)
) AS return_type;
Output:
+--------------------------------------------------------------------------------------+
| return_type |
+--------------------------------------------------------------------------------------+
| STRUCT("name" VARCHAR, address STRUCT(city VARCHAR, zip VARCHAR), hobbies VARCHAR[]) |
+--------------------------------------------------------------------------------------+
By transforming our JSON data, we unlock the ability to use DuckDB’s set of functions for STRUCT
s and LIST
s, potentially making our JSON data easier to query.
An Alias: The from_json()
Function
As mentioned, from_json()
is an alias for json_transform()
, so we can use either one to get the same result. Here’s an example:
SELECT
user_data.name,
user_data.age
FROM (
SELECT
from_json(data, '{"name": "VARCHAR", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
FROM users
);
Result:
+-----------------+-----+
| name | age |
+-----------------+-----+
| Jake Sandilands | 30 |
+-----------------+-----+
This is the same example from earlier, except that I replaced json_transform()
with from_json()
. It returned the same result, as expected.
More Options: json_transform_strict()
and from_json_strict()
DuckDB also has a json_transform_strict()
function and its alias from_json_strict()
. These functions do the same thing as json_transform()
, except that they throw an error when type casting fails.
To demonstrate this, here’s what json_transform() does when type casting fails:
SELECT
json_transform(data, '{"name": "INTEGER", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
FROM users;
Output:
+----------------------------------------------+
| user_data |
+----------------------------------------------+
| {'name': NULL, 'age': 30, 'is_active': true} |
+----------------------------------------------+
Here, I specified INTEGER
for the name field. This is a VARCHAR
field and so it can’t be converted to INTEGER
. Therefore, json_transform()
set the value to NULL
.
And here’s what json_transform_strict()
does:
SELECT
json_transform_strict(data, '{"name": "INTEGER", "age": "INTEGER", "is_active": "BOOLEAN"}') AS user_data
FROM users;
Output:
Invalid Input Error:
Failed to cast value to numerical: "Jake Sandilands"
It threw an error instead. This function allows us to perform transformations without fear of inadvertently setting fields to NULL
due to incorrect casting.
Getting the Structure
DuckDB provides a json_structure()
function that enables us to get the structure of a JSON document. We can use this function to work out which structure to pass to json_transform()
. For example, we can do this:
SELECT
json_structure(data)
FROM users;
Output:
+----------------------------------------------------------+
| json_structure("data") |
+----------------------------------------------------------+
| {"name":"VARCHAR","age":"UBIGINT","is_active":"BOOLEAN"} |
+----------------------------------------------------------+
We can use this as our second argument to json_transform()
if we want.
You may assume that we can pass json_structure()
directly to json_transform()
, but that causes an error:
SELECT
json_transform(data, json_structure(data)) AS user_data
FROM users;
Output:
Binder Error:
JSON structure must be a constant!