DuckDB has a json_extract_string()
function that works similar to json_extract()
, except that it returns its result as a string (varchar
). The json_extract()
function, on the other hand, returns its result as JSON.
The purpose of these two functions is to extract data from a JSON document. We’ll focus on the json_extract_string()
function in this article.
Syntax
The basic syntax goes like this:
json_extract_string(json, path)
Where json
is the JSON document, and path
is the path to the data you want to extract.
As it turns out, json_extract_string()
has an alias called json_extract_path_text()
. This means we can use either one to achieve the same outcome. So we can do either of the following:
json_extract_string(json, path)
OR
json_extract_path_text(json, path)
Quick Example
Here’s a simple example to show how this function works:
.mode duckbox
SELECT json_extract_string('{"name": "Bruno", "age": 36}', '$.name') AS extracted_value;
Result:
+-----------------+
| extracted_value |
| varchar |
+-----------------+
| Bruno |
+-----------------+
This query returned the name
field from the JSON document. It returned it as a varchar
. The first line in this example sets my DuckDB CLI output mode to duckbox
, so that the data type is displayed in the column header.
It’s also possible to specify the path of name
instead of $.name
:
SELECT json_extract_string('{"name": "Bruno", "age": 36}', 'name');
Result:
Bruno
Same result.
Numbers
It’s important to note that json_extract_string()
returns a string, even when the value being extracted is a number. Example:
To get the age, we can simply change the path accordingly:
SELECT json_extract_string('{"name": "Kara", "age": 20}', '$.age') AS extracted_value;
Result:
+-----------------+
| extracted_value |
| varchar |
+-----------------+
| 20 |
+-----------------+
Using typeof()
to Check the Return Type
We can use the typeof()
function to check the return type without having to switch to duckbox
mode:
.mode table
SELECT
typeof(json_extract_string('{"name": "Bruno", "age": 36}', '$.name')) AS name,
typeof(json_extract_string('{"name": "Bruno", "age": 36}', '$.age')) AS age;
Result:
+---------+---------+
| name | age |
+---------+---------+
| VARCHAR | VARCHAR |
+---------+---------+
Nested JSON
We’ll often be working with multi-level JSON documents when using json_extract_string()
. Let’s extract all top level data from a multi-level JSON document:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_extract_string(getvariable('json_doc'), '$.name') AS name,
json_extract_string(getvariable('json_doc'), '$.age') AS age,
json_extract_string(getvariable('json_doc'), '$.contact') AS contact,
json_extract_string(getvariable('json_doc'), '$.preferences') AS preferences,
json_extract_string(getvariable('json_doc'), '$.tags') AS tags;
Result:
+--------------+-----+-------------------------------------------------+---------------------------------------+-------------------------+
| name | age | contact | preferences | tags |
+--------------+-----+-------------------------------------------------+---------------------------------------+-------------------------+
| Flex Jackson | 32 | {"email":"[email protected]","phone":"555-1234"} | {"theme":"dark","notifications":true} | ["developer","premium"] |
+--------------+-----+-------------------------------------------------+---------------------------------------+-------------------------+
We can dig deeper into those values if needed. For example, we can extract the contact details into their own separate fields:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_extract_string(getvariable('json_doc'), '$.contact.email') AS email,
json_extract_string(getvariable('json_doc'), '$.contact.phone') AS phone;
Result:
+------------------+----------+
| email | phone |
+------------------+----------+
| [email protected] | 555-1234 |
+------------------+----------+
And we can get the preferences:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_extract_string(getvariable('json_doc'), '$.preferences.theme') AS theme,
json_extract_string(getvariable('json_doc'), '$.preferences.notifications') AS notifications;
Result:
+-------+---------------+
| theme | notifications |
+-------+---------------+
| dark | true |
+-------+---------------+
We can get the tags too:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_extract_string(getvariable('json_doc'), '$.tags[0]') AS "$.tags[0]",
json_extract_string(getvariable('json_doc'), '$.tags[1]') AS "$.tags[1]";
Result:
+-----------+-----------+
| $.tags[0] | $.tags[1] |
+-----------+-----------+
| developer | premium |
+-----------+-----------+
JSON arrays are zero-based, and so that’s why we started at zero (i.e. $.tags[0]
).
Extracting JSON
The json_extract_string()
extracts a string. There’s also a json_extract()
function, which extracts the value as JSON. So we can use that function if we want our value to be extracted as JSON:
.mode duckbox
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_extract(getvariable('json_doc'), '$.preferences.theme') AS theme,
json_extract(getvariable('json_doc'), '$.preferences.notifications') AS notifications;
Result:
+--------+---------------+
| theme | notifications |
| json | json |
+--------+---------------+
| "dark" | true |
+--------+---------------+
Both columns are returned as JSON. Notice that dark
is now surrounded by double quotes. That’s how strings are represented in JSON documents.
Malformed JSON
The JSON document that we pass to json_extract_string()
must be well formed. If it’s not, or if it’s not even JSON, we’ll get an error:
SELECT json_extract_string('Homer', '$.name');
Result:
Invalid Input Error:
Malformed JSON at byte 0 of input: unexpected character. Input: Homer
Using the json_extract_path_text()
Function
As mentioned, DuckDB also has a json_extract_path_text()
function, which is an alias for json_extract_string()
. That means we can use them interchangeably:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_extract_string(getvariable('json_doc'), '$.tags[0]') AS json_extract_string,
json_extract_path_text(getvariable('json_doc'), '$.tags[0]') AS json_extract_path_text;
Result:
+---------------------+------------------------+
| json_extract_string | json_extract_path_text |
+---------------------+------------------------+
| developer | developer |
+---------------------+------------------------+