DuckDB provides a handful of functions for getting data from JSON documents. We can use them as long as the JSON extension is installed and loaded (which it is in most distributions). One such function for getting data from a JSON document is json_value()
. This function extracts scalar data from the specified path in the JSON document. If the value isn’t scalar, then a NULL value is returned.
Let’s take a quick look at the json_value()
function in DuckDB.
Syntax
The basic syntax goes like this:
json_value(json, path)
Where json
is the JSON document, and path
is the path to the data we want to extract.
Quick Example
Here’s a simple example:
SELECT json_value('{"name": "Bruno", "age": 36}', '$.name');
Result:
"Bruno"
In this case I returned the value of the name
field. To do this I specified a path of $.name
. It’s also possible to simply use name
as the path:
SELECT json_value('{"name": "Bruno", "age": 36}', 'name');
Result:
"Bruno"
Same result.
The json_value()
function returns JSON, so when we extract strings, they’re surrounded in quotes. This is how strings are represented in JSON.
Let’s get the age:
SELECT json_value('{"name": "Bruno", "age": 36}', '$.age');
Result:
36
Numbers aren’t surrounded by double quotes in JSON.
Non-Existent Values
If we specify a path that doesn’t exist in the JSON document, NULL
is returned:
SELECT json_value('{"name": "Bruno", "age": 36}', '$.height');
Result:
NULL
Only Scalar Values are Extracted
The json_value()
function only extracts scalar values. If the value isn’t scalar, then NULL
is returned.
We can test this by attempting to extract all top level data from a multi-level JSON document:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_value(getvariable('json_doc'), '$.name') AS name,
json_value(getvariable('json_doc'), '$.age') AS age,
json_value(getvariable('json_doc'), '$.contact') AS contact,
json_value(getvariable('json_doc'), '$.preferences') AS preferences,
json_value(getvariable('json_doc'), '$.tags') AS tags;
Result:
+----------------+-----+---------+-------------+------+
| name | age | contact | preferences | tags |
+----------------+-----+---------+-------------+------+
| "Flex Jackson" | 32 | NULL | NULL | NULL |
+----------------+-----+---------+-------------+------+
The above JSON document contains nested JSON. This means that some of its fields contain their own JSON. These fields aren’t scalar, and so json_value()
returns NULL
for those fields.
To extract nested JSON, we can use json_extract()
or even json_extract_string()
.
However, we can use json_value()
to extract any scalar data from within the nested JSON. For example:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_value(getvariable('json_doc'), '$.contact.email') AS email,
json_value(getvariable('json_doc'), '$.contact.phone') AS phone;
Result:
+--------------------+------------+
| email | phone |
+--------------------+------------+
| "[email protected]" | "555-1234" |
+--------------------+------------+
Let’s get the scalar values from the tags:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_value(getvariable('json_doc'), '$.tags[0]') AS "$.tags[0]",
json_value(getvariable('json_doc'), '$.tags[1]') AS "$.tags[1]";
Result:
+-------------+-----------+
| $.tags[0] | $.tags[1] |
+-------------+-----------+
| "developer" | "premium" |
+-------------+-----------+
JSON arrays are zero-based, and so that’s why we started our paths at zero (i.e. $.tags[0]
).
Extracting Nested JSON
As we saw earlier, json_value()
only extracts scalar values. If we want to extract nested JSON, we can use the json_extract()
function:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
json_extract(getvariable('json_doc'), '$.preferences') AS preferences,
json_extract(getvariable('json_doc'), '$.tags') AS tags;
Result:
+---------------------------------------+-------------------------+
| preferences | tags |
+---------------------------------------+-------------------------+
| {"theme":"dark","notifications":true} | ["developer","premium"] |
+---------------------------------------+-------------------------+
This time we got the nested JSON instead of NULL
like we did when using json_value()
.
The json_extract()
function returns its result as JSON. If we want it returned as a string, then we can use json_extract_string()
(which returns its result as VARCHAR
).
Database Example
Suppose we create and populate the following table:
-- Create the user_data table
CREATE TABLE user_data (
id INTEGER PRIMARY KEY,
username VARCHAR(50) NOT NULL,
profile JSON,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Insert sample data with varying JSON structures
INSERT INTO user_data (id, username, profile) VALUES
(1, 'flex_jackson', '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}'),
(2, 'kara_flatch', '{
"name": "Kara Flatch",
"age": 28,
"contact": {
"email": "[email protected]"
},
"preferences": {
"theme": "light"
},
"tags": ["designer", "premium", "beta-tester"]
}'),
(3, 'julio_cash', '{
"name": "Julio Cash",
"age": 41,
"preferences": {
"notifications": false,
"language": "Spanish"
},
"tags": []
}'),
(4, 'emma_wilson', '{
"name": "Emma Wilson",
"contact": {
"phone": "555-5678",
"address": {
"city": "New York",
"zip": "10001"
}
}
}'),
(5, 'homer_griffin', '{
"name": "Homer Griffin",
"inactive": true,
"tags": ["guest"]
}');
The database contains five users, along with some data, such as name, phone, email address, etc. Not all users have the same fields in their JSON.
Let’s return each user, along with their various preferences:
SELECT
id,
json_value(profile, '$.name') AS name,
json_value(profile, '$.preferences.theme') AS theme,
json_value(profile, '$.preferences.notifications') AS notifications,
json_value(profile, '$.preferences.language') AS language
FROM user_data;
Result:
+----+-----------------+---------+---------------+-----------+
| id | name | theme | notifications | language |
+----+-----------------+---------+---------------+-----------+
| 1 | "Flex Jackson" | "dark" | true | NULL |
| 2 | "Kara Flatch" | "light" | NULL | NULL |
| 3 | "Julio Cash" | NULL | false | "Spanish" |
| 4 | "Emma Wilson" | NULL | NULL | NULL |
| 5 | "Homer Griffin" | NULL | NULL | NULL |
+----+-----------------+---------+---------------+-----------+
We can see that NULL
is returned whenever the path doesn’t exist for a given user. Otherwise, the specified value is returned as expected.
We can use the json_exists()
function to filter the query results to just those with a specified field. For example, let’s return only those users who have a theme specified:
SELECT
id,
json_value(profile, '$.name') AS name,
json_value(profile, '$.preferences.theme') AS theme
FROM user_data
WHERE json_exists(profile, '$.preferences.theme');
Result:
+----+----------------+---------+
| id | name | theme |
+----+----------------+---------+
| 1 | "Flex Jackson" | "dark" |
| 2 | "Kara Flatch" | "light" |
+----+----------------+---------+
The json_exists()
function returns true
if the path is in the JSON document, and false
if it’s not. This caused the query to return only those rows that contain the specified path.
Malformed JSON
The JSON document that we pass to json_value()
must be well formed. If it’s not, or if it’s not even JSON, we’ll get an error:
SELECT json_value('Blake', '$.name');
Result:
Invalid Input Error:
Malformed JSON at byte 0 of input: unexpected character. Input: Blake