DuckDB has a json_extract() function that extracts JSON data from a JSON document. It enables us to get JSON values from within the JSON document, rather than returning the whole document itself. This article takes a quick look at the function along with some examples of usage.
Syntax
The basic syntax goes like this:
json_extract(json, path)
Where json is the JSON document, and path is the path to the data you want to extract.
There’s also a json_extract_path() function, which is an alias for json_extract(), so we can use either one to achieve the same outcome:
json_extract(json, path)
OR
json_extract_path(json, path)
Quick Example
Here’s a simple example to show how this function works:
SELECT json_extract('{"name": "Kara", "age": 20}', '$.name');
Result:
"Kara"
In this case I returned the value of the name field. To do this I specified a path of $.name. It’s also possible to simply use name as the path:
SELECT json_extract('{"name": "Kara", "age": 20}', 'name');
Result:
"Kara"
Same result.
Note that json_extract() returns JSON, so the string that we extracted is surrounded in quotes (that’s how strings are represented in JSON).
To get the age, we can simply change the path accordingly:
SELECT json_extract('{"name": "Kara", "age": 20}', '$.age');
Result:
20
Numbers aren’t surrounded by double quotes in JSON.
Nested JSON
Most of the time we’ll be using json_extract() on multi-level JSON documents. Let’s extract all top level data from a multi-level JSON document:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
JSON_EXTRACT(getvariable('json_doc'), '$.name') AS name,
JSON_EXTRACT(getvariable('json_doc'), '$.age') AS age,
JSON_EXTRACT(getvariable('json_doc'), '$.contact') AS contact,
JSON_EXTRACT(getvariable('json_doc'), '$.preferences') AS preferences,
JSON_EXTRACT(getvariable('json_doc'), '$.tags') AS tags;
Result:
+----------------+-----+-------------------------------------------------+---------------------------------------+-------------------------+
| name | age | contact | preferences | tags |
+----------------+-----+-------------------------------------------------+---------------------------------------+-------------------------+
| "Flex Jackson" | 32 | {"email":"[email protected]","phone":"555-1234"} | {"theme":"dark","notifications":true} | ["developer","premium"] |
+----------------+-----+-------------------------------------------------+---------------------------------------+-------------------------+
This example contains nested JSON (some of the fields contain their own JSON).
We can use JSONpath to extract data from the nested JSON:
For example, we can extract the contact details:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
JSON_EXTRACT(getvariable('json_doc'), '$.contact.email') AS email,
JSON_EXTRACT(getvariable('json_doc'), '$.contact.phone') AS phone;
Result:
+--------------------+------------+
| email | phone |
+--------------------+------------+
| "[email protected]" | "555-1234" |
+--------------------+------------+
Or we could dig into the tags:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
JSON_EXTRACT(getvariable('json_doc'), '$.tags[0]') AS "$.tags[0]",
JSON_EXTRACT(getvariable('json_doc'), '$.tags[1]') AS "$.tags[1]";
Result:
+-------------+-----------+
| $.tags[0] | $.tags[1] |
+-------------+-----------+
| "developer" | "premium" |
+-------------+-----------+
JSON arrays are zero-based, and so that’s why we started at zero (i.e. $.tags[0]).
Extracting a String
The json_extract() function extracts JSON. If we want our value to be extracted as a string, we can use json_extract_string() instead:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
JSON_EXTRACT_STRING(getvariable('json_doc'), '$.preferences.theme') AS theme,
JSON_EXTRACT_STRING(getvariable('json_doc'), '$.preferences.notifications') AS notifications;
Result:
+-------+---------------+
| theme | notifications |
+-------+---------------+
| dark | true |
+-------+---------------+
Notice that dark is no longer surrounded by double quotes.
Database Example
Suppose we create and populate the following table:
-- Create the user_data table
CREATE TABLE user_data (
id INTEGER PRIMARY KEY,
username VARCHAR(50) NOT NULL,
profile JSON,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Insert sample data with varying JSON structures
INSERT INTO user_data (id, username, profile) VALUES
(1, 'flex_jackson', '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}'),
(2, 'kara_flatch', '{
"name": "Kara Flatch",
"age": 28,
"contact": {
"email": "[email protected]"
},
"preferences": {
"theme": "light"
},
"tags": ["designer", "premium", "beta-tester"]
}'),
(3, 'julio_cash', '{
"name": "Julio Cash",
"age": 41,
"preferences": {
"notifications": false,
"language": "Spanish"
},
"tags": []
}'),
(4, 'emma_wilson', '{
"name": "Emma Wilson",
"contact": {
"phone": "555-5678",
"address": {
"city": "New York",
"zip": "10001"
}
}
}'),
(5, 'homer_griffin', '{
"name": "Homer Griffin",
"inactive": true,
"tags": ["guest"]
}');
The database contains five users, along with some data, such as name, phone, email address, etc. Not all users have the same fields in their JSON.
Let’s return each users’ name, along with their preferences:
SELECT
id,
JSON_EXTRACT_STRING(profile, '$.name') AS name,
JSON_EXTRACT(profile, '$.preferences') AS preferences
FROM user_data;
Result:
+----+---------------+----------------------------------------------+
| id | name | preferences |
+----+---------------+----------------------------------------------+
| 1 | Flex Jackson | {"theme":"dark","notifications":true} |
| 2 | Kara Flatch | {"theme":"light"} |
| 3 | Julio Cash | {"notifications":false,"language":"Spanish"} |
| 4 | Emma Wilson | NULL |
| 5 | Homer Griffin | NULL |
+----+---------------+----------------------------------------------+
Here, I used json_extract_string() to extract the name, and json_extract() to extract the preferences.
The above query returned all users, even those without any preferences. Users without the preferences field got a NULL value instead. Let’s narrow it down to just those with preferences:
SELECT
id,
JSON_EXTRACT_STRING(profile, '$.name') AS name,
JSON_EXTRACT(profile, '$.preferences') AS preferences
FROM user_data
WHERE JSON_EXISTS(profile, '$.preferences');
Result:
+----+--------------+----------------------------------------------+
| id | name | preferences |
+----+--------------+----------------------------------------------+
| 1 | Flex Jackson | {"theme":"dark","notifications":true} |
| 2 | Kara Flatch | {"theme":"light"} |
| 3 | Julio Cash | {"notifications":false,"language":"Spanish"} |
+----+--------------+----------------------------------------------+
In this case we used the json_exists() function to help us narrow it down. This function returns true if the path is in the JSON document, and false if it’s not.
Malformed JSON
The JSON document that we pass to json_extract() must be well formed. If it’s not, or if it’s not even JSON, we’ll get an error:
SELECT JSON_EXTRACT('Jasper', '$.name');
Result:
Invalid Input Error:
Malformed JSON at byte 0 of input: unexpected character. Input: Jasper
Using the json_extract_path() Function
As mentioned, DuckDB also has a json_extract_path() function, which is an alias for json_extract(). This means we can use them interchangeably:
SET VARIABLE json_doc = '{
"name": "Flex Jackson",
"age": 32,
"contact": {
"email": "[email protected]",
"phone": "555-1234"
},
"preferences": {
"theme": "dark",
"notifications": true
},
"tags": ["developer", "premium"]
}';
SELECT
JSON_EXTRACT(getvariable('json_doc'), '$.tags') AS "JSON_EXTRACT",
JSON_EXTRACT_PATH(getvariable('json_doc'), '$.tags') AS "JSON_EXTRACT_PATH";
Result:
+-------------------------+-------------------------+
| JSON_EXTRACT | JSON_EXTRACT_PATH |
+-------------------------+-------------------------+
| ["developer","premium"] | ["developer","premium"] |
+-------------------------+-------------------------+