Understanding JSON_EXTRACT_STRING() in DuckDB

DuckDB has a json_extract_string() function that works similar to json_extract(), except that it returns its result as a string (varchar). The json_extract() function, on the other hand, returns its result as JSON.

The purpose of these two functions is to extract data from a JSON document. We’ll focus on the json_extract_string() function in this article.

Syntax

The basic syntax goes like this:

json_extract_string(json, path)

Where json is the JSON document, and path is the path to the data you want to extract.

As it turns out, json_extract_string() has an alias called json_extract_path_text(). This means we can use either one to achieve the same outcome. So we can do either of the following:

json_extract_string(json, path)
OR
json_extract_path_text(json, path)

Quick Example

Here’s a simple example to show how this function works:

.mode duckbox
SELECT json_extract_string('{"name": "Bruno", "age": 36}', '$.name') AS extracted_value;

Result:

+-----------------+
| extracted_value |
| varchar |
+-----------------+
| Bruno |
+-----------------+

This query returned the name field from the JSON document. It returned it as a varchar. The first line in this example sets my DuckDB CLI output mode to duckbox, so that the data type is displayed in the column header.

It’s also possible to specify the path of name instead of $.name:

SELECT json_extract_string('{"name": "Bruno", "age": 36}', 'name');

Result:

Bruno

Same result.

Numbers

It’s important to note that json_extract_string() returns a string, even when the value being extracted is a number. Example:

To get the age, we can simply change the path accordingly:

SELECT json_extract_string('{"name": "Kara", "age": 20}', '$.age') AS extracted_value;

Result:

+-----------------+
| extracted_value |
| varchar |
+-----------------+
| 20 |
+-----------------+

Using typeof() to Check the Return Type

We can use the typeof() function to check the return type without having to switch to duckbox mode:

.mode table
SELECT 
    typeof(json_extract_string('{"name": "Bruno", "age": 36}', '$.name')) AS name,
    typeof(json_extract_string('{"name": "Bruno", "age": 36}', '$.age')) AS age;

Result:

+---------+---------+
| name | age |
+---------+---------+
| VARCHAR | VARCHAR |
+---------+---------+

Nested JSON

We’ll often be working with multi-level JSON documents when using json_extract_string(). Let’s extract all top level data from a multi-level JSON document:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract_string(getvariable('json_doc'), '$.name') AS name,
    json_extract_string(getvariable('json_doc'), '$.age') AS age,
    json_extract_string(getvariable('json_doc'), '$.contact') AS contact,
    json_extract_string(getvariable('json_doc'), '$.preferences') AS preferences,
    json_extract_string(getvariable('json_doc'), '$.tags') AS tags;

Result:

+--------------+-----+-------------------------------------------------+---------------------------------------+-------------------------+
| name | age | contact | preferences | tags |
+--------------+-----+-------------------------------------------------+---------------------------------------+-------------------------+
| Flex Jackson | 32 | {"email":"[email protected]","phone":"555-1234"} | {"theme":"dark","notifications":true} | ["developer","premium"] |
+--------------+-----+-------------------------------------------------+---------------------------------------+-------------------------+

We can dig deeper into those values if needed. For example, we can extract the contact details into their own separate fields:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract_string(getvariable('json_doc'), '$.contact.email') AS email,
    json_extract_string(getvariable('json_doc'), '$.contact.phone') AS phone;

Result:

+------------------+----------+
| email | phone |
+------------------+----------+
| [email protected] | 555-1234 |
+------------------+----------+

And we can get the preferences:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract_string(getvariable('json_doc'), '$.preferences.theme') AS theme,
    json_extract_string(getvariable('json_doc'), '$.preferences.notifications') AS notifications;

Result:

+-------+---------------+
| theme | notifications |
+-------+---------------+
| dark | true |
+-------+---------------+

We can get the tags too:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract_string(getvariable('json_doc'), '$.tags[0]') AS "$.tags[0]",
    json_extract_string(getvariable('json_doc'), '$.tags[1]') AS "$.tags[1]";

Result:

+-----------+-----------+
| $.tags[0] | $.tags[1] |
+-----------+-----------+
| developer | premium |
+-----------+-----------+

JSON arrays are zero-based, and so that’s why we started at zero (i.e. $.tags[0]).

Extracting JSON

The json_extract_string() extracts a string. There’s also a json_extract() function, which extracts the value as JSON. So we can use that function if we want our value to be extracted as JSON:

.mode duckbox
SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract(getvariable('json_doc'), '$.preferences.theme') AS theme,
    json_extract(getvariable('json_doc'), '$.preferences.notifications') AS notifications;

Result:

+--------+---------------+
| theme | notifications |
| json | json |
+--------+---------------+
| "dark" | true |
+--------+---------------+

Both columns are returned as JSON. Notice that dark is now surrounded by double quotes. That’s how strings are represented in JSON documents.

Malformed JSON

The JSON document that we pass to json_extract_string() must be well formed. If it’s not, or if it’s not even JSON, we’ll get an error:

SELECT json_extract_string('Homer', '$.name');

Result:

Invalid Input Error:
Malformed JSON at byte 0 of input: unexpected character. Input: Homer

Using the json_extract_path_text() Function

As mentioned, DuckDB also has a json_extract_path_text() function, which is an alias for json_extract_string(). That means we can use them interchangeably:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract_string(getvariable('json_doc'), '$.tags[0]') AS json_extract_string,
    json_extract_path_text(getvariable('json_doc'), '$.tags[0]') AS json_extract_path_text;

Result:

+---------------------+------------------------+
| json_extract_string | json_extract_path_text |
+---------------------+------------------------+
| developer | developer |
+---------------------+------------------------+