A Quick Look at DuckDB’s JSON_VALUE() Function

DuckDB provides a handful of functions for getting data from JSON documents. We can use them as long as the JSON extension is installed and loaded (which it is in most distributions). One such function for getting data from a JSON document is json_value(). This function extracts scalar data from the specified path in the JSON document. If the value isn’t scalar, then a NULL value is returned.

Let’s take a quick look at the json_value() function in DuckDB.

Syntax

The basic syntax goes like this:

json_value(json, path)

Where json is the JSON document, and path is the path to the data we want to extract.

Quick Example

Here’s a simple example:

SELECT json_value('{"name": "Bruno", "age": 36}', '$.name');

Result:

"Bruno"

In this case I returned the value of the name field. To do this I specified a path of $.name. It’s also possible to simply use name as the path:

SELECT json_value('{"name": "Bruno", "age": 36}', 'name');

Result:

"Bruno"

Same result.

The json_value() function returns JSON, so when we extract strings, they’re surrounded in quotes. This is how strings are represented in JSON.

Let’s get the age:

SELECT json_value('{"name": "Bruno", "age": 36}', '$.age');

Result:

36

Numbers aren’t surrounded by double quotes in JSON.

Non-Existent Values

If we specify a path that doesn’t exist in the JSON document, NULL is returned:

SELECT json_value('{"name": "Bruno", "age": 36}', '$.height');

Result:

NULL

Only Scalar Values are Extracted

The json_value() function only extracts scalar values. If the value isn’t scalar, then NULL is returned.

We can test this by attempting to extract all top level data from a multi-level JSON document:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_value(getvariable('json_doc'), '$.name') AS name,
    json_value(getvariable('json_doc'), '$.age') AS age,
    json_value(getvariable('json_doc'), '$.contact') AS contact,
    json_value(getvariable('json_doc'), '$.preferences') AS preferences,
    json_value(getvariable('json_doc'), '$.tags') AS tags;

Result:

+----------------+-----+---------+-------------+------+
| name | age | contact | preferences | tags |
+----------------+-----+---------+-------------+------+
| "Flex Jackson" | 32 | NULL | NULL | NULL |
+----------------+-----+---------+-------------+------+

The above JSON document contains nested JSON. This means that some of its fields contain their own JSON. These fields aren’t scalar, and so json_value() returns NULL for those fields.

To extract nested JSON, we can use json_extract() or even json_extract_string().

However, we can use json_value() to extract any scalar data from within the nested JSON. For example:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_value(getvariable('json_doc'), '$.contact.email') AS email,
    json_value(getvariable('json_doc'), '$.contact.phone') AS phone;

Result:

+--------------------+------------+
| email | phone |
+--------------------+------------+
| "[email protected]" | "555-1234" |
+--------------------+------------+

Let’s get the scalar values from the tags:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_value(getvariable('json_doc'), '$.tags[0]') AS "$.tags[0]",
    json_value(getvariable('json_doc'), '$.tags[1]') AS "$.tags[1]";

Result:

+-------------+-----------+
| $.tags[0] | $.tags[1] |
+-------------+-----------+
| "developer" | "premium" |
+-------------+-----------+

JSON arrays are zero-based, and so that’s why we started our paths at zero (i.e. $.tags[0]).

Extracting Nested JSON

As we saw earlier, json_value() only extracts scalar values. If we want to extract nested JSON, we can use the json_extract() function:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract(getvariable('json_doc'), '$.preferences') AS preferences,
    json_extract(getvariable('json_doc'), '$.tags') AS tags;

Result:

+---------------------------------------+-------------------------+
| preferences | tags |
+---------------------------------------+-------------------------+
| {"theme":"dark","notifications":true} | ["developer","premium"] |
+---------------------------------------+-------------------------+

This time we got the nested JSON instead of NULL like we did when using json_value().

The json_extract() function returns its result as JSON. If we want it returned as a string, then we can use json_extract_string() (which returns its result as VARCHAR).

Database Example

Suppose we create and populate the following table:

-- Create the user_data table
CREATE TABLE user_data (
    id INTEGER PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    profile JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert sample data with varying JSON structures
INSERT INTO user_data (id, username, profile) VALUES
(1, 'flex_jackson', '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "preferences": {
        "theme": "dark",
        "notifications": true
    },
    "tags": ["developer", "premium"]
}'),
(2, 'kara_flatch', '{
    "name": "Kara Flatch",
    "age": 28,
    "contact": {
        "email": "[email protected]"
    },
    "preferences": {
        "theme": "light"
    },
    "tags": ["designer", "premium", "beta-tester"]
}'),
(3, 'julio_cash', '{
    "name": "Julio Cash",
    "age": 41,
    "preferences": {
        "notifications": false,
        "language": "Spanish"
    },
    "tags": []
}'),
(4, 'emma_wilson', '{
    "name": "Emma Wilson",
    "contact": {
        "phone": "555-5678",
        "address": {
            "city": "New York",
            "zip": "10001"
        }
    }
}'),
(5, 'homer_griffin', '{
    "name": "Homer Griffin",
    "inactive": true,
    "tags": ["guest"]
}');

The database contains five users, along with some data, such as name, phone, email address, etc. Not all users have the same fields in their JSON.

Let’s return each user, along with their various preferences:

SELECT 
    id,
    json_value(profile, '$.name') AS name,
    json_value(profile, '$.preferences.theme') AS theme,
    json_value(profile, '$.preferences.notifications') AS notifications,
    json_value(profile, '$.preferences.language') AS language
FROM user_data;

Result:

+----+-----------------+---------+---------------+-----------+
| id | name | theme | notifications | language |
+----+-----------------+---------+---------------+-----------+
| 1 | "Flex Jackson" | "dark" | true | NULL |
| 2 | "Kara Flatch" | "light" | NULL | NULL |
| 3 | "Julio Cash" | NULL | false | "Spanish" |
| 4 | "Emma Wilson" | NULL | NULL | NULL |
| 5 | "Homer Griffin" | NULL | NULL | NULL |
+----+-----------------+---------+---------------+-----------+

We can see that NULL is returned whenever the path doesn’t exist for a given user. Otherwise, the specified value is returned as expected.

We can use the json_exists() function to filter the query results to just those with a specified field. For example, let’s return only those users who have a theme specified:

SELECT 
    id,
    json_value(profile, '$.name') AS name,
    json_value(profile, '$.preferences.theme') AS theme
FROM user_data
WHERE json_exists(profile, '$.preferences.theme');

Result:

+----+----------------+---------+
| id | name | theme |
+----+----------------+---------+
| 1 | "Flex Jackson" | "dark" |
| 2 | "Kara Flatch" | "light" |
+----+----------------+---------+

The json_exists() function returns true if the path is in the JSON document, and false if it’s not. This caused the query to return only those rows that contain the specified path.

Malformed JSON

The JSON document that we pass to json_value() must be well formed. If it’s not, or if it’s not even JSON, we’ll get an error:

SELECT json_value('Blake', '$.name');

Result:

Invalid Input Error:
Malformed JSON at byte 0 of input: unexpected character. Input: Blake