Understanding DuckDB’s JSON_VALID() Function

Most of the major RDBMSs that support JSON also include a json_valid() function for checking the validity of a JSON document, and DuckDB’s no exception. Like the other RDBMSs, DuckDB’s implementation of json_valid() checks its argument and returns a boolean value to indicate whether or not it’s valid JSON.

Syntax

The syntax goes like this:

json_valid(json)

Where json is the value that we’re checking for JSON validity.

Quick Example

Let’s test a JSON document for its validity:

SELECT json_valid('{"name": "Bruno", "age": 36}');

Output:

true

In this case I passed a valid JSON document and so json_valid() returned true.

Invalid JSON

Now let’s pass invalid JSON:

SELECT json_valid('{"name": "Bruno", 36}');

Output:

false

We got false because the document is not valid JSON. In this case I removed "age:" from the document, and so the value 36 no longer has a field to belong to.

Passing a String

The json_valid() function also accepts varchar values (whether or not they resemble a JSON document). Obviously, if it’s not valid JSON, then it will return false:

SELECT json_valid('Bruno');

Output:

false

But it’s possible that we could provide a varchar value that does resemble a well-formed JSON document, and in that case we’ll get a pass. This could happen if we extract nested JSON from a JSON document using a function like json_extract_string(), which returns its result as varchar:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract_string(getvariable('json_doc'), '$.contact') AS contact,
    typeof(json_extract_string(getvariable('json_doc'), '$.contact')) AS typeof,
    json_valid(json_extract_string(getvariable('json_doc'), '$.contact')) AS is_valid;

Output:

+-------------------------------------------------+---------+----------+
| contact | typeof | is_valid |
+-------------------------------------------------+---------+----------+
| {"email":"[email protected]","phone":"555-1234"} | VARCHAR | true |
+-------------------------------------------------+---------+----------+

Here, I used the typeof() function to check the return type, and it confirmed that it’s varchar. The json_valid() function returned true anyway, as it accepts varchar when checking JSON validity.

We’ll get the same result if we check the tags:

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract_string(getvariable('json_doc'), '$.tags') AS tags,
    typeof(json_extract_string(getvariable('json_doc'), '$.tags')) AS typeof,
    json_valid(json_extract_string(getvariable('json_doc'), '$.tags')) AS is_valid;

Output:

+-------------------------+---------+----------+
| tags | typeof | is_valid |
+-------------------------+---------+----------+
| ["developer","premium"] | VARCHAR | true |
+-------------------------+---------+----------+

What if the Value Doesn’t Resemble JSON?

It’s quite possible to get a positive result even if the value doesn’t actually resemble a JSON document. If the value uses the JSON data type, then its considered valid JSON. But if it doesn’t use the JSON type, then it will not be considered valid JSON.

We can demonstrate this by comparing the results we get when using json_extract() verses json_extract_string(). Let’s extract the email address. First we’ll use json_extract_string(), and then we’ll use json_extract().

Here’s json_extract_string():

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract_string(getvariable('json_doc'), '$.contact.email') AS email,
    typeof(json_extract_string(getvariable('json_doc'), '$.contact.email')) AS typeof,
    json_valid(json_extract_string(getvariable('json_doc'), '$.contact.email')) AS is_valid;

Output:

+------------------+---------+----------+
| email | typeof | is_valid |
+------------------+---------+----------+
| [email protected] | VARCHAR | false |
+------------------+---------+----------+

As expected, it returned false.

Now here’s json_extract():

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract(getvariable('json_doc'), '$.contact.email') AS email,
    typeof(json_extract(getvariable('json_doc'), '$.contact.email')) AS typeof,
    json_valid(json_extract(getvariable('json_doc'), '$.contact.email')) AS is_valid;

Output:

+--------------------+--------+----------+
| email | typeof | is_valid |
+--------------------+--------+----------+
| "[email protected]" | JSON | true |
+--------------------+--------+----------+

This time we got true. That’s because json_extract() returns its value in the JSON data type. So even though it doesn’t resemble a JSON document, its type is JSON and therefore its valid JSON.

Passing a Number

Passing a number causes an issue:

SELECT json_valid(36);

Output:

No function matches the given name and argument types 'json_valid(INTEGER_LITERAL)'. You might need to add explicit type casts.
Candidate functions:
json_valid(VARCHAR) -> BOOLEAN
json_valid(JSON) -> BOOLEAN


LINE 1: SELECT json_valid(36);
^

However, context matters. If the number is stored as a varchar or JSON type, then we shouldn’t get the error. For example, we’ll get a positive result if the value is extracted from a JSON document using json_extract_string():

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract_string(getvariable('json_doc'), '$.age') AS age,
    typeof(json_extract_string(getvariable('json_doc'), '$.age')) AS typeof,
    json_valid(json_extract_string(getvariable('json_doc'), '$.age')) AS is_valid;

Output:

+-----+---------+----------+
| age | typeof | is_valid |
+-----+---------+----------+
| 32 | VARCHAR | true |
+-----+---------+----------+

And the same applies with json_extract():

SET VARIABLE json_doc = '{
    "name": "Flex Jackson",
    "age": 32,
    "contact": {
        "email": "[email protected]",
        "phone": "555-1234"
    },
    "tags": ["developer", "premium"]
}';
SELECT 
    json_extract(getvariable('json_doc'), '$.age') AS age,
    typeof(json_extract(getvariable('json_doc'), '$.age')) AS typeof,
    json_valid(json_extract(getvariable('json_doc'), '$.age')) AS is_valid;

Output:

+-----+--------+----------+
| age | typeof | is_valid |
+-----+--------+----------+
| 32 | JSON | true |
+-----+--------+----------+