Using JSON_TYPE() in DuckDB to Get the Data Type of a Given JSON Value

DuckDB’s json_type() function is a useful utility for inspecting JSON data structures. It helps us determine what type of data we’re working with at any level within a JSON document. This function can be handy when we need to validate data types or handle different JSON structures programmatically.

Syntax

The basic syntax goes like this:

json_type(json[, path])
  • json: The JSON data we want to inspect
  • path: (Optional) A path to a specific element within the JSON structure

Return Types

The function returns one of eight possible type identifiers:

  • ARRAY: JSON arrays like [1, 2, 3]
  • BIGINT: Large integers
  • BOOLEAN: True/false values
  • DOUBLE: Floating-point numbers
  • OBJECT: JSON objects with key-value pairs
  • UBIGINT: Unsigned large integers
  • VARCHAR: String values
  • NULL: Null values

If path is a LIST, the result will be LIST of types.

Basic Usage Examples

Let’s start with simple examples to understand how json_type() identifies different data types:

SELECT 
    json_type('"Water"') AS r1,
    json_type('42.5') AS r2,
    json_type('100') AS r3,
    json_type('true') AS r4,
    json_type('null') AS r5,
    json_type('[1, 2, 3, "hello"]') AS r6,
    json_type('{"name": "John", "age": 30}') AS r7;

Result:

+---------+--------+---------+---------+------+-------+--------+
| r1 | r2 | r3 | r4 | r5 | r6 | r7 |
+---------+--------+---------+---------+------+-------+--------+
| VARCHAR | DOUBLE | UBIGINT | BOOLEAN | NULL | ARRAY | OBJECT |
+---------+--------+---------+---------+------+-------+--------+

Removing the Single Quotes

The json_type() function accepts VARCHAR and JSON arguments. It’s possible to remove the single quotes from some of the arguments, but not all. For example, this works:

SELECT 
    json_type(true),
    json_type(null);

Output:

+---------------------------------+-----------------+
| json_type(CAST('t' AS BOOLEAN)) | json_type(NULL) |
+---------------------------------+-----------------+
| BOOLEAN | NULL |
+---------------------------------+-----------------+

But this doesn’t:

SELECT json_type(100);

Output:

Binder Error:
No function matches the given name and argument types 'json_type(INTEGER_LITERAL)'. You might need to add explicit type casts.
Candidate functions:
json_type(VARCHAR) -> VARCHAR
json_type(VARCHAR, VARCHAR) -> VARCHAR
json_type(VARCHAR, VARCHAR[]) -> VARCHAR[]
json_type(JSON) -> VARCHAR
json_type(JSON, VARCHAR) -> VARCHAR
json_type(JSON, VARCHAR[]) -> VARCHAR[]


LINE 1: SELECT json_type(100);
^

Also, JSON strings are surrounded in double quotes, so the following fails:

SELECT json_type('Water');

Result:

Invalid Input Error:
Malformed JSON at byte 0 of input: unexpected character. Input: Water

Using Paths to Inspect Nested Elements

The json_type() function works perfectly when working with complex nested JSON structures. We can specify a path to examine specific elements within the JSON.

We can use json_type() to get the type of all values at the top level:

WITH sample_data AS (
  SELECT '{
    "name": "Frida", 
    "age": 25, 
    "scores": [95, 87, 92],
    "preferences": {
        "theme": "dark",
        "notifications": true
    }, 
    "active": true
    }' as json_col
)
SELECT 
  json_type(json_col, '$.name') as name_type,
  json_type(json_col, '$.age') as age_type,
  json_type(json_col, '$.scores') as scores_type,
  json_type(json_col, '$.preferences') as preferences_type,
  json_type(json_col, '$.active') as active_type
FROM sample_data;

Result:

+-----------+----------+-------------+------------------+-------------+
| name_type | age_type | scores_type | preferences_type | active_type |
+-----------+----------+-------------+------------------+-------------+
| VARCHAR | UBIGINT | ARRAY | OBJECT | BOOLEAN |
+-----------+----------+-------------+------------------+-------------+

We can also dig deeper into nested objects:

WITH sample_data AS (
  SELECT '{
    "name": "Frida", 
    "age": 25, 
    "scores": [95, 87, 92],
    "preferences": {
        "theme": "dark",
        "notifications": true
    }, 
    "active": true
    }' as json_col
)
SELECT 
  json_type(json_col, '$.scores') as scores_type,
  json_type(json_col, '$.scores[0]') as score_0_type,
  json_type(json_col, '$.scores[1]') as score_1_type,
  json_type(json_col, '$.scores[2]') as score_2_type,
  json_type(json_col, '$.preferences') as preferences_type,
  json_type(json_col, '$.preferences.theme') as theme_type,
  json_type(json_col, '$.preferences.notifications') as theme_type
FROM sample_data;

Result:

+-------------+--------------+--------------+--------------+------------------+------------+------------+
| scores_type | score_0_type | score_1_type | score_2_type | preferences_type | theme_type | theme_type |
+-------------+--------------+--------------+--------------+------------------+------------+------------+
| ARRAY | UBIGINT | UBIGINT | UBIGINT | OBJECT | VARCHAR | BOOLEAN |
+-------------+--------------+--------------+--------------+------------------+------------+------------+

Here’s an array that consists of various types:

WITH array_data AS (
  SELECT '[100, "text", true, null, {"key": "value"}]' as json_col
)
SELECT 
  json_type(json_col, '$[0]') as first_element,
  json_type(json_col, '$[1]') as second_element,
  json_type(json_col, '$[2]') as third_element,
  json_type(json_col, '$[3]') as fourth_element,
  json_type(json_col, '$[4]') as fifth_element
FROM array_data;

Result:

+---------------+----------------+---------------+----------------+---------------+
| first_element | second_element | third_element | fourth_element | fifth_element |
+---------------+----------------+---------------+----------------+---------------+
| UBIGINT | VARCHAR | BOOLEAN | NULL | OBJECT |
+---------------+----------------+---------------+----------------+---------------+

Here’s a JSON example with a deeper structure:

WITH nested_data AS (
  SELECT '{
    "user": {
      "profile": {
        "personal": {
          "name": "Bob",
          "contacts": ["[email protected]", "555-1234"]
        }
      }
    }
  }' as json_col
)
SELECT 
  json_type(json_col, '$.user') as user_type,
  json_type(json_col, '$.user.profile') as profile_type,
  json_type(json_col, '$.user.profile.personal.name') as name_type,
  json_type(json_col, '$.user.profile.personal.contacts') as contacts_type,
  json_type(json_col, '$.user.profile.personal.contacts[0]') as first_contact_type
FROM nested_data;

Output:

+-----------+--------------+-----------+---------------+--------------------+
| user_type | profile_type | name_type | contacts_type | first_contact_type |
+-----------+--------------+-----------+---------------+--------------------+
| OBJECT | OBJECT | VARCHAR | ARRAY | VARCHAR |
+-----------+--------------+-----------+---------------+--------------------+

Working with Lists of Paths

When we provide a list of paths, json_type() returns a list of corresponding types.

So if we wanted to put all of the previous example’s JSON types into a list, we could rewrite the query to this:

WITH nested_data AS (
  SELECT '{
    "user": {
      "profile": {
        "personal": {
          "name": "Bob",
          "contacts": ["[email protected]", "555-1234"]
        }
      }
    }
  }' as json_col
)
SELECT 
  json_type(json_col, [
    '$.user',
    '$.user.profile',
    '$.user.profile.personal.name',
    '$.user.profile.personal.contacts',
    '$.user.profile.personal.contacts[0]'
    ]) as all_types
FROM nested_data;

Result:

+-------------------------------------------+
| all_types |
+-------------------------------------------+
| [OBJECT, OBJECT, VARCHAR, ARRAY, VARCHAR] |
+-------------------------------------------+

Handling Non-Existent Paths

When we specify a path that doesn’t exist in the JSON structure, json_type() returns NULL:

SELECT 
    json_type('{"a": 1}', '$.oops') AS r1,
    json_type('[1, 2, 3]', '$[10]')  AS r2;

Output:

+------+------+
| r1 | r2 |
+------+------+
| NULL | NULL |
+------+------+