DuckDB has a json_extract()
function that extracts JSON from a JSON document. DuckDB also has a function called json_extract_string()
that does a similar thing, but extracts its value as a string.
This article takes a look at these two functions side by side.
The Difference
Here’s how the DuckDB documentation describes these two functions:
json_extract()
: ExtractsJSON
fromjson
at the givenpath
. Ifpath
is aLIST
, the result will be aLIST
ofJSON
.json_extract_string()
: ExtractsVARCHAR
fromjson
at the givenpath
. Ifpath
is aLIST
, the result will be aLIST
ofVARCHAR
.
To clarify, both functions extract data from JSON
. It’s just that one returns its result as JSON
while the other returns VARCHAR
.
Example
Here’s an example that demonstrates the difference between the two functions:
.mode duckbox
SELECT
json_extract('{"name": "Ariel", "age": 20}', '$.name') AS json_extract,
json_extract_string('{"name": "Ariel", "age": 20}', '$.name') AS json_extract_string;
Result:
+--------------+---------------------+
| json_extract | json_extract_string |
| json | varchar |
+--------------+---------------------+
| "Ariel" | Ariel |
+--------------+---------------------+
We can see that the value in the left column is surrounded by double quotes, but the value in the right column isn’t. That’s because the left column is JSON
and the right column is VARCHAR
.
In this example I set my DuckDB CLI output mode to duckbox
, so that the data type is displayed in the column header. So this confirms that the left column is JSON
and the right is VARCHAR
.
Another way to find out the data type is with the typeof()
function:
.mode table
SELECT
typeof(json_extract('{"name": "Ariel", "age": 20}', '$.name')) AS json_extract,
typeof(json_extract_string('{"name": "Ariel", "age": 20}', '$.name')) AS json_extract_string;
Result:
+--------------+---------------------+
| json_extract | json_extract_string |
+--------------+---------------------+
| JSON | VARCHAR |
+--------------+---------------------+
Here, I changed to table
mode, so we can no longer see the data type in the column header. Regardless, the typeof()
function returned the data type.
What About Numbers?
JSON numbers aren’t surrounded in double quotes, so they’re returned without them when using both functions. But that’s not to say they’re returned using the same data type:
.mode duckbox
SELECT
json_extract('{"name": "Ariel", "age": 20}', '$.age') AS json_extract,
json_extract_string('{"name": "Ariel", "age": 20}', '$.age') AS json_extract_string;
Result:
+--------------+---------------------+
| json_extract | json_extract_string |
| json | varchar |
+--------------+---------------------+
| 20 | 20 |
+--------------+---------------------+
In fact, they’re returned as JSON
and VARCHAR
, just like with the string example. Yes, even numbers are returned as strings when using json_extract_string()
. This function returns its result as a string.
Summary
So in a nutshell, the difference between the two functions can be summarized as:
json_extract()
extractsJSON
.json_extract_string()
extractsVARCHAR
.
By the way, json_extract()
has an alias called json_extract_path()
(they both do the same thing). And json_extract_path()
has an alias called json_extract_path_text()
.