DuckDB makes it easy to export query results to JSON format, which can help us integrate database outputs with web applications, APIs, and data processing pipelines. This article explores how to generate JSON files from DuckDB queries.
Basic JSON Export with COPY ... TO
We can export query results to JSON using the COPY ... TO
statement. It’s quite straightforward. Here’s an example:
COPY (SELECT * FROM customers) TO 'output.json' (FORMAT JSON);
This command exports all records from the customers
table to a file named output.json
. Each row becomes a JSON object, and the entire result set is wrapped in a JSON array.
The resulting file might look something like this:
{"customer_id":1,"name":"Bryce Ko","email":"[email protected]","status":"Active","created_date":"2021-10-03"}
{"customer_id":2,"name":"Tai Anders","email":"[email protected]","status":"Active","created_date":"2022-11-17"}
{"customer_id":3,"name":"Cathy Lopez","email":"[email protected]","status":"Inactive","created_date":"2019-04-15"}
{"customer_id":4,"name":"Harry Helia","email":"[email protected]","status":"Active","created_date":"2020-09-10"}
{"customer_id":5,"name":"Florence Knight","email":"[email protected]","status":"Inactive","created_date":"2020-10-25"}
This JSON file was created from the following table:
+-------------+-----------------+----------------------+----------+--------------+
| customer_id | name | email | status | created_date |
+-------------+-----------------+----------------------+----------+--------------+
| 1 | Bryce Ko | [email protected] | Active | 2021-10-03 |
| 2 | Tai Anders | [email protected] | Active | 2022-11-17 |
| 3 | Cathy Lopez | [email protected] | Inactive | 2019-04-15 |
| 4 | Harry Helia | [email protected] | Active | 2020-09-10 |
| 5 | Florence Knight | [email protected] | Inactive | 2020-10-25 |
+-------------+-----------------+----------------------+----------+--------------+
Exporting Specific Columns
We can control which columns to include in the JSON file:
COPY (
SELECT customer_id, name, email
FROM customers
WHERE status = 'Active'
) TO 'active_customers.json' (FORMAT JSON);
This creates a file called active_customers.json
with the following contents:
{"customer_id":1,"name":"Bryce Ko","email":"[email protected]"}
{"customer_id":2,"name":"Tai Anders","email":"[email protected]"}
{"customer_id":4,"name":"Harry Helia","email":"[email protected]"}
In this case I narrowed the query to just those rows where the status
column has a value of Active
.
Omitting the Format
It’s possible to omit the FORMAT JSON
part. When we do this, DuckDB detects the format based on the file extension. Given we’re using a .json
file extension, DuckDB will be able to determine that we want the file to be in JSON format:
COPY (SELECT * FROM customers) TO 'output_2.json';
That did the same thing that the first example did. DuckDB was able to figure out that I wanted to export the data in JSON format.
Array Format
We can use ARRAY TRUE
to output the results as a JSON array:
COPY (
SELECT customer_id, name
FROM customers
WHERE status = 'Active'
) TO 'active_customers_array.json' (FORMAT JSON, ARRAY TRUE);
Output:
[
{"customer_id":1,"name":"Bryce Ko"},
{"customer_id":2,"name":"Tai Anders"},
{"customer_id":4,"name":"Harry Helia"}
]
We can explicitly specify ARRAY FALSE
to be sure that it isn’t output as an array:
COPY (
SELECT customer_id, name
FROM customers
WHERE status = 'Active'
) TO 'active_customers_no_array.json' (FORMAT JSON, ARRAY FALSE);
Output:
{"customer_id":1,"name":"Bryce Ko"}
{"customer_id":2,"name":"Tai Anders"}
{"customer_id":4,"name":"Harry Helia"}
Date and Timestamp Formatting
We can control how dates appear in the JSON output:
COPY (
SELECT
customer_id AS id,
strftime(created_date, '%A, %-d %B %Y') AS created_date
FROM customers
) TO 'customers_dates.json' (FORMAT JSON);
Here’s that the resulting JSON file looks like:
{"id":1,"created_date":"Sunday, 3 October 2021"}
{"id":2,"created_date":"Thursday, 17 November 2022"}
{"id":3,"created_date":"Monday, 15 April 2019"}
{"id":4,"created_date":"Thursday, 10 September 2020"}
{"id":5,"created_date":"Sunday, 25 October 2020"}
We can use the same format specifiers that we can use when formatting dates with functions like strftime()
and strptime()
.
NULL Value Handling
DuckDB automatically handles NULL values in JSON export, converting them to JSON null
.
Suppose we insert a new row, but we don’t provide a value for two of the columns, like this:
INSERT INTO customers (customer_id, name, status) VALUES (6, 'Terry Smith', 'Active');
SELECT * FROM customers WHERE customer_id = 6;
Output:
+-------------+-------------+-------+--------+--------------+
| customer_id | name | email | status | created_date |
+-------------+-------------+-------+--------+--------------+
| 6 | Terry Smith | NULL | Active | NULL |
+-------------+-------------+-------+--------+--------------+
We can see that the email
and created_date
columns are NULL.
Now let’s export to JSON:
COPY (
SELECT
customer_id,
name,
COALESCE(email, 'No email available') AS email,
created_date
FROM customers
WHERE status = 'Active'
) TO 'active_customers_including_nulls.json' (FORMAT JSON);
The resulting JSON file looks like this:
{"customer_id":1,"name":"Bryce Ko","email":"[email protected]","created_date":"2021-10-03"}
{"customer_id":2,"name":"Tai Anders","email":"[email protected]","created_date":"2022-11-17"}
{"customer_id":4,"name":"Harry Helia","email":"[email protected]","created_date":"2020-09-10"}
{"customer_id":6,"name":"Terry Smith","email":"No email available","created_date":null}
The created_date
field is null
for Terry as expected. Regarding his email field, I used the coalesce()
function to present a message whenever there’s a null
value. This meant that email addresses were shown for those customers with email addresses.
Compression
We can compress the resulting JSON file if needed. Actually, DuckDB can detect the compression type from the file extension. For example, if the file extension is .json.gz
then it knows to use gzip
for compression.
Here’s an example of doing that:
COPY (
SELECT
customer_id,
name,
COALESCE(email, 'No email available') AS email,
created_date
FROM customers
WHERE status = 'Active'
) TO 'active_customers_including_nulls.json.gz';
That creates a file called active_customers_including_nulls.json.gz
. When decompressed, a file called active_customers_including_nulls.json
will be created, which is the JSON file that contains the data.
So when we run the code, the following file will be created:
active_customers_including_nulls.json.gz
And when we decompress the file, the following two files will exist:
active_customers_including_nulls.json
active_customers_including_nulls.json.gz
We can also explicitly specify the compression with the COMPRESSION
argument if we want:
COPY (
SELECT
customer_id,
name,
COALESCE(email, 'No email available') AS email,
created_date
FROM customers
WHERE status = 'Active'
) TO 'active_customers_including_nulls.json.gz' (FORMAT JSON, COMPRESSION 'gzip');
The compression options are are none
, gzip
, and zstd
.
By the way, we still need to include the proper extension when using the COMPRESSION
argument if we want our file to be compressed. For example, we can’t omit the .gz
extension and expect the file to be compressed. I included .gz
in the file name in this example. If I omit that extension (i.e., if I try to export to active_customers_including_nulls.json
), then the .json
file is exported but it’s not compressed.