Output Query Results to a JSON File in DuckDB

DuckDB makes it easy to export query results to JSON format, which can help us integrate database outputs with web applications, APIs, and data processing pipelines. This article explores how to generate JSON files from DuckDB queries.

Basic JSON Export with `COPY ... TO`

We can export query results to JSON using the COPY ... TO statement. It’s quite straightforward. Here’s an example:

COPY (SELECT * FROM customers) TO 'output.json' (FORMAT JSON);

This command exports all records from the customers table to a file named output.json. Each row becomes a JSON object, and the entire result set is wrapped in a JSON array.

The resulting file might look something like this:

{"customer_id":1,"name":"Bryce Ko","email":"[email protected]","status":"Active","created_date":"2021-10-03"}
{"customer_id":2,"name":"Tai Anders","email":"[email protected]","status":"Active","created_date":"2022-11-17"}
{"customer_id":3,"name":"Cathy Lopez","email":"[email protected]","status":"Inactive","created_date":"2019-04-15"}
{"customer_id":4,"name":"Harry Helia","email":"[email protected]","status":"Active","created_date":"2020-09-10"}
{"customer_id":5,"name":"Florence Knight","email":"[email protected]","status":"Inactive","created_date":"2020-10-25"}

This JSON file was created from the following table:

+-------------+-----------------+----------------------+----------+--------------+
| customer_id |      name       |        email         |  status  | created_date |
+-------------+-----------------+----------------------+----------+--------------+
| 1           | Bryce Ko        | [email protected]    | Active   | 2021-10-03   |
| 2           | Tai Anders      | [email protected]      | Active   | 2022-11-17   |
| 3           | Cathy Lopez     | [email protected]    | Inactive | 2019-04-15   |
| 4           | Harry Helia     | [email protected]    | Active   | 2020-09-10   |
| 5           | Florence Knight | [email protected] | Inactive | 2020-10-25   |
+-------------+-----------------+----------------------+----------+--------------+

Exporting Specific Columns

We can control which columns to include in the JSON file:

COPY (
    SELECT customer_id, name, email
    FROM customers 
    WHERE status = 'Active'
) TO 'active_customers.json' (FORMAT JSON);

This creates a file called active_customers.json with the following contents:

{"customer_id":1,"name":"Bryce Ko","email":"[email protected]"}
{"customer_id":2,"name":"Tai Anders","email":"[email protected]"}
{"customer_id":4,"name":"Harry Helia","email":"[email protected]"}

In this case I narrowed the query to just those rows where the status column has a value of Active.

Omitting the Format

It’s possible to omit the FORMAT JSON part. When we do this, DuckDB detects the format based on the file extension. Given we’re using a .json file extension, DuckDB will be able to determine that we want the file to be in JSON format:

COPY (SELECT * FROM customers) TO 'output_2.json';

That did the same thing that the first example did. DuckDB was able to figure out that I wanted to export the data in JSON format.

Array Format

We can use ARRAY TRUE to output the results as a JSON array:

COPY (
    SELECT customer_id, name
    FROM customers 
    WHERE status = 'Active'
) TO 'active_customers_array.json' (FORMAT JSON, ARRAY TRUE);

Output:

[
	{"customer_id":1,"name":"Bryce Ko"},
	{"customer_id":2,"name":"Tai Anders"},
	{"customer_id":4,"name":"Harry Helia"}
]

We can explicitly specify ARRAY FALSE to be sure that it isn’t output as an array:

COPY (
    SELECT customer_id, name
    FROM customers 
    WHERE status = 'Active'
) TO 'active_customers_no_array.json' (FORMAT JSON, ARRAY FALSE);

Output:

{"customer_id":1,"name":"Bryce Ko"}
{"customer_id":2,"name":"Tai Anders"}
{"customer_id":4,"name":"Harry Helia"}

Date and Timestamp Formatting

We can control how dates appear in the JSON output:

COPY (
    SELECT 
        customer_id AS id,
        strftime(created_date, '%A, %-d %B %Y') AS created_date
    FROM customers
) TO 'customers_dates.json' (FORMAT JSON);

Here’s that the resulting JSON file looks like:

{"id":1,"created_date":"Sunday, 3 October 2021"}
{"id":2,"created_date":"Thursday, 17 November 2022"}
{"id":3,"created_date":"Monday, 15 April 2019"}
{"id":4,"created_date":"Thursday, 10 September 2020"}
{"id":5,"created_date":"Sunday, 25 October 2020"}

We can use the same format specifiers that we can use when formatting dates with functions like strftime() and strptime().

NULL Value Handling

DuckDB automatically handles NULL values in JSON export, converting them to JSON null.

Suppose we insert a new row, but we don’t provide a value for two of the columns, like this:

INSERT INTO customers (customer_id, name, status) VALUES (6, 'Terry Smith', 'Active');
SELECT * FROM customers WHERE customer_id = 6;

Output:

+-------------+-------------+-------+--------+--------------+
| customer_id |    name     | email | status | created_date |
+-------------+-------------+-------+--------+--------------+
| 6           | Terry Smith | NULL  | Active | NULL         |
+-------------+-------------+-------+--------+--------------+

We can see that the email and created_date columns are NULL.

Now let’s export to JSON:

COPY (
    SELECT 
        customer_id,
        name,
        COALESCE(email, 'No email available') AS email,
        created_date
    FROM customers
    WHERE status = 'Active'
) TO 'active_customers_including_nulls.json' (FORMAT JSON);

The resulting JSON file looks like this:

{"customer_id":1,"name":"Bryce Ko","email":"[email protected]","created_date":"2021-10-03"}
{"customer_id":2,"name":"Tai Anders","email":"[email protected]","created_date":"2022-11-17"}
{"customer_id":4,"name":"Harry Helia","email":"[email protected]","created_date":"2020-09-10"}
{"customer_id":6,"name":"Terry Smith","email":"No email available","created_date":null}

The created_date field is null for Terry as expected. Regarding his email field, I used the coalesce() function to present a message whenever there’s a null value. This meant that email addresses were shown for those customers with email addresses.

Compression

We can compress the resulting JSON file if needed. Actually, DuckDB can detect the compression type from the file extension. For example, if the file extension is .json.gz then it knows to use gzip for compression.

Here’s an example of doing that:

COPY (
    SELECT 
        customer_id,
        name,
        COALESCE(email, 'No email available') AS email,
        created_date
    FROM customers
    WHERE status = 'Active'
) TO 'active_customers_including_nulls.json.gz';

That creates a file called active_customers_including_nulls.json.gz. When decompressed, a file called active_customers_including_nulls.json will be created, which is the JSON file that contains the data.

So when we run the code, the following file will be created:

active_customers_including_nulls.json.gz

And when we decompress the file, the following two files will exist:

active_customers_including_nulls.json
active_customers_including_nulls.json.gz

We can also explicitly specify the compression with the COMPRESSION argument if we want:

COPY (
    SELECT 
        customer_id,
        name,
        COALESCE(email, 'No email available') AS email,
        created_date
    FROM customers
    WHERE status = 'Active'
) TO 'active_customers_including_nulls.json.gz' (FORMAT JSON, COMPRESSION 'gzip');

The compression options are are none, gzip, and zstd.

By the way, we still need to include the proper extension when using the COMPRESSION argument if we want our file to be compressed. For example, we can’t omit the .gz extension and expect the file to be compressed. I included .gz in the file name in this example. If I omit that extension (i.e., if I try to export to active_customers_including_nulls.json), then the .json file is exported but it’s not compressed.

Basic JSON Export with COPY ... TO

Exporting Specific Columns

Omitting the Format

Array Format

Date and Timestamp Formatting

NULL Value Handling

Compression

Basic JSON Export with `COPY ... TO`