Understanding the JSON_GROUP_STRUCTURE() Function in DuckDB

The json_group_structure() function in DuckDB is an aggregate function that inspects all JSON values within a group and returns a JSON representation of their structure. It essentially infers a “schema” for the JSON objects in that group. This can be useful for understanding the shape and consistency of your JSON data.

Continue reading

A Quick Look at DuckDB’s JSON_VALUE() Function

DuckDB provides a handful of functions for getting data from JSON documents. We can use them as long as the JSON extension is installed and loaded (which it is in most distributions). One such function for getting data from a JSON document is json_value(). This function extracts scalar data from the specified path in the JSON document. If the value isn’t scalar, then a NULL value is returned.

Continue reading

Understanding JSON_EXTRACT_STRING() in DuckDB

DuckDB has a json_extract_string() function that works similar to json_extract(), except that it returns its result as a string (varchar). The json_extract() function, on the other hand, returns its result as JSON.

The purpose of these two functions is to extract data from a JSON document. We’ll focus on the json_extract_string() function in this article.

Continue reading

DuckDB JSON_EXISTS(): Check if a Path Exists in a JSON Document

Most DuckDB distributions are shipped with the json extension, which enables us to work with JSON documents. DuckDB provides a bunch of JSON functions, including JSON_EXISTS(). The JSON_EXISTS() function allows us to check whether a specific path exists within a JSON document.

This article explores how this function works, along with examples.

Continue reading

How to Output Query Results as JSON in the DuckDB CLI

DuckDB is a lightweight, fast database management system designed for analytics and embedded use cases. Its versatility makes it an excellent choice for developers and data analysts.

One useful feature of DuckDB is the ability to output query results in different formats, such as JSON, directly from the command-line interface (CLI). By default, the DuckDB CLI uses the duckbox output mode for query results (which outputs the results in a table-like format), but we can change that.

In this article, we’ll walk through the steps to output query results as JSON when using the DuckDB CLI.

Continue reading

What is NDJSON?

Newline Delimited JSON (NDJSON) is a specialized data format that provides a simple yet powerful way to handle streaming JSON data. While standard JSON is excellent for representing structured data, NDJSON addresses specific use cases where processing large datasets and streaming data is required.

This article takes a quick look at NDJSON and how it differs from regular JSON.

Continue reading