A Quick Look at DuckDB’s JSON_VALUE() Function

DuckDB provides a handful of functions for getting data from JSON documents. We can use them as long as the JSON extension is installed and loaded (which it is in most distributions). One such function is json_value(), which extracts scalar data from the specified path in the JSON document. If the value at that path isn’t scalar, NULL is returned.
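
As a minimal sketch of that behavior (the sample document and its keys are made up for illustration), we could compare a scalar path with a non-scalar one:

```sql
-- Hypothetical sample document; only json_value() itself comes from the text above.
SELECT
    json_value('{"name": "Fluffy", "scores": [1, 2]}', '$.name')   AS scalar_path, -- a string, so a value comes back
    json_value('{"name": "Fluffy", "scores": [1, 2]}', '$.scores') AS array_path;  -- an array, so we'd expect NULL
```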

Understanding JSON_EXTRACT_STRING() in DuckDB

DuckDB has a json_extract_string() function that works similarly to json_extract(), except that it returns its result as a string (varchar). The json_extract() function, on the other hand, returns its result as JSON.

The purpose of these two functions is to extract data from a JSON document. We’ll focus on the json_extract_string() function in this article.
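
To make the difference concrete, here is a small sketch that runs both functions against the same (made-up) document:

```sql
-- The sample document is an assumption for illustration.
SELECT
    json_extract('{"name": "Fluffy"}', '$.name')        AS as_json,  -- "Fluffy" (JSON, quotes kept)
    json_extract_string('{"name": "Fluffy"}', '$.name') AS as_text;  -- Fluffy (VARCHAR, no quotes)
```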

Using JSON_EXTRACT() in DuckDB

DuckDB has a json_extract() function that extracts JSON data from a JSON document. It enables us to get JSON values from within the JSON document, rather than returning the whole document itself. This article takes a quick look at the function along with some examples of usage.
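
For instance, a rough sketch of pulling a nested object out of a larger document might look like this (the document and its keys are hypothetical):

```sql
-- Returns only the nested "pet" object as JSON, not the whole document.
SELECT json_extract(
    '{"owner": "Homer", "pet": {"name": "Fluffy", "age": 3}}',
    '$.pet'
) AS pet;
```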

5 Functions that Return the Year from a Date in DuckDB

When working with dates in DuckDB, some common tasks we might need to perform include extracting date parts from a date or timestamp value. For example, we might want to extract the year from a date. Fortunately, DuckDB provides us with an abundance of options for doing that.

In this article, we’ll look at five different functions that extract the year from a date in DuckDB.
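
As a quick illustration, here are a few DuckDB functions that can return the year. This is only a sketch with an arbitrary sample date, and these are not necessarily the same five functions the article covers:

```sql
SELECT
    year(DATE '2025-03-14')                 AS year_fn,       -- 2025
    date_part('year', DATE '2025-03-14')    AS date_part_fn,  -- 2025
    extract(year FROM DATE '2025-03-14')    AS extract_fn,    -- 2025
    strftime(DATE '2025-03-14', '%Y')       AS strftime_fn;   -- '2025' (returned as a string)
```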

Examples of EPOCH_US() in DuckDB

DuckDB provides us with a bunch of epoch...() functions that enable us to get the Unix epoch time from a given date/time value. Different functions return their result using different units (for example seconds, milliseconds, etc). The epoch_us() function returns its result in microseconds.

Unix epoch time is typically expressed as the number of seconds that have elapsed since January 1, 1970 (UTC), but the epoch_us() function returns the equivalent amount in microseconds.
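
A simple way to see the unit is to pass a timestamp that is exactly one second after the epoch (the timestamp here is just an illustrative choice):

```sql
-- One second after the Unix epoch, expressed in microseconds.
SELECT epoch_us(TIMESTAMP '1970-01-01 00:00:01') AS microseconds;  -- 1000000
```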

How to Get a Reproducible Result Set When Using the SAMPLE Clause in DuckDB

When working with large datasets in DuckDB, the SAMPLE clause offers an efficient way to query a subset of your data. However, unless you specifically construct your query to get repeatable results, this sampling will return a different set of results each time the query is run.

But we can change that. We can write our query to return the same random result set every time we run it.

This article explores how to achieve consistent, reproducible result sets when using the SAMPLE clause in DuckDB.
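
One way to do this is to supply a seed with the REPEATABLE option. The following is a minimal sketch that assumes a throwaway table named t and an arbitrary seed of 42. It isn't necessarily the exact approach the article takes:

```sql
-- Hypothetical demo table with 1,000 rows.
CREATE TABLE t AS SELECT range AS id FROM range(1000);

-- Reservoir sample of 5 rows with a fixed seed.
-- Re-running this statement with the same seed is intended to return the same rows.
SELECT * FROM t USING SAMPLE reservoir(5 ROWS) REPEATABLE (42);
```

Repeatability may also depend on the sampling method and on settings such as the thread count, so it's worth checking the DuckDB sampling documentation for your version.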

A Quick Look at EPOCH_MS() in DuckDB

In DuckDB, the epoch_ms() function serves a dual purpose. It converts timestamp values into Unix epoch time in milliseconds and also performs the reverse operation, transforming Unix epoch time values back into timestamps.

Unix epoch time is typically expressed as the number of seconds that have elapsed since January 1, 1970 (UTC), but this function returns the equivalent amount in milliseconds.

The function is similar to the epoch() function, which returns its result in seconds. However, the epoch() function only works in one direction; it converts a timestamp value to epoch time, but it doesn’t work the other way around like epoch_ms() can.
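
Here's a small sketch of both directions, using one second after the epoch as an arbitrary sample value:

```sql
SELECT
    epoch_ms(TIMESTAMP '1970-01-01 00:00:01') AS to_milliseconds,  -- 1000
    epoch_ms(CAST(1000 AS BIGINT))            AS to_timestamp;     -- 1970-01-01 00:00:01
```

The explicit cast to BIGINT isn't strictly required in every case. It's included here only to make it obvious that the second call receives an integer rather than a timestamp.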

Using DuckDB’s FSUM() Function for More Accurate Results

DuckDB has an fsum() function that can be used instead of the regular sum() function in order to get more accurate results. fsum() calculates the sum using a floating point summation method known as Kahan summation (or compensated summation).

This method helps reduce the accumulation of rounding errors that can occur when many floating point numbers are summed with the regular sum() function.
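
To compare the two, we could sum a value that has no exact binary representation many times over. This is just a sketch: the value 0.1 and the row count are arbitrary choices, and the size of any difference will depend on the data:

```sql
-- Sum 0.1 one million times with both functions and compare the results.
SELECT
    sum(x)  AS regular_sum,
    fsum(x) AS kahan_sum
FROM (SELECT CAST(0.1 AS DOUBLE) AS x FROM range(1000000));
```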
