Using LIST_INTERSECT() to Extract Overlapping Elements from Two Lists in DuckDB

The list_intersect() can be quite a handy function when working with lists in DuckDB. It accepts two lists as arguments, and it returns a list of elements that are common to both lists. It essentially performs a set intersection operation while preserving the list data type.

Duplicate values are ignored, so only one value is returned even if there are multiple occurrences in one or both lists.

Continue reading

An Important Consideration When Choosing Between LIST_CONCAT() and || For Concatenating Lists in DuckDB

When it comes to concatenating lists in DuckDB, we can use the list_concat() function (and its aliases), or the list concatenation operator (||). Often when we have a choice like this, we can use either option interchangeably. But that isn’t quite true in this case.

There’s an important difference between list_concat() and the || operator in DuckDB, and it all comes down to how it handles NULL values.

Continue reading

A Quick Look at DuckDB’s JSON_VALUE() Function

DuckDB provides a handful of functions for getting data from JSON documents. We can use them as long as the JSON extension is installed and loaded (which it is in most distributions). One such function for getting data from a JSON document is json_value(). This function extracts scalar data from the specified path in the JSON document. If the value isn’t scalar, then a NULL value is returned.

Continue reading

Understanding JSON_EXTRACT_STRING() in DuckDB

DuckDB has a json_extract_string() function that works similar to json_extract(), except that it returns its result as a string (varchar). The json_extract() function, on the other hand, returns its result as JSON.

The purpose of these two functions is to extract data from a JSON document. We’ll focus on the json_extract_string() function in this article.

Continue reading

5 Functions that Return the Year from a Date in DuckDB

When working with dates in DuckDB, some common tasks we might need to perform include extracting date parts from a date or timestamp value. For example we might want to extract the year from a date. Fortunately, DuckDB provides us with an abundance of options for doing that.

In this article, we’ll look at five different functions extract the year from a date in DuckDB.

Continue reading

Examples of EPOCH_US() in DuckDB

DuckDB provides us with a bunch of epoch...() functions that enable us to get the Unix epoch time from a given date/time value. Different functions return their result using different units (for example seconds, milliseconds, etc). The epoch_us() function returns its result in microseconds.

Unix epoch time is typically expressed as the number of seconds that have elapsed since January 1, 1970 (UTC), but epoch_us() function returns the equivalent amount in microseconds.

Continue reading

How to Get a Reproducible Result Set When Using the SAMPLE Clause in DuckDB

When working with large datasets in DuckDB, the SAMPLE clause offers an efficient way to query a subset of your data. However, unless you specifically construct your query to get repeatable results, this sampling will return a different set of results each time the query is run.

But we can change that. We can write our query to return the same random result set every time we run it.

This article explores how to achieve consistent, reproducible result sets when using the SAMPLE clause in DuckDB.

Continue reading

DuckDB JSON_EXISTS(): Check if a Path Exists in a JSON Document

Most DuckDB distributions are shipped with the json extension, which enables us to work with JSON documents. DuckDB provides a bunch of JSON functions, including JSON_EXISTS(). The JSON_EXISTS() function allows us to check whether a specific path exists within a JSON document.

This article explores how this function works, along with examples.

Continue reading