Understanding the JSON_GROUP_STRUCTURE() Function in DuckDB

The json_group_structure() function in DuckDB is an aggregate function that inspects all JSON values within a group and returns a JSON representation of their structure. It essentially infers a “schema” for the JSON objects in that group. This can be useful for understanding the shape and consistency of your JSON data.

Continue reading

How to Use and Store Dates in SQLite

SQLite is a lightweight, self-contained relational database management system that is widely used due to its simplicity and portability. However, one notable aspect of SQLite is its lack of a dedicated DATE or DATETIME storage class. Instead, SQLite stores date and time values as integers, real numbers, or text, depending on how the developer chooses to manage these values.

In this article, we’ll explore how to effectively use and store dates in SQLite, and provide examples that may help you implement date management in your projects.

Continue reading

3 Easy Ways to Calculate the Median Value from a List in DuckDB

While we might typically use DuckDB’s median() function to get the median value from a column, when it comes to getting the median value from a list, we need to modify this slightly. A list can contain many values, and so we need a way to calculate the median of all of those values, rather than a single value like we’d normally be doing when calculating the median value from a column.

Fortunately, DuckDB provides us with several easy options for achieving this. Below are three options for getting the median value from a list in DuckDB.

Continue reading

How EPOCH_NS() Works in DuckDB

DuckDB provides an epoch_ns() function that returns the Unix epoch time from a given date/time value in nanoseconds.

Unix epoch time is typically expressed as the number of seconds that have elapsed since January 1, 1970 (UTC), but it can also be expressed in other units, such as milliseconds, microseconds, and nanoseconds. DuckDB has specific functions for each of these units, with epoch_ns() being the function that returns it in nanoseconds.

Continue reading

RPAD() Alternative: Applying Right Padding in SQL Server

The SQL rpad() function has been widely implemented across many major RDBMSs, including MySQL, Oracle, PostgreSQL, and MariaDB, to name just a few. But when it comes to SQL Server, we have a problem. SQL Server doesn’t currently provide us with an rpad() function.

But that’s not to say we can’t apply right padding in SQL Server. SQL Server still provides us with enough tools to get the job done. With a bit of work, we can get a similar result to what we might be able to achieve with rpad(). It may not be as elegant as a simple rpad() function, but at least it’s an option.

Continue reading

Using LIST_INTERSECT() to Extract Overlapping Elements from Two Lists in DuckDB

The list_intersect() can be quite a handy function when working with lists in DuckDB. It accepts two lists as arguments, and it returns a list of elements that are common to both lists. It essentially performs a set intersection operation while preserving the list data type.

Duplicate values are ignored, so only one value is returned even if there are multiple occurrences in one or both lists.

Continue reading

An Important Consideration When Choosing Between LIST_CONCAT() and || For Concatenating Lists in DuckDB

When it comes to concatenating lists in DuckDB, we can use the list_concat() function (and its aliases), or the list concatenation operator (||). Often when we have a choice like this, we can use either option interchangeably. But that isn’t quite true in this case.

There’s an important difference between list_concat() and the || operator in DuckDB, and it all comes down to how it handles NULL values.

Continue reading