Sampling Rows from a Table in DuckDB with the SAMPLE Clause

DuckDB’s SAMPLE clause is a handy feature that allows us to work with a random subset of our data. This is particularly useful when dealing with large datasets where processing the entire dataset might be time-consuming or unnecessary for exploratory data analysis, testing queries, or creating representative samples.

When we use this clause, we can specify the absolute number of rows to return, or a percentage of rows. We also have an option of sampling method to use.

Continue reading

Find Out if a Table is WITHOUT ROWID in SQLite

One of SQLite’s unique features is the WITHOUT ROWID table, which can be used to optimize performance and storage in specific scenarios.

While it’s easy enough to create a WITHOUT ROWID table (just add WITHOUT ROWID to the definition), how to identify a WITHOUT ROWID table might not be so obvious.

In this article, we’ll start by briefly revising what WITHOUT ROWID tables are and how they differ from ordinary tables. Then we’ll look at how to identify these tables by using SQLite’s PRAGMA commands.

Continue reading

Fixing “Conversion Error” When Using COALESCE() in DuckDB

If you’re getting an error that reads “Conversion Error: Could not convert …etc” while using the COALESCE() function in DuckDB, it appears that you’re using arguments with incompatible types.

To fix this issue, try using CAST() or TRY_CAST() to ensure that all arguments are compatible. Alternatively, make sure the arguments to COALESCE() are of the same type (or at least, compatible types).

Continue reading

Using TRY_STRPTIME() to Handle Errors When Constructing Timestamps in DuckDB

If you’ve ever used the strptime() function to create a timestamp in DuckDB, you may be aware that it will return an error if it can’t construct the timestamp from the format string/s provided.

While such an error could be useful in some situations, it could also be annoying in others.

Fortunately, DuckDB also provides the try_strptime() function, which will suppress any error that we might ordinarily get in such cases. This function returns null instead of an error.

Continue reading

Using the AGE() Function to Compare Dates in DuckDB

When working with date and time data in DuckDB, calculating the difference between two dates is a common requirement. Whether we’re determining a person’s age from their birthdate or measuring the duration between two events, DuckDB’s age() function provides a straightforward solution. This function returns an interval representing the difference between two timestamps or dates, making it especially useful for time-based analyses.

In this article, we’ll explore how to use the age() function in DuckDB. We’ll cover its syntax, and provide some simple examples.

Continue reading