DuckDB’s LIMIT and SAMPLE clauses both allow users to work with subsets of data, but they serve fundamentally different purposes and operate on different principles. While LIMIT returns a fixed number of rows from the beginning of a result set, SAMPLE provides a random sample from the dataset.
create query
A Quick Look at LIMIT & OFFSET in DuckDB
Most database management systems (DBMSs) provide us with a means of restricting the number of rows returned by a query to a fixed number of rows, or to a percentage of the data set. In many cases this is done with a LIMIT clause (although some DBMSs provide other methods, such as SQL Server’s TOP clause).
When it comes to DuckDB, the LIMIT clause is what’s implemented for this functionality.
How to Get a Reproducible Result Set When Using the SAMPLE Clause in DuckDB
When working with large datasets in DuckDB, the SAMPLE clause offers an efficient way to query a subset of your data. However, unless you specifically construct your query to get repeatable results, this sampling will return a different set of results each time the query is run.
But we can change that. We can write our query to return the same random result set every time we run it.
This article explores how to achieve consistent, reproducible result sets when using the SAMPLE clause in DuckDB.
How to Output Query Results as Pipe-Separated Lists in the DuckDB CLI
This tutorial walks you through outputting DuckDB query results in list format, which presents each record as a pipe-separated list of values. This format is handy for data processing tasks and when working with tools that expect pipe-delimited input.
Suppressing Query Output in the DuckDB CLI
DuckDB’s command-line interface (CLI) provides a way to suppress query results using the .mode trash command. This feature is particularly useful in scenarios where you need to execute queries but don’t want their results to be displayed.
How to Output Query Results as HTML Tables in the DuckDB CLI
The DuckDB command line interface (CLI) provides us with the ability to output query results in various formats. One of these formatting options is HTML.
This article shows you how to output your DuckDB query results as HTML tables, which can be useful when you need to include the results in web pages or documentation.
Fix “Sample method … cannot be used with a discrete sample count” When Using the SAMPLE Clause in DuckDB
If you’re getting an error that reads something like “Sample method System cannot be used with a discrete sample count” when using the SAMPLE clause in DuckDB, it looks like you’re specifying an invalid sampling method for the context with which you’re using the SAMPLE clause. Perhaps you’re using system or bernoulli, when you should be using reservoir.
Adding Quotes Around Field Values When Outputting DuckDB Query Results as a List
In this article we look at how to automatically enclose values in single quotes when outputting DuckDB query results as a list. Using this method, each value is formatted as SQL literals. This can be useful when you want to escape the values for SQL or for some other purpose.
We also look at how CSV output is sometimes quoted with double quotes, and how we can adjust the results by doing things like removing the headers and changing the list separator.
Sampling Rows from a Table in DuckDB with the SAMPLE Clause
DuckDB’s SAMPLE clause is a handy feature that allows us to work with a random subset of our data. This is particularly useful when dealing with large datasets where processing the entire dataset might be time-consuming or unnecessary for exploratory data analysis, testing queries, or creating representative samples.
When we use this clause, we can specify the absolute number of rows to return, or a percentage of rows. We also have an option of sampling method to use.
DuckDB Allows Queries to Begin with ‘FROM…’
If you find yourself running a lot of ad-hoc queries that start with SELECT * FROM, you might be interested in this interesting feature of DuckDB.
DuckDB allows us to start queries with the FROM clause. When we do this, DuckDB will return all columns without us needing to specify SELECT *.