A Quick Look at LIMIT & OFFSET in DuckDB

Most database management systems (DBMSs) provide us with a means of restricting the number of rows returned by a query to a fixed number of rows, or to a percentage of the data set. In many cases this is done with a LIMIT clause (although some DBMSs provide other methods, such as SQL Server’s TOP clause).

When it comes to DuckDB, the LIMIT clause is what’s implemented for this functionality.

DuckDB also provides an OFFSET clause that allows us to skip a certain number of rows before applying the LIMIT amount. We can use both of these together to provide pagination type functionality if needed.

How it Works

The LIMIT clause is an output modifier that restricts the number of rows fetched based on the amount we specify. This can be a fixed number of rows or a a percentage of the data set.

The LIMIT clause is logically applied at the end of a query. When used with an ORDER BY clause its results are deterministic, meaning that they will always be the same, given the same input. If we don’t use an ORDER BY clause, it’s possible that we could get different results each time we run the query.

We have the option of using an OFFSET clause. This allows us to offset the starting position of the result set. For example, OFFSET 10 will start the results at row 11.

Examples

Here are some simple examples to demonstrate how LIMIT and OFFSET work in DuckDB.

Full Result Set

Suppose we run a (non-limited) query that returns the following result set:

SELECT * FROM employees
ORDER BY emp_id;

Output:

+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | Dave Danker | Marketing | 72000.00 | 2022-09-15 |
| 9 | Sally Schmit | Marketing | 82000.00 | 2017-10-07 |
+--------+-------------------+-------------+----------+------------+

There are nine rows in this table and all nine rows were returned.

Applying the LIMIT Clause

Now let’s run the same query again, except this time we’ll use the LIMIT clause:

SELECT * FROM employees 
ORDER BY emp_id
LIMIT 3;

Output:

+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
+--------+-------------------+-------------+----------+------------+

I specified LIMIT 3 and so it took the top three rows.

Specifying a Percentage

As mentioned, we can also specify a percentage value:

SELECT * FROM employees 
ORDER BY emp_id
LIMIT 33%;

Output:

+--------+-------------------+------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
+--------+-------------------+------------+----------+------------+

When working with smaller data sets, a small change in the percentage value can have a big effect on the end result. In our example, increasing the percentage from 33% to 34% can add another row (increasing the row count by 50%):

SELECT * FROM employees 
ORDER BY emp_id
LIMIT 34%;

Output:

+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
+--------+-------------------+-------------+----------+------------+

Obviously the larger the data set, the less of an effect this will have.

It’s also possible to specify PERCENT instead of %:

SELECT * FROM employees 
ORDER BY emp_id
LIMIT 34 PERCENT;

Output:

+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
+--------+-------------------+-------------+----------+------------+

The OFFSET Clause

We can add the OFFSET clause to our query in order to have the results start in a different position:

SELECT * FROM employees 
ORDER BY emp_id
LIMIT 3
OFFSET 3;

Output:

+--------+--------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+--------------+-------------+----------+------------+
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
+--------+--------------+-------------+----------+------------+

The way it works is that DuckDB ignores the number rows that we provide to OFFSET. In this case, I specified OFFSET 3, and so DuckDB ignored the first three rows. It then presented the subsequent rows, up to the amount specified in the LIMIT clause (in this case 3).

Using OFFSET without LIMIT

It’s possible to use the OFFSET clause without also using the LIMIT clause:

SELECT * FROM employees 
ORDER BY emp_id
OFFSET 3;

Output:

+--------+--------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+--------------+-------------+----------+------------+
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | Dave Danker | Marketing | 72000.00 | 2022-09-15 |
| 9 | Sally Schmit | Marketing | 82000.00 | 2017-10-07 |
+--------+--------------+-------------+----------+------------+

In this case the result set is offset by the OFFSET amount, and the remaining rows are returned from the data set.

Applying Expressions

Both LIMIT and OFFSET allow us to provide expressions other than just fixed numbers and percentages. For example, arithmetic expressions:

SELECT * FROM employees 
ORDER BY emp_id
LIMIT 2 + 2
OFFSET 1 + 2;

Result:

+--------+--------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+--------------+-------------+----------+------------+
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
+--------+--------------+-------------+----------+------------+

We can also provide more complex expressions, such as subqueries:

SELECT * FROM employees 
ORDER BY emp_id
LIMIT (
    SELECT COUNT(*) 
    FROM employees 
    WHERE department = 'Marketing'
    );

Result:

+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
+--------+-------------------+-------------+----------+------------+

This probably isn’t a very useful example, but at least it demonstrates the concept that we can use more complex expressions with the LIMIT clause. Basically, this subquery counts the number of employees in the Marketing department, then uses that number to limit the results of the query (which also includes non-Marketing employees).