Most database management systems (DBMSs) provide us with a means of restricting the number of rows returned by a query to a fixed number of rows, or to a percentage of the data set. In many cases this is done with a LIMIT
clause (although some DBMSs provide other methods, such as SQL Server’s TOP
clause).
When it comes to DuckDB, the LIMIT
clause is what’s implemented for this functionality.
DuckDB also provides an OFFSET
clause that allows us to skip a certain number of rows before applying the LIMIT
amount. We can use both of these together to provide pagination type functionality if needed.
How it Works
The LIMIT
clause is an output modifier that restricts the number of rows fetched based on the amount we specify. This can be a fixed number of rows or a a percentage of the data set.
The LIMIT
clause is logically applied at the end of a query. When used with an ORDER BY
clause its results are deterministic, meaning that they will always be the same, given the same input. If we don’t use an ORDER BY
clause, it’s possible that we could get different results each time we run the query.
We have the option of using an OFFSET
clause. This allows us to offset the starting position of the result set. For example, OFFSET 10
will start the results at row 11.
Examples
Here are some simple examples to demonstrate how LIMIT
and OFFSET
work in DuckDB.
Full Result Set
Suppose we run a (non-limited) query that returns the following result set:
SELECT * FROM employees
ORDER BY emp_id;
Output:
+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | Dave Danker | Marketing | 72000.00 | 2022-09-15 |
| 9 | Sally Schmit | Marketing | 82000.00 | 2017-10-07 |
+--------+-------------------+-------------+----------+------------+
There are nine rows in this table and all nine rows were returned.
Applying the LIMIT
Clause
Now let’s run the same query again, except this time we’ll use the LIMIT
clause:
SELECT * FROM employees
ORDER BY emp_id
LIMIT 3;
Output:
+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
+--------+-------------------+-------------+----------+------------+
I specified LIMIT 3
and so it took the top three rows.
Specifying a Percentage
As mentioned, we can also specify a percentage value:
SELECT * FROM employees
ORDER BY emp_id
LIMIT 33%;
Output:
+--------+-------------------+------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
+--------+-------------------+------------+----------+------------+
When working with smaller data sets, a small change in the percentage value can have a big effect on the end result. In our example, increasing the percentage from 33% to 34% can add another row (increasing the row count by 50%):
SELECT * FROM employees
ORDER BY emp_id
LIMIT 34%;
Output:
+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
+--------+-------------------+-------------+----------+------------+
Obviously the larger the data set, the less of an effect this will have.
It’s also possible to specify PERCENT
instead of %
:
SELECT * FROM employees
ORDER BY emp_id
LIMIT 34 PERCENT;
Output:
+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
+--------+-------------------+-------------+----------+------------+
The OFFSET
Clause
We can add the OFFSET
clause to our query in order to have the results start in a different position:
SELECT * FROM employees
ORDER BY emp_id
LIMIT 3
OFFSET 3;
Output:
+--------+--------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+--------------+-------------+----------+------------+
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
+--------+--------------+-------------+----------+------------+
The way it works is that DuckDB ignores the number rows that we provide to OFFSET
. In this case, I specified OFFSET 3
, and so DuckDB ignored the first three rows. It then presented the subsequent rows, up to the amount specified in the LIMIT
clause (in this case 3).
Using OFFSET
without LIMIT
It’s possible to use the OFFSET
clause without also using the LIMIT
clause:
SELECT * FROM employees
ORDER BY emp_id
OFFSET 3;
Output:
+--------+--------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+--------------+-------------+----------+------------+
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
| 8 | Dave Danker | Marketing | 72000.00 | 2022-09-15 |
| 9 | Sally Schmit | Marketing | 82000.00 | 2017-10-07 |
+--------+--------------+-------------+----------+------------+
In this case the result set is offset by the OFFSET
amount, and the remaining rows are returned from the data set.
Applying Expressions
Both LIMIT
and OFFSET
allow us to provide expressions other than just fixed numbers and percentages. For example, arithmetic expressions:
SELECT * FROM employees
ORDER BY emp_id
LIMIT 2 + 2
OFFSET 1 + 2;
Result:
+--------+--------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+--------------+-------------+----------+------------+
| 4 | Hank Heckler | Engineering | 70000.00 | 2021-07-11 |
| 5 | Slash Slater | Engineering | 98000.00 | 2019-11-25 |
| 6 | Amy Amish | Engineering | 87000.00 | 2019-12-14 |
| 7 | Bon Butler | Marketing | 70000.00 | 2023-02-01 |
+--------+--------------+-------------+----------+------------+
We can also provide more complex expressions, such as subqueries:
SELECT * FROM employees
ORDER BY emp_id
LIMIT (
SELECT COUNT(*)
FROM employees
WHERE department = 'Marketing'
);
Result:
+--------+-------------------+-------------+----------+------------+
| emp_id | emp_name | department | salary | hire_date |
+--------+-------------------+-------------+----------+------------+
| 1 | Beavis Blakefield | Sales | 75000.00 | 2022-01-15 |
| 2 | Mahavishnu Mars | Sales | 82000.00 | 2021-06-20 |
| 3 | Veralda Vett | Engineering | 95000.00 | 2020-03-10 |
+--------+-------------------+-------------+----------+------------+
This probably isn’t a very useful example, but at least it demonstrates the concept that we can use more complex expressions with the LIMIT
clause. Basically, this subquery counts the number of employees in the Marketing department, then uses that number to limit the results of the query (which also includes non-Marketing employees).