In DuckDB, the date_diff() (along with its synonym datediff()) and date_sub() (along with its synonym datesub()) functions allow us to get the difference between two dates. While they might seem similar, they actually calculate date differences in distinct ways that are important to understand for accurate data analysis.
Let’s take a look at the difference between these functions.
The Difference in a Nutshell
The functions are defined like this:
date_diff()anddatediff(): The number of partition boundaries between the dates.date_sub()anddatesub(): The number of complete partitions between the dates.
There’s a difference between partition boundaries and complete partitions. Let’s check it out.
date_diff() and datediff()
The date_diff() (and its synonym datediff()) function calculates the difference between two date/time values by counting the number of partition boundaries between them.
Here’s an example that illustrates this concept:
SELECT
date_diff('day', DATE '2023-01-15', DATE '2023-04-01') AS days,
date_diff('month', DATE '2023-01-15', DATE '2023-04-01') AS months;
Output:
+------+--------+
| days | months |
+------+--------+
| 76 | 3 |
+------+--------+
When we specified month, the function returned three, even though the difference between the dates wasn’t actually three full months.
date_sub() and datesub()
The date_sub() function (and its synonym datesub()) counts the number of complete partitions between two dates. This is different from counting boundaries as date_diff() does.
Example:
SELECT
date_sub('day', DATE '2023-01-15', DATE '2023-04-01') AS days,
date_sub('month', DATE '2023-01-15', DATE '2023-04-01') AS months;
Output:
+------+--------+
| days | months |
+------+--------+
| 76 | 2 |
+------+--------+
This function returns two months instead of three like date_diff() returned. That’s because there weren’t three full months, and it only counts full months.
All Functions Compared Together
Let’s put them all together:
SELECT
date_diff('month', DATE '2023-01-15', DATE '2023-04-01') AS date_diff,
datediff('month', DATE '2023-01-15', DATE '2023-04-01') AS datediff,
date_sub('month', DATE '2023-01-15', DATE '2023-04-01') AS date_sub,
datesub('month', DATE '2023-01-15', DATE '2023-04-01') AS datesub;
Output:
+-----------+----------+----------+---------+
| date_diff | datediff | date_sub | datesub |
+-----------+----------+----------+---------+
| 3 | 3 | 2 | 2 |
+-----------+----------+----------+---------+
It’s important to note that not all cases will yield different results. Sometimes all functions will produce the same result. Also, different date parts (year, month, day) will yield different results with the same dates.
For example, if we change the date part to day, they return the same result:
SELECT
date_diff('day', DATE '2023-01-15', DATE '2023-04-01') AS date_diff,
datediff('day', DATE '2023-01-15', DATE '2023-04-01') AS datediff,
date_sub('day', DATE '2023-01-15', DATE '2023-04-01') AS date_sub,
datesub('day', DATE '2023-01-15', DATE '2023-04-01') AS datesub;
Output:
+-----------+----------+----------+---------+
| date_diff | datediff | date_sub | datesub |
+-----------+----------+----------+---------+
| 76 | 76 | 76 | 76 |
+-----------+----------+----------+---------+
Same if we change it to year:
SELECT
date_diff('year', DATE '2023-01-15', DATE '2023-04-01') AS date_diff,
datediff('year', DATE '2023-01-15', DATE '2023-04-01') AS datediff,
date_sub('year', DATE '2023-01-15', DATE '2023-04-01') AS date_sub,
datesub('year', DATE '2023-01-15', DATE '2023-04-01') AS datesub;
Output:
+-----------+----------+----------+---------+
| date_diff | datediff | date_sub | datesub |
+-----------+----------+----------+---------+
| 0 | 0 | 0 | 0 |
+-----------+----------+----------+---------+