The Difference Between DATE_DIFF() and DATE_SUB() in DuckDB

In DuckDB, the date_diff() (along with its synonym datediff()) and date_sub() (along with its synonym datesub()) functions allow us to get the difference between two dates. While they might seem similar, they actually calculate date differences in distinct ways that are important to understand for accurate data analysis.

Let’s take a look at the difference between these functions.

The Difference in a Nutshell

The functions are defined like this:

  • date_diff() and datediff(): The number of partition boundaries between the dates.
  • date_sub() and datesub(): The number of complete partitions between the dates.

There’s a difference between partition boundaries and complete partitions. Let’s check it out.

date_diff() and datediff()

The date_diff() (and its synonym datediff()) function calculates the difference between two date/time values by counting the number of partition boundaries between them.

Here’s an example that illustrates this concept:

SELECT 
    date_diff('day', DATE '2023-01-15', DATE '2023-04-01') AS days,
    date_diff('month', DATE '2023-01-15', DATE '2023-04-01') AS months;

Output:

+------+--------+
| days | months |
+------+--------+
| 76 | 3 |
+------+--------+

When we specified month, the function returned three, even though the difference between the dates wasn’t actually three full months.

date_sub() and datesub()

The date_sub() function (and its synonym datesub()) counts the number of complete partitions between two dates. This is different from counting boundaries as date_diff() does.

Example:

SELECT 
    date_sub('day', DATE '2023-01-15', DATE '2023-04-01') AS days,
    date_sub('month', DATE '2023-01-15', DATE '2023-04-01') AS months;

Output:

+------+--------+
| days | months |
+------+--------+
| 76 | 2 |
+------+--------+

This function returns two months instead of three like date_diff() returned. That’s because there weren’t three full months, and it only counts full months.

All Functions Compared Together

Let’s put them all together:

SELECT 
    date_diff('month', DATE '2023-01-15', DATE '2023-04-01') AS date_diff,
    datediff('month', DATE '2023-01-15', DATE '2023-04-01') AS datediff,
    date_sub('month', DATE '2023-01-15', DATE '2023-04-01') AS date_sub,
    datesub('month', DATE '2023-01-15', DATE '2023-04-01') AS datesub;

Output:

+-----------+----------+----------+---------+
| date_diff | datediff | date_sub | datesub |
+-----------+----------+----------+---------+
| 3 | 3 | 2 | 2 |
+-----------+----------+----------+---------+

It’s important to note that not all cases will yield different results. Sometimes all functions will produce the same result. Also, different date parts (year, month, day) will yield different results with the same dates.

For example, if we change the date part to day, they return the same result:

SELECT 
    date_diff('day', DATE '2023-01-15', DATE '2023-04-01') AS date_diff,
    datediff('day', DATE '2023-01-15', DATE '2023-04-01') AS datediff,
    date_sub('day', DATE '2023-01-15', DATE '2023-04-01') AS date_sub,
    datesub('day', DATE '2023-01-15', DATE '2023-04-01') AS datesub;

Output:

+-----------+----------+----------+---------+
| date_diff | datediff | date_sub | datesub |
+-----------+----------+----------+---------+
| 76 | 76 | 76 | 76 |
+-----------+----------+----------+---------+

Same if we change it to year:

SELECT 
    date_diff('year', DATE '2023-01-15', DATE '2023-04-01') AS date_diff,
    datediff('year', DATE '2023-01-15', DATE '2023-04-01') AS datediff,
    date_sub('year', DATE '2023-01-15', DATE '2023-04-01') AS date_sub,
    datesub('year', DATE '2023-01-15', DATE '2023-04-01') AS datesub;

Output:

+-----------+----------+----------+---------+
| date_diff | datediff | date_sub | datesub |
+-----------+----------+----------+---------+
| 0 | 0 | 0 | 0 |
+-----------+----------+----------+---------+