DuckDB has emerged as a powerful analytical database system designed for fast in-memory data processing. One common analytical task involves calculating time differences between events—whether measuring intervals between transactions, tracking service response times, or analyzing temporal patterns in time series data.
This article provides an exploration of calculating time differences in DuckDB, covering both basic and slightly more advanced techniques.
Understanding DuckDB’s DateTime Types
Before diving into time difference calculations, it’s essential to understand DuckDB’s datetime data types:
DATE: Stores dates without time information (YYYY-MM-DD)TIME: Stores time without date information (HH:MM:SS)TIMESTAMP: Stores both date and time (YYYY-MM-DD HH:MM:SS)TIMESTAMP WITH TIME ZONE: LikeTIMESTAMPbut includes timezone information
These data types provide the foundation for performing temporal calculations in DuckDB.
Basic Time Difference Calculations
Here are a few ways to perform basic time difference calculations in DuckDB:
Using Arithmetic Operators
The simplest way to calculate time differences in DuckDB is using arithmetic operators. When you subtract one timestamp from another, DuckDB returns an INTERVAL type that represents the time difference.
SELECT TIMESTAMP '2023-01-01 12:00:00' - TIMESTAMP '2023-01-01 10:30:00' AS time_difference;
Result:
+-----------------+
| time_difference |
+-----------------+
| 01:30:00 |
+-----------------+
Using the DATE_DIFF() Function
Another way to do it is with the DATE_DIFF() function:
SELECT
DATE_DIFF( 'MINUTES', TIMESTAMP '2023-01-01 10:30:00', TIMESTAMP '2023-01-01 12:00:00') AS minutes_difference,
DATE_DIFF( 'SECONDS', TIMESTAMP '2023-01-01 10:30:00', TIMESTAMP '2023-01-01 12:00:00') AS seconds_difference,
DATE_DIFF( 'MILLISECONDS', TIMESTAMP '2023-01-01 10:30:00', TIMESTAMP '2023-01-01 12:00:00') AS milliseconds_difference,
DATE_DIFF( 'MICROSECONDS', TIMESTAMP '2023-01-01 10:30:00', TIMESTAMP '2023-01-01 12:00:00') AS microseconds_difference;
Result:
+--------------------+--------------------+-------------------------+-------------------------+
| minutes_difference | seconds_difference | milliseconds_difference | microseconds_difference |
+--------------------+--------------------+-------------------------+-------------------------+
| 90 | 5400 | 5400000 | 5400000000 |
+--------------------+--------------------+-------------------------+-------------------------+
This returns a numeric value representing the number of partition boundaries between the date/time values.
Using the DATE_SUB() Function
The DATE_SUB() function is similar to DATE_DIFF(), except that it calculates the number of complete partitions between the dates, whereas DATE_DIFF() calculates the number of partition boundaries.
Here’s an example of both functions:
SELECT
DATE_SUB( 'HOURS', TIME '10:30:00', TIME '12:00:00') AS DATE_SUB,
DATE_DIFF( 'HOURS', TIME '10:30:00', TIME '12:00:00') AS DATE_DIFF;
Result:
+----------+-----------+
| DATE_SUB | DATE_DIFF |
+----------+-----------+
| 1 | 2 |
+----------+-----------+
Seeing as there’s only 1.5 hour’s difference between the two times, DATE_SUB() returned 1 because there is only one complete hour between the two times.
You may have noticed that In this example I switched to a TIME value instead of a TIMESTAMP like in the previous example. Both functions can operate on both data types.
There’s also a DATEDIFF() function, which is a synonym for DATE_DIFF() (they do the same thing), and a DATESUB() function, which is a synonym for DATE_SUB().
Using the AGE() Function
The AGE() function provides another option for calculating the difference between timestamps, returning an interval that represents years, months, days, hours, minutes, and seconds.
SELECT AGE(TIMESTAMP '2023-01-15 14:30:00', TIMESTAMP '2022-12-25 08:15:00') AS age_difference;
Result:
+------------------+
| age_difference |
+------------------+
| 21 days 06:15:00 |
+------------------+
The AGE() function works with DATE and TIMESTAMP values, but not with TIME or INTERVAL values.
Using the EXTRACT() Function
If we want to get a bit more creative, we could use the EXTRACT() function to do something like this:
SELECT EXTRACT('EPOCH' FROM TIMESTAMP '2023-01-01 12:00:00' - TIMESTAMP '2023-01-01 10:30:00') AS seconds_difference;
Result:
+--------------------+
| seconds_difference |
+--------------------+
| 5400.0 |
+--------------------+
Epoch is the number of seconds since 1970-01-01. It’s one of the many date part specifiers that we can use with the EXTRACT() function.
Converting Intervals to Specific Units
Sometimes you may need to express time differences in specific units like hours, minutes, or seconds. DuckDB provides several functions to extract parts from intervals. EXTRACT() is one such function:
WITH time_diff AS (
SELECT TIMESTAMP '2023-01-01 15:30:45' - TIMESTAMP '2023-01-01 10:15:00' AS diff
)
SELECT
diff,
EXTRACT(HOUR FROM diff) AS hours,
EXTRACT(MINUTE FROM diff) AS minutes,
EXTRACT(SECOND FROM diff) AS seconds
FROM time_diff;
Result:
+----------+-------+---------+---------+
| diff | hours | minutes | seconds |
+----------+-------+---------+---------+
| 05:15:45 | 5 | 15 | 45 |
+----------+-------+---------+---------+
Another way to do this is with functions that extract specific components from the interval, such as:
WITH time_diff AS (
SELECT TIMESTAMP '2023-01-01 15:30:45' - TIMESTAMP '2023-01-01 10:15:00' AS diff
)
SELECT
diff,
HOUR(diff) AS hours,
MINUTE(diff) AS minutes,
SECOND(diff) AS seconds
FROM time_diff;
Result:
+----------+-------+---------+---------+
| diff | hours | minutes | seconds |
+----------+-------+---------+---------+
| 05:15:45 | 5 | 15 | 45 |
+----------+-------+---------+---------+
Real-World Examples
Let’s apply these concepts to practical scenarios using sample data in a database.
First, we’ll create a table and populate it with data:
-- Create a sample table
CREATE TABLE events (
event_id INTEGER,
user_id INTEGER,
event_type VARCHAR,
event_timestamp TIMESTAMP
);
-- Insert sample data
INSERT INTO events VALUES
(1, 101, 'login', '2025-01-01 10:15:00'),
(2, 101, 'page_view', '2025-01-01 10:17:30'),
(3, 101, 'checkout', '2025-01-01 10:25:45'),
(4, 101, 'logout', '2025-01-01 10:30:00'),
(5, 102, 'login', '2025-01-01 11:00:00'),
(6, 102, 'page_view', '2025-01-01 11:05:15'),
(7, 102, 'logout', '2025-01-01 11:10:30');
Calculating Session Duration
To calculate the duration of each user session:
SELECT
user_id,
MIN(CASE WHEN event_type = 'login' THEN event_timestamp END) AS login_time,
MAX(CASE WHEN event_type = 'logout' THEN event_timestamp END) AS logout_time,
DATE_DIFF(
'SECONDS',
MIN(CASE WHEN event_type = 'login' THEN event_timestamp END),
MAX(CASE WHEN event_type = 'logout' THEN event_timestamp END)
) / 60 AS session_duration_minutes
FROM events
GROUP BY user_id;
Output:
+---------+---------------------+---------------------+--------------------------+
| user_id | login_time | logout_time | session_duration_minutes |
+---------+---------------------+---------------------+--------------------------+
| 101 | 2025-01-01 10:15:00 | 2025-01-01 10:30:00 | 15.0 |
| 102 | 2025-01-01 11:00:00 | 2025-01-01 11:10:30 | 10.5 |
+---------+---------------------+---------------------+--------------------------+
Time Between Sequential Events
To analyze the time between sequential events, we can use window functions:
WITH events_ordered AS (
SELECT
event_id,
user_id,
event_type,
event_timestamp,
LAG(event_timestamp) OVER (PARTITION BY user_id ORDER BY event_timestamp) AS prev_event_time,
LAG(event_type) OVER (PARTITION BY user_id ORDER BY event_timestamp) AS prev_event_type
FROM events
)
SELECT
event_id,
user_id,
event_type,
event_timestamp,
prev_event_type,
prev_event_time,
DATE_DIFF('SECONDS', prev_event_time, event_timestamp) AS seconds_since_prev_event
FROM events_ordered
WHERE prev_event_time IS NOT NULL;
Result:
+----------+---------+------------+---------------------+-----------------+---------------------+--------------------------+
| event_id | user_id | event_type | event_timestamp | prev_event_type | prev_event_time | seconds_since_prev_event |
+----------+---------+------------+---------------------+-----------------+---------------------+--------------------------+
| 2 | 101 | page_view | 2025-01-01 10:17:30 | login | 2025-01-01 10:15:00 | 150.0 |
| 3 | 101 | checkout | 2025-01-01 10:25:45 | page_view | 2025-01-01 10:17:30 | 495.0 |
| 4 | 101 | logout | 2025-01-01 10:30:00 | checkout | 2025-01-01 10:25:45 | 255.0 |
| 6 | 102 | page_view | 2025-01-01 11:05:15 | login | 2025-01-01 11:00:00 | 315.0 |
| 7 | 102 | logout | 2025-01-01 11:10:30 | page_view | 2025-01-01 11:05:15 | 315.0 |
+----------+---------+------------+---------------------+-----------------+---------------------+--------------------------+
Working with Time Buckets
Maybe slightly off-topic for this article, but aggregating data into time buckets is a common analytical task:
SELECT
DATE_TRUNC('hour', event_timestamp) AS hour_bucket,
COUNT(*) AS event_count
FROM events
GROUP BY hour_bucket
ORDER BY hour_bucket;
Result:
+---------------------+-------------+
| hour_bucket | event_count |
+---------------------+-------------+
| 2025-01-01 10:00:00 | 4 |
| 2025-01-01 11:00:00 | 3 |
+---------------------+-------------+
Leap Year Handling
Be cautious with date arithmetic around leap years:
-- Compare February in leap vs. non-leap year
SELECT
DATE_DIFF('day', DATE '2023-02-01', DATE '2023-03-01') AS days_feb_2023,
DATE_DIFF('day', DATE '2024-02-01', DATE '2024-03-01') AS days_feb_2024;
Result:
+---------------+---------------+
| days_feb_2023 | days_feb_2024 |
+---------------+---------------+
| 28 | 29 |
+---------------+---------------+
February has 29 days in a leap year, and 28 days in non-leap years.
Daylight Saving Time Issues
When working with time zones that observe DST, be aware of potential ambiguities. Time differences across DST transitions may not be as expected:
SELECT TIMESTAMP '2023-03-12 03:00:00' AT TIME ZONE 'America/New_York' -
TIMESTAMP '2023-03-12 01:00:00' AT TIME ZONE 'America/New_York' AS time_diff_across_dst;
Result:
+----------------------+
| time_diff_across_dst |
+----------------------+
| 01:00:00 |
+----------------------+
This date just so happens to be the start of daylight savings, and so there’s only one hour’s difference instead of two (which we might’ve expected if we hadn’t been aware of daylight savings).
Here’s the same calculation for the following year:
SELECT TIMESTAMP '2024-03-12 03:00:00' AT TIME ZONE 'America/New_York' -
TIMESTAMP '2024-03-12 01:00:00' AT TIME ZONE 'America/New_York' AS time_diff_across_dst;
Result:
+----------------------+
| time_diff_across_dst |
+----------------------+
| 02:00:00 |
+----------------------+
This time there’s two hours difference, even though I didn’t change the time part (I only changed the year). In 2024, daylight savings started on March 10, and so that’s why we got a different result to the previous example.