---
title: "Introduction to phinterval"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{phinterval}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

```{r setup}
library(phinterval)
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)
```

# Introduction

The {phinterval} package extends
[{lubridate}](https://lubridate.tidyverse.org/) to support disjoint
("holey") and empty time spans. It implements the `<phinterval>` vector
class, a generalization of the standard contiguous `<Interval>`, which
can represent:

- **Contiguous spans:** A single interval bounded by a start and end
  point (e.g., the year 2025).
- **Empty spans:** A set containing no time points (e.g., the
  intersection of your life and Napoleon's).
- **Disjoint spans:** A set of multiple time spans separated by gaps
  (e.g., the days you attended school, excluding weekends and
  holidays).

This package is designed to integrate easily into existing {lubridate}
workflows. Any `<Interval>` vector can be converted to an equivalent
`<phinterval>` vector using `as_phinterval()`, and all {phinterval}
functions accept either `<Interval>` or `<phinterval>` inputs.

# When Time Isn't Continuous

Certain set operations on time spans naturally produce empty or disjoint
results, which are difficult to represent using a standard interval.
This section illustrates several such edge cases using the months of
January and November 2025, along with the full calendar year.

```{r}
jan <- interval(ymd("2025-01-01"), ymd("2025-02-01"))
nov <- interval(ymd("2025-11-01"), ymd("2025-12-01"))
full_2025 <- interval(ymd("2025-01-01"), ymd("2026-01-01"))
```

## Empty Intersections

Because January and November do not overlap, their intersection should
contain no time.
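As a quick illustration of why this intersection is empty, the base-R
sketch below computes the intersection of two closed spans as running
from the later start to the earlier end, and treats it as empty whenever
that start falls after that end. The helper `intersect_span()` is
hypothetical and is not part of {phinterval} or {lubridate}; it handles a
single pair of spans and is not how either package is implemented.

```{r}
# Hypothetical helper (not part of {phinterval}): intersect two closed spans
# [start1, end1] and [start2, end2], each given as a single date-time.
intersect_span <- function(start1, end1, start2, end2) {
  start <- max(start1, start2) # the later of the two starts
  end <- min(end1, end2)       # the earlier of the two ends
  if (start > end) {
    return(NULL) # empty: the spans share no time points
  }
  list(start = start, end = end)
}

jan_start <- as.POSIXct("2025-01-01", tz = "UTC")
jan_end <- as.POSIXct("2025-02-01", tz = "UTC")
nov_start <- as.POSIXct("2025-11-01", tz = "UTC")
nov_end <- as.POSIXct("2025-12-01", tz = "UTC")

# The later start (Nov 1) falls after the earlier end (Feb 1), so: NULL
intersect_span(jan_start, jan_end, nov_start, nov_end)
```

The same arithmetic shows why abutting spans, which share an endpoint,
intersect in a single instant rather than in nothing.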
```{r}
lubridate::intersect(jan, nov)
phint_intersect(jan, nov)
```

In {lubridate}, this is resolved by coercing the intersection to `NA`,
while {phinterval} returns an empty `<phinterval>`, which explicitly
represents a span containing no time.

This distinction matters in downstream calculations. For example,
counting the number of days contained in both January and November:

```{r}
lubridate::intersect(jan, nov) / duration(days = 1)
phint_intersect(jan, nov) / duration(days = 1)
```

## Punching Holes in Intervals

Next, consider subtracting the month of November from the full year of
2025.

```{r}
try(lubridate::setdiff(full_2025, nov))
phint_setdiff(full_2025, nov)
```

The result is two disjoint spans, January through October and December,
which can't be represented by a single interval, so {lubridate} raises
an error. In {phinterval}, the disjoint span is represented as a single
object with an explicit gap.

## Unions of Non-Overlapping Spans

Similarly, the union of January and November contains a gap from
February through October.

```{r}
lubridate::union(jan, nov)
phint_union(jan, nov)
```

In this case, {lubridate} returns the span from the beginning of January
to the end of November, implicitly filling in the gap. Using
{phinterval}, the two disjoint months are represented explicitly.

## Subtracting an Interval from Itself

Finally, consider subtracting an interval from itself. Intuitively, this
should result in an empty time span.

```{r}
lubridate::setdiff(jan, jan)
phint_setdiff(jan, jan)
```

In this case, {lubridate} returns the original interval, while
{phinterval} returns an empty `<phinterval>`.

# Case Study: Employment History

The {phinterval} package is most useful when working with tabular data
and vectorized workflows. To illustrate this, we’ll consider an abridged
employment history for several characters from the television show
*Succession*.
```{r}
jobs <- tribble(
  ~name,  ~job_title,             ~start,       ~end,
  "Greg", "Mascot",               "2018-01-01", "2018-06-03",
  "Greg", "Executive Assistant",  "2018-06-10", "2020-04-01",
  "Greg", "Chief of Staff",       "2020-03-01", "2020-11-28",
  "Tom",  "Chairman",             "2019-05-01", "2020-11-10",
  "Tom",  "CEO",                  "2020-11-10", "2020-12-31",
  "Shiv", "Political Consultant", "2017-01-01", "2019-04-01"
)
```

Suppose we know that Greg, Tom, and Shiv went on a Christmas vacation in
December 2017.

```{r}
vacation <- interval(ymd("2017-12-23"), ymd("2017-12-29"))
```

If we want to analyze only the time spent working, excluding time on
vacation, we might try to subtract the `vacation` interval from each
span in `jobs`. However, this approach breaks down when the vacation
falls strictly within a job interval, as it does for Shiv’s Political
Consultant role.

```{r}
try(
  jobs |>
    mutate(
      span = interval(start, end),
      span = setdiff(span, vacation)
    ) |>
    select(name, job_title, span)
)
```

Handling this correctly is surprisingly involved. One option is to split
Shiv’s job into two rows (one pre-vacation and one post-vacation),
breaking the one-row-per-job structure of the data. Another is to
represent each job as a list of intervals, complicating downstream
analysis.

The main purpose of {phinterval} is to avoid these workarounds by
providing drop-in replacements for {lubridate} interval functions.
Because {phinterval} functions accept either `<Interval>` or
`<phinterval>` inputs, existing code can typically be adapted by simply
replacing a {lubridate} function with its {phinterval} counterpart.

```{r}
jobs |>
  mutate(
    span = interval(start, end),
    span = phint_setdiff(span, vacation)
  ) |>
  select(name, job_title, span)
```

## Merging Intervals

Suppose we want to analyze only the total time each character spent
employed, without distinguishing between individual jobs. This can be
done using `phint_squash()`, which aggregates a vector of intervals into
a minimal set of non-overlapping spans within a scalar `<phinterval>`.
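Conceptually, squashing resembles the classic "merge overlapping
intervals" routine. The base-R sketch below sorts spans by their start,
extends the current merged span whenever the next span overlaps or abuts
it (endpoints are inclusive, so a shared endpoint counts), and otherwise
begins a new span. It also computes the gaps between the merged spans,
which correspond to what `phint_invert()` reports for an element. The
helpers `squash_spans()` and `span_gaps()` are hypothetical, handle a
single group of `<Date>` or `<POSIXct>` values, and are not {phinterval}'s
implementation.

```{r}
# Hypothetical helper (not part of {phinterval}): merge closed spans
# [start[i], end[i]] into a minimal set of non-overlapping spans.
squash_spans <- function(start, end) {
  stopifnot(length(start) == length(end), length(start) > 0)
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]

  out_start <- start[1]
  out_end <- end[1]
  for (i in seq_along(start)[-1]) {
    k <- length(out_end)
    if (start[i] <= out_end[k]) {
      # Overlaps or abuts the current merged span: extend it
      out_end[k] <- max(out_end[k], end[i])
    } else {
      # Disjoint from the current merged span: begin a new one
      out_start <- c(out_start, start[i])
      out_end <- c(out_end, end[i])
    }
  }
  data.frame(start = out_start, end = out_end)
}

# Hypothetical helper: gaps run from each merged span's end to the next
# merged span's start (zero rows when there is only one span).
span_gaps <- function(spans) {
  n <- nrow(spans)
  data.frame(start = spans$end[-n], end = spans$start[-1])
}

# Greg's three jobs from `jobs`
greg <- squash_spans(
  start = as.Date(c("2018-01-01", "2018-06-10", "2020-03-01")),
  end = as.Date(c("2018-06-03", "2020-04-01", "2020-11-28"))
)
greg
span_gaps(greg)
```

Because the Executive Assistant and Chief of Staff roles overlap, they
merge into one span, leaving a single gap between Greg's first and second
merged spans.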
```{r, include = FALSE}
opts <- options(width = 120)
```

```{r}
employment <- jobs |>
  mutate(span = interval(start, end)) |>
  group_by(name) |>
  summarize(employed = phint_squash(span))

employment
```

Notice that:

- *Greg* has multiple disjoint employment periods, which are preserved
  as separate spans within a single `<phinterval>` element.
- *Tom* held two back-to-back positions (Chairman followed by CEO),
  which `phint_squash()` correctly merges into a single contiguous span.

Instead of `dplyr::group_by()`, you can use the `by` argument of
`phint_squash_by()`, or of `datetime_squash_by()`, which takes `start`
and `end` times directly. The example below is equivalent to the
previous code but is usually several times faster.

```{r}
datetime_squash_by(
  start = ymd(jobs$start),
  end = ymd(jobs$end),
  by = jobs$name
)
```

```{r, include = FALSE}
options(opts)
```

As in `dplyr::summarize()`, the `by` argument can be a vector or a data
frame, the latter supporting multiple grouping columns.

To return the dataset to a one-row-per-span format, use
`phint_unnest()`, which expands each `<phinterval>` element into one row
per span:

```{r}
employment |>
  reframe(phint_unnest(employed, key = name))
```

## Finding Gaps

To analyze periods of unemployment, we need to identify the gaps between
employment intervals. The `phint_invert()` function returns the gaps
between the spans within each `<phinterval>` element.

```{r}
unemployment <- employment |>
  mutate(
    # Find the gaps between jobs
    unemployed = phint_invert(employed),
    # Calculate duration of unemployment
    days_unemployed = unemployed / ddays(1)
  ) |>
  select(name, unemployed, days_unemployed)

unemployment
```

Greg was unemployed for 7 days (June 3 to June 10, 2018) between his
time as a Mascot and his role as Executive Assistant. Tom and Shiv have
no gaps within their respective employment timelines, so their
`unemployed` values are empty `<phinterval>` elements.

# Edge Cases and Gotchas

## Abutting Intervals and Intersection

Manipulating abutting intervals (intervals that share an endpoint) can
sometimes produce unexpected results.
To demonstrate, consider a Monday and a Tuesday in November 2025.

```{r}
monday <- interval(ymd("2025-11-10"), ymd("2025-11-11"))
tuesday <- interval(ymd("2025-11-11"), ymd("2025-11-12"))
```

By default, intervals in `<Interval>` and `<phinterval>` vectors have
inclusive endpoints, meaning that midnight at the end of Monday,
November 10th, 2025 (i.e., `2025-11-11 00:00:00`) falls within both
`monday` and `tuesday`:

```{r}
midnight_monday <- ymd_hms("2025-11-11 00:00:00")
phint_within(midnight_monday, monday)
phint_within(midnight_monday, tuesday)
```

As a result, the intersection of `monday` and `tuesday` is an
instantaneous interval at `midnight_monday`.

```{r}
phint_intersect(monday, tuesday) == as_phinterval(midnight_monday)
```

Perhaps surprisingly, this also means that the intersection of `monday`
and its complement is not empty, but consists of the two endpoints of
`monday`.

```{r}
not_monday <- phint_complement(monday)
not_monday
phint_intersect(monday, not_monday)
```

The `bounds` argument of functions such as `phint_overlaps()`,
`phint_within()`, and `phint_intersect()` controls this behavior. When
`bounds = "()"`, endpoints are treated as exclusive:

```{r}
phint_overlaps(monday, tuesday, bounds = "()")
phint_intersect(monday, tuesday, bounds = "()")
```

With exclusive endpoints, `monday` and `tuesday` no longer overlap, and
their intersection is empty. Strictly speaking, an instantaneous
interval `(point, point)` with open bounds is empty (i.e., an empty
`<phinterval>`), but for convenience we consider it to contain the
single instant `point`.
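The two conventions differ only in whether comparisons against the
endpoints are strict. The base-R sketch below checks whether two spans
overlap: with `"[]"` each span must start no later than the other ends,
while with `"()"` each must start strictly before the other ends.
Treating an instant `t` as the zero-length span `[t, t]`, the strict rule
also places an instant strictly inside a span within it, while an
instant on an endpoint falls outside. The helper `spans_overlap()` is
hypothetical and only mirrors the behavior described in this section; it
is not {phinterval}'s implementation.

```{r}
# Hypothetical helper (not part of {phinterval}): do spans [s1, e1] and
# [s2, e2] overlap under inclusive ("[]") or exclusive ("()") endpoints?
spans_overlap <- function(s1, e1, s2, e2, bounds = c("[]", "()")) {
  bounds <- match.arg(bounds)
  if (bounds == "[]") {
    # Inclusive: sharing a single endpoint counts as overlapping
    s1 <= e2 & s2 <= e1
  } else {
    # Exclusive: each span must start strictly before the other ends
    s1 < e2 & s2 < e1
  }
}

mon_start <- as.POSIXct("2025-11-10", tz = "UTC")
mon_end <- as.POSIXct("2025-11-11", tz = "UTC")
tue_end <- as.POSIXct("2025-11-12", tz = "UTC")

# Monday and Tuesday share only the instant 2025-11-11 00:00:00
spans_overlap(mon_start, mon_end, mon_end, tue_end, bounds = "[]") # TRUE
spans_overlap(mon_start, mon_end, mon_end, tue_end, bounds = "()") # FALSE

# Instants as zero-length spans [t, t]
nine_am <- as.POSIXct("2025-11-10 09:00:00", tz = "UTC")
spans_overlap(nine_am, nine_am, mon_start, mon_end, bounds = "()") # TRUE
spans_overlap(mon_end, mon_end, mon_start, mon_end, bounds = "()") # FALSE
```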
With `bounds = "()"`, instants at an endpoint of an interval fall
outside the interval, while instants strictly inside it are considered
within it:

```{r}
monday_at_9AM <- as_phinterval(ymd_hms("2025-11-10 09:00:00"))
phint_within(monday_at_9AM, monday, bounds = "()")
phint_within(midnight_monday, monday, bounds = "()")
```

To treat instantaneous intervals as empty, use
`phint_discard_instants()` to remove all instants from an interval
vector:

```{r}
phint <- phint_squash(c(monday_at_9AM, tuesday))
phint
phint_discard_instants(phint)
```

## Instantaneous Intervals and Set Difference

Because phinterval elements are composed of non-overlapping,
non-adjacent spans, "punching" an instantaneous hole into an interval
using `phint_setdiff()` has no effect. While removing a single point
from an interval `[start, end]` would theoretically split it into
`[start, point)` and `(point, end]`, in practice these adjacent pieces
are immediately merged back together:

```{r}
monday_noon <- as_phinterval(ymd_hms("2025-11-10 12:00:00"))
monday_lunch_break <- interval(
  ymd_hms("2025-11-10 12:00:00"),
  ymd_hms("2025-11-10 13:00:00")
)

phint_setdiff(monday, monday_lunch_break) # Removes a non-zero interval
phint_setdiff(monday, monday_noon)        # Instantaneous: no effect
```

To create a gap, you must remove an interval with non-zero duration.

## Time Zones

To ensure that any `<Interval>` vector can be represented as an
equivalent `<phinterval>` vector, the `phinterval()` constructor accepts
any time zone permitted by `interval()`, including unrecognized zones.
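As a rough base-R analogue of what "unrecognized" means here, a zone
name can be checked against the time zone database that R uses, via
`OlsonNames()`. This is an assumption for illustration only: the helper
`tz_in_database()` is hypothetical, and {phinterval}'s own check,
`is_recognized_tzone()` (covered below), may apply a different rule.

```{r}
# Hypothetical helper (not part of {phinterval}): is each zone name listed
# in the time zone database that R uses? Missing names are never listed.
tz_in_database <- function(tz) {
  !is.na(tz) & tz %in% OlsonNames()
}

tz_in_database(c("America/New_York", "nozone", NA))
```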
```{r}
intvl <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "nozone")
phint <- phinterval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "nozone")
intvl == phint
```

When a `<phinterval>` with an unrecognized time zone is formatted, its
time points are displayed in UTC:

```{r, include = FALSE}
rlang::reset_warning_verbosity("phinterval_warning_unrecognized_tzone")
```

```{r}
print(phint)
```

The `is_recognized_tzone()` function can be used to check whether a
time zone is recognized:

```{r}
is_recognized_tzone("America/New_York")
is_recognized_tzone("nozone")
is_recognized_tzone(NA_character_)
```

Some datetime vectors, such as `<POSIXct>`, are allowed to have an `NA`
time zone. When converted to a `<phinterval>`, the missing time zone is
silently replaced with UTC:

```{r}
na_zoned <- as.POSIXct("2021-01-01", tz = NA_character_)
as_phinterval(na_zoned)
```

Operations that combine two or more interval vectors, such as
`phint_union()`, use the time zone of the first argument. If the first
argument's time zone is `""` (the user's local time zone), the second
argument's time zone is used instead.

```{r}
int_est <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "EST")
int_utc <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "UTC")
int_lcl <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "")

phint_union(int_est, int_utc)
phint_union(int_utc, int_est)
phint_union(int_lcl, int_est)
```

## Comparison with Datetime Vectors

Comparison operators (`<=`, `<`, `>`, `>=`, `==`) can behave
unexpectedly when comparing datetime vectors (`<Date>`, `<POSIXct>`,
`<POSIXlt>`) to `<Interval>` or `<phinterval>` vectors. For example:

```{r}
span <- phinterval(ymd("2000-08-05"), ymd("2000-11-29"))
date <- ymd("2021-01-01")
span == date
```

To get the intended behavior, first convert the datetime vector into an
equivalent `<phinterval>` using `as_phinterval()`:

```{r}
span == as_phinterval(date)
```