spacetime representations aren't good—yet
My beliefs can be summarized somewhat succinctly.
We should not limit space-time data to dates or timestamps.
The R ecosystem should always use a normalized approach (described below). Further, a representation should use friendly R objects, and the friendliest object is a data frame. A new representation should allow context switching between geometries and temporal data. It should always use time-long formats, and geometries should never be repeated.
A spacetime representation should give users complete and total freedom to manipulate their data as they see fit (e.g. dplyr or data.table operations).
The only time to be strict about the format of spacetime data is when statistics are going to be derived from the data.
Background
While implementing emerging hotspot analysis in sfdep, I encountered the need for a formalized spacetime class in R. As my focus in sfdep has been tidyverse-centric functionality, I desired a “tidy” data frame that could be used as a spacetime representation. Moreover, space (in the spacetime representation) should be represented as an sf or sfc object. In sfdep I introduced the new S3 class spacetime, based on Edzer Pebesma’s 2012 article “spacetime: Spatio-Temporal Data in R” and Thomas Lin Pedersen’s tidygraph package.
Representations of Spatial Data
Before describing my preferences in a spacetime representation in R, I want to review possible representations of spacetime data.
Pebesma (2012) outlines three tabular representations of spatio-temporal data.
- Time-wide: Where different columns reflect different moments in time.
- Space-wide: Where different columns reflect different measurement locations or areas.
- Long formats: Where each record reflects a single time and space combination.
The “long format” is what we may consider “tidy” per Wickham (2014). In this case, both time and space are variables with unique combinations as rows.
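To make the three layouts concrete, here is a minimal sketch that starts from a long table and reshapes it with tidyr. The toy values and the column prefixes (`t`, `loc_`) are my own assumptions for illustration; they do not come from Pebesma (2012).

```r
library(tidyr)

# Hypothetical toy data in long format:
# one row per location-and-time combination.
long <- data.frame(
  location = rep(c("001", "002"), each = 3),
  time     = rep(1:3, times = 2),
  value    = c(10, 11, 12, 20, 21, 22)
)

# Time-wide: one row per location, one column per moment in time.
time_wide <- pivot_wider(
  long,
  names_from = time,
  values_from = value,
  names_prefix = "t"
)

# Space-wide: one row per moment in time, one column per location.
space_wide <- pivot_wider(
  long,
  names_from = location,
  values_from = value,
  names_prefix = "loc_"
)
```

Only the long table keeps time and space as ordinary variables, which is why it plays well with dplyr verbs.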
Pebesma further classifies spatio-temporal data representations as either a “sparse grid” or a “full grid.” Say we have a variable X. In a spatio-temporal full grid we store every combination of time (t) and location (i). If X is missing at any of those location and time combinations (Xit is missing), the value of X is recorded as a missing value. In a sparse grid, by contrast, an observation with missing data is omitted entirely. With i locations and t time periods, a full grid necessarily has i x t rows, while a sparse grid with any missing observations has fewer than i x t rows.
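As a rough illustration of the difference, the sketch below expands a sparse table into a full grid with `tidyr::complete()`. The toy data is hypothetical, and this is just one way to do the expansion, not how any particular package implements it.

```r
library(tidyr)

# Hypothetical sparse grid: location "002" has no record for time 2.
sparse <- data.frame(
  location = c("001", "001", "001", "002", "002"),
  time     = c(1, 2, 3, 1, 3),
  value    = c(10, 11, 12, 20, 22)
)

# Full grid: every location-and-time combination gets a row,
# and the combination that was absent is filled with NA.
full <- complete(sparse, location, time)

nrow(sparse) # 5 rows, fewer than 2 locations x 3 times
nrow(full)   # 6 rows, exactly 2 locations x 3 times
```

Going the other way, dropping the rows where `value` is `NA` turns the full grid back into the sparse one.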
Very recently in an r-spatial blog post, “Vector Data Cubes”, Edzer describes another approach to representing spacetime using a database normalization approach. Database normalization is a process that reduces redundancy by creating a number of smaller tables containing IDs and values. These tables can then be joined only when needed. When we consider spacetime data, we have repeating geometries across time. It is inefficient to keep multiple copies of the geometry. Instead, we can keep track of the unique ID of a geometry and store the geometry in another table.
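A minimal sketch of the normalized idea, assuming two hypothetical tables (`geometry_tbl` and `data_tbl`) that share a `location` ID. The names and values are mine, chosen only for illustration; the join is done with dplyr and sf.

```r
library(dplyr)
library(sf)

# Geometry table: each geometry is stored exactly once, keyed by ID.
geometry_tbl <- st_sf(
  location = c("001", "002"),
  geometry = st_sfc(st_point(c(0, 1)), st_point(c(1, 1)))
)

# Data table: repeated measurements that carry only the ID, no geometry.
data_tbl <- data.frame(
  location = rep(c("001", "002"), each = 3),
  period   = rep(1:3, times = 2),
  value    = c(10, 11, 12, 20, 21, 22)
)

# Join only when the geometry is actually needed. The result is an sf
# object in which each geometry is repeated once per time period.
joined <- left_join(geometry_tbl, data_tbl, by = "location")

nrow(geometry_tbl) # 2 geometries stored
nrow(joined)       # 6 rows after the join
```

Until the join happens, all manipulation can be done on the plain data frame, and the geometries are never copied.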
sfdep spacetime representation
The spacetime class in sfdep is, in essence, an implementation of the database normalization approach described in the blog post above, designed with the ergonomics of tidygraph in mind.
The objective of the spacetime class in sfdep is to
- allow complete freedom of data manipulation via data.frame objects,
- prevent duplication of geometries,
- and provide leeway in what “time” can be defined as.
Similar to tidygraph, spacetime provides access to two contexts: data and geometry. The data context is a data frame and the geometry context is an sf object. These are linked by a unique identifier that is present in both contexts.
R code
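library(dplyr)

# five evenly spaced timestamps spanning the next five hours
times <- seq(
  Sys.time(),
  Sys.time() + lubridate::hours(5),
  length.out = 5
)

locations <- c("001", "002")

# data context: every location-and-time combination with a random value
data_context <- tidyr::crossing(
  location = locations,
  time = times
) |>
  mutate(value = rnorm(n())) |>
  arrange(location)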
library(sf)
## Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1; sf_use_s2() is TRUE
# geometry context: one point per location, keyed by the same ID
geometry_context <- st_sfc(
  list(st_point(c(0, 1)), st_point(c(1, 1)))
) |>
  st_as_sf() |>
  mutate(location = c("001", "002"))
Use the spacetime constructor
library(sfdep)
spt <- spacetime(
  .data = data_context,
  .geometry = geometry_context,
  .loc_col = "location",
  .time_col = "time"
)
Swap contexts with activate
activate(spt, "geometry")
## spacetime ────
## Context:`geometry`
## 2 locations `location`
## 5 time periods `time`
## ── geometry context ────────────────────────────────────────────────────────────
## Simple feature collection with 2 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 0 ymin: 1 xmax: 1 ymax: 1
## CRS: NA
## x location
## 1 POINT (0 1) 001
## 2 POINT (1 1) 002
One of my very strong beliefs is that temporal data does not, and should not, always be represented as a date or a timestamp. This paradigm is too limiting. What about panel data where you’re measuring cohorts along periods 1 - 10? Should these be represented as dates? No, definitely not. Because of this, sfdep allows you to utilize any numeric column that can be sorted.
Perhaps I’ve just spent too much time listening to econometricians…
Example of using integers
spacetime(
  mutate(data_context, period = row_number()),
  geometry_context,
  .loc_col = "location",
  .time_col = "period"
)
## spacetime ────
## Context:`data`
## 2 locations `location`
## 10 time periods `period`
## ── data context ────────────────────────────────────────────────────────────────
## # A tibble: 10 × 4
## location time value period
## * <chr> <dttm> <dbl> <int>
## 1 001 2022-11-07 11:02:33 -1.09 1
## 2 001 2022-11-07 12:17:33 2.11 2
## 3 001 2022-11-07 13:32:33 1.27 3
## 4 001 2022-11-07 14:47:33 -1.23 4
## 5 001 2022-11-07 16:02:33 1.26 5
## 6 002 2022-11-07 11:02:33 0.626 6
## 7 002 2022-11-07 12:17:33 -0.627 7
## 8 002 2022-11-07 13:32:33 0.117 8
## 9 002 2022-11-07 14:47:33 -0.128 9
## 10 002 2022-11-07 16:02:33 0.913 10
Qualifiers
I don’t think my spacetime class is a panacea. I don’t have the technical chops to make a great data format. I also don’t want to have that burden. Additionally, the class is designed with lattice data in mind. I don’t think it is sufficient for trajectories or point patterns without repeating locations.
There’s a new R package called cubble for spatio-temporal data. I’ve not explored it. It may be better suited to your tidy-centric spatio-temporal data.