Arrow, Rust, and cross-language data science tooling

Josiah Parry, Esri

Robin Lovelace (University of Leeds, ITS)

Hey 👋🏼

  • I’m Josiah Parry
  • Sr. Product Engineer @ Esri
  • R developer
  • Maintainer of extendr

Speedrun Agenda

  • The status quo
  • Rust and FFI
  • Apache Arrow
  • Why a Rust core
  • Example: ANIME

Tight Coupling

Loose Coupling

  • common “core” libraries

Loose Coupling

Pebesma, et al. 2025

Why Rust?

  • Easy(ish) to pick up
  • No dependencies
  • Stupid fast 🏎️💨
  • Easy parallelization #rayon
  • Cross-platform 💻

Rust FFI

  • std::ffi provides FFI utilities for C-like types
  • R, Python, and Julia all have C APIs
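
As a rough illustration of the two bullets above, here is a minimal sketch of a Rust function exposed with the C ABI using only std::ffi. The function name count_bytes is hypothetical and is not part of extendr, PyO3, or jlrs; those libraries generate this kind of glue for you.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical example: count the bytes in a NUL-terminated C string.
///
/// A real export would also carry `#[no_mangle]` (written
/// `#[unsafe(no_mangle)]` in the 2024 edition) so that the symbol
/// keeps a stable name a C caller can link against.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string.
pub unsafe extern "C" fn count_bytes(ptr: *const c_char) -> usize {
    if ptr.is_null() {
        return 0;
    }
    // Borrow the caller's buffer; nothing is copied or freed here.
    unsafe { CStr::from_ptr(ptr) }.to_bytes().len()
}
```

Any language with a C API can call a function shaped like this, which is the property the binding libraries on the next slides build on.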

extendr

PyO3

jlrs

Using a Rust Core

Apache Arrow

TL;DR on what it is and why you should care

Tabular Data

Columnar vs. Row-Oriented
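
To make the contrast concrete, here is a small sketch in plain Rust, with no Arrow crate involved. The RoadRow and RoadColumns types and their fields are made up for illustration. Row-oriented storage keeps each record together; columnar storage keeps each field in its own contiguous buffer, which is the layout Arrow standardizes.

```rust
/// Row-oriented: one struct per record, fields stored side by side.
#[derive(Debug, Clone, PartialEq)]
pub struct RoadRow {
    pub id: i64,
    pub length_m: f64,
}

/// Column-oriented: one contiguous buffer per field.
#[derive(Debug, Default, PartialEq)]
pub struct RoadColumns {
    pub id: Vec<i64>,
    pub length_m: Vec<f64>,
}

/// Pivot a slice of rows into columns.
pub fn to_columns(rows: &[RoadRow]) -> RoadColumns {
    RoadColumns {
        id: rows.iter().map(|r| r.id).collect(),
        length_m: rows.iter().map(|r| r.length_m).collect(),
    }
}

/// Aggregating one field only touches that field's buffer.
pub fn total_length(cols: &RoadColumns) -> f64 {
    cols.length_m.iter().sum()
}
```

Summing length_m in the columnar form reads a single dense buffer, whereas the row form has to step over every id along the way.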

Implementations

Status Quo

With Apache Arrow

Arrow for FFI Input & Output
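
What crosses the boundary is a small C struct defined by the Arrow C Data Interface. The sketch below hand-writes a Rust mirror of that ArrowArray struct and passes an int64 column through it without copying. It is an illustration only: export_int64, sum_int64, and Int64Export are hypothetical names, the companion ArrowSchema struct is omitted, and real bindings would rely on the FFI types in the arrow crate (or arrow_extendr / pyo3-arrow) rather than code like this.

```rust
use std::ffi::c_void;
use std::ptr;

/// Rust mirror of `struct ArrowArray` from the Arrow C Data Interface.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Producer-side state kept alive until the consumer calls `release`.
struct Int64Export {
    values: Vec<i64>,
    // [validity bitmap, data]; validity may be null when there are no nulls.
    buffers: [*const c_void; 2],
}

/// Release callback: frees the producer state and marks the array released.
unsafe extern "C" fn release_int64(array: *mut ArrowArray) {
    if array.is_null() {
        return;
    }
    let array = unsafe { &mut *array };
    if array.release.is_none() {
        return; // already released
    }
    drop(unsafe { Box::from_raw(array.private_data as *mut Int64Export) });
    array.release = None;
}

/// Hypothetical producer: expose a Vec<i64> as an ArrowArray without copying.
pub fn export_int64(values: Vec<i64>) -> ArrowArray {
    let length = values.len() as i64;
    let state = Box::into_raw(Box::new(Int64Export {
        values,
        buffers: [ptr::null(); 2],
    }));
    let buffers = unsafe {
        (*state).buffers[1] = (*state).values.as_ptr() as *const c_void;
        (*state).buffers.as_mut_ptr()
    };
    ArrowArray {
        length,
        null_count: 0,
        offset: 0,
        n_buffers: 2,
        n_children: 0,
        buffers,
        children: ptr::null_mut(),
        dictionary: ptr::null_mut(),
        release: Some(release_int64),
        private_data: state as *mut c_void,
    }
}

/// Hypothetical consumer: sum the values by reading the data buffer in place.
///
/// # Safety
/// `array` must be an unreleased int64 array with no nulls.
pub unsafe fn sum_int64(array: &ArrowArray) -> i64 {
    let data = unsafe { *array.buffers.add(1) } as *const i64;
    let values = unsafe {
        std::slice::from_raw_parts(data.add(array.offset as usize), array.length as usize)
    };
    values.iter().sum()
}
```

The consumer never learns how the producer allocated the memory; it only reads the buffers and calls release when done. In this sketch, forgetting to call release leaks the producer state.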

ANIME

Approximate Network Integration, Matching, and Enrichment

ANIME

  • Developed with Dr. Robin Lovelace
  • Road network matching is tough…
  • ANIME recognizes that it will never be perfect
  • Partial matching of lines

Repo Structure

.
├── rust/
│   └── Cargo.toml
├── r/
│   └── src/rust/Cargo.toml
└── py/
    └── Cargo.toml

R Bindings

library(sf)
library(anime)

targets <- read_sf("maine-osm-targets.fgb")
sources <- read_sf("maine-tigris-sources.fgb")

matches <- anime(
    source = sources,
    target = targets,
    distance_tolerance = 10,
    angle_tolerance = 5
)

Python Bindings

from anime import PyAnime
from geoarrow.rust.io import read_flatgeobuf

target = read_flatgeobuf("maine-osm-targets.fgb")
sources = read_flatgeobuf("maine-tigris-sources.fgb")

anime = PyAnime(
  source = sources.column("").chunk(0), 
  target = target.column("").chunk(0), 
  distance_tolerance = 10, 
  angle_tolerance = 5
)

Identical Results

Arrow FFI Helpers

A peek under the hood

# R bindings (r/src/rust/Cargo.toml)
[dependencies]
arrow = "53.0.0"
geoarrow = "0.4.0-beta.3"
extendr-api = "0.8"
arrow_extendr = "53.0.0"
itertools = "0.12.0"
anime = {git = "https://github.com/josiahparry/anime"}

# Python bindings (py/Cargo.toml)
[dependencies]
arrow = { version = "54.2.1", default-features = false }
geoarrow = { version = "0.4.0-beta.4" }
pyo3 = { version = "0.24.1", features = ["extension-module"] }
pyo3-arrow = "0.8.0"
anime = { path = "../rust" }

TL;DR

  • Write core in Rust
  • Use Apache Arrow for FFI boundary

Thanks 🖤