binsegRcpp Efficient implementation of the binary segmentation heuristic algorithm for changepoint detection, using C++ std::multiset. Also contains functions for comparing empirical time complexity to best/worst case.
Installation
install.packages("binsegRcpp") ##OR if(require("remotes"))install.packages("remotes") remotes::install_github("tdhock/binsegRcpp")
Usage
The main function is binseg for which you must at least specify the
first two arguments:
distribution.strspecifies the loss function to minimize.data.vecis a numeric vector of data to segment.
> x <- c(0.1, 0, 1, 1.1, 0.1, 0) > (models.dt <- binsegRcpp::binseg("mean_norm", x)) binary segmentation model: segments end loss validation.loss <int> <int> <num> <num> 1: 1 6 1.348333e+00 0 2: 2 4 1.015000e+00 0 3: 3 2 1.500000e-02 0 4: 4 3 1.000000e-02 0 5: 5 5 5.000000e-03 0 6: 6 1 -3.339343e-16 0
The result above summarizes the data that are computed during the binary segmentation algorithm. It has a special class with dedicated methods:
> class(models.dt) [1] "binsegRcpp" "list" > methods(class="binsegRcpp") [1] coef plot print see '?methods' for accessing help and source code
The coef methods returns a data table of segment means:
> coef(models.dt, segments=2:3) segments start end start.pos end.pos mean <int> <int> <int> <num> <num> <num> 1: 2 1 4 0.5 4.5 0.55 2: 2 5 6 4.5 6.5 0.05 3: 3 1 2 0.5 2.5 0.05 4: 3 3 4 2.5 4.5 1.05 5: 3 5 6 4.5 6.5 0.05
Demo of poisson loss and non-uniform weights:
> data.vec <- c(3,4,10,20) > (fit1 <- binsegRcpp::binseg("poisson", data.vec, weight.vec=c(1,1,1,10))) binary segmentation model: segments end loss validation.loss <int> <int> <num> <num> 1: 1 4 -393.8437 0 2: 2 3 -411.6347 0 3: 3 2 -413.9416 0 4: 4 1 -414.0133 0
Demo of change in mean and variance for normal distribution:
> sim <- function(mu,sigma)rnorm(10000,mu,sigma) > set.seed(1) > data.vec <- c(sim(5,1), sim(0, 5)) > fit <- binsegRcpp::binseg("meanvar_norm", data.vec) > coef(fit, 2L) segments start end start.pos end.pos mean var <int> <int> <int> <num> <num> <num> <num> 1: 2 1 10000 0.5 10000.5 4.99346296 1.024763 2: 2 10001 20000 10000.5 20000.5 -0.02095033 24.538556
Related work
Other implementations of binary segmentation include changepoint::cpt.mean(method=”BinSeg”) (quadratic storage in max number of segments), BinSeg::BinSegModel() (same linear storage as binsegRcpp), and ruptures.Binseg() (unknown storage). Figures comparing the timings.
This version uses the Rcpp/.Call interface whereas the binseg package uses the .C interface.
See branches for variations of the interface to use as test cases in RcppDeepState development.