This package adds resampling methods to the {mlr3} framework that are suited for spatial, temporal and spatiotemporal data. These methods help reduce the influence of autocorrelation on performance estimates obtained by cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on “Spatiotemporal Analysis”.
After loading the package via library("mlr3spatiotempcv"), the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user.
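As a minimal sketch of this step, assuming {mlr3} and {mlr3spatiotempcv} are both installed and that the installed version uses the `spcv_`/`sptcv_` key naming documented in this article, the newly registered resampling methods can be listed like this (the variable names `keys` and `st_keys` are illustrative only):

```r
library("mlr3")
library("mlr3spatiotempcv")

# Keys of all resampling methods currently registered in the dictionary
keys = mlr_resamplings$keys()

# Keep only keys that follow the spcv_/sptcv_ naming scheme,
# optionally prefixed with "repeated_"
st_keys = grep("^(repeated_)?spt?cv_", keys, value = TRUE)
print(st_keys)
```

The exact set of keys depends on the installed package version, so the printed result may differ from the listings in this article.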
In mlr3, dictionaries provide an overview of the available methods. The following listings show which dictionaries {mlr3spatiotempcv} extends with new entries.
Additional task types:
TaskClassifST
TaskRegrST
mlr_reflections$task_types
#>       type          package          task        learner        prediction
#> 1: classif             mlr3   TaskClassif LearnerClassif PredictionClassif
#> 2: classif mlr3spatiotempcv TaskClassifST LearnerClassif PredictionClassif
#> 3:    regr             mlr3      TaskRegr    LearnerRegr    PredictionRegr
#> 4:    regr mlr3spatiotempcv    TaskRegrST    LearnerRegr    PredictionRegr
#>           measure
#> 1: MeasureClassif
#> 2: MeasureClassif
#> 3:    MeasureRegr
#> 4:    MeasureRegr
Additional column roles:
coordinates
mlr_reflections$task_col_roles
#> $regr
#> [1] "feature" "target"  "name"    "order"   "stratum" "group"   "weight" 
#> 
#> $classif
#> [1] "feature" "target"  "name"    "order"   "stratum" "group"   "weight" 
#> 
#> $classif_st
#> [1] "feature"     "target"      "name"        "order"       "stratum"    
#> [6] "group"       "weight"      "coordinates"
#> 
#> $regr_st
#> [1] "feature"     "target"      "name"        "order"       "stratum"    
#> [6] "group"       "weight"      "coordinates"
Additional resampling methods:
spcv_block
spcv_buffer
spcv_coords
spcv_env
sptcv_cluto
sptcv_cstf
and, except for spcv_buffer, their respective repeated versions (prefixed with repeated_).
as.data.table(mlr_resamplings)
#>                      key                                  params iters
#>  1:            bootstrap                           repeats,ratio    30
#>  2:               custom                                             0
#>  3:                   cv                                   folds    10
#>  4:              holdout                                   ratio     1
#>  5:             insample                                             1
#>  6:                  loo                                            NA
#>  7:          repeated_cv                           repeats,folds   100
#>  8:  repeated_spcv_block folds,repeats,rows,cols,range,selection    10
#>  9: repeated_spcv_coords                           folds,repeats    10
#> 10:    repeated_spcv_env                  folds,repeats,features    10
#> 11: repeated_sptcv_cluto                           folds,repeats    10
#> 12:  repeated_sptcv_cstf                           folds,repeats    10
#> 13:           spcv_block         folds,rows,cols,range,selection    10
#> 14:          spcv_buffer               theRange,spDataType,addBG     0
#> 15:          spcv_coords                                   folds    10
#> 16:             spcv_env                          folds,features    10
#> 17:          sptcv_cluto                                   folds    10
#> 18:           sptcv_cstf                                   folds    10
#> 19:          subsampling                           repeats,ratio    30
Additional example tasks:
tsk("ecuador") (spatial, classif)
tsk("cookfarm") (spatiotemporal, regr)
The following table lists all methods implemented in {mlr3spatiotempcv}, together with their upstream R packages and scientific references.
| Literature | Package | Reference | mlr3 Sugar | 
|---|---|---|---|
| Spatial Buffering | blockCV | Valavi et al. (2018) | rsmp("spcv_buffer") | 
| Spatial Blocking | blockCV | Valavi et al. (2018) | rsmp("spcv_block") | 
| Spatial CV | sperrorest | Brenning (2012) | rsmp("spcv_coords") | 
| Environmental Blocking | blockCV | Valavi et al. (2018) | rsmp("spcv_env") | 
| Leave-Location-and-Time-Out | CAST | Meyer et al. (2018) | rsmp("sptcv_cstf") | 
| Spatiotemporal Clustering | skmeans | Zhao and Karypis (2002) | rsmp("sptcv_cluto") | 
| Repeated Spatial Blocking | blockCV | Valavi et al. (2018) | rsmp("repeated_spcv_block") | 
| Repeated Spatial CV | sperrorest | Brenning (2012) | rsmp("repeated_spcv_coords") | 
| Repeated Env Blocking | blockCV | Valavi et al. (2018) | rsmp("repeated_spcv_env") | 
| Repeated Leave-Location-and-Time-Out | CAST | Meyer et al. (2018) | rsmp("repeated_sptcv_cstf") | 
| Repeated Spatiotemporal Clustering | skmeans | Zhao and Karypis (2002) | rsmp("repeated_sptcv_cluto") | 
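The sugar calls in the table can be used like any other mlr3 resampling. The following is a minimal sketch, assuming {mlr3} and {mlr3spatiotempcv} are installed; the choice of `folds = 5`, `repeats = 2`, the seed, and all variable names are illustrative and not prescribed by the package.

```r
library("mlr3")
library("mlr3spatiotempcv")

set.seed(42)  # fold assignment by coordinate clustering has a random start

task = tsk("ecuador")

# Spatial CV: folds are formed from the observation coordinates
resampling = rsmp("spcv_coords", folds = 5)
resampling$instantiate(task)

# Row ids used for training and testing in the first fold
train_ids = resampling$train_set(1)
test_ids = resampling$test_set(1)

# Within one fold, no observation belongs to both sets
overlap = intersect(train_ids, test_ids)
print(length(overlap))

# Repeated variant: the number of iterations is folds times repeats
rep_resampling = rsmp("repeated_spcv_coords", folds = 5, repeats = 2)
rep_resampling$instantiate(task)
print(rep_resampling$iters)
```

An instantiated resampling can then be passed to `resample()` together with a task and a learner, as with the non-spatial methods in mlr3.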
Brenning, Alexander. 2012. “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. IEEE. https://doi.org/10.1109/igarss.2012.6352393.
Meyer, Hanna, Christoph Reudenbach, Tomislav Hengl, Marwan Katurji, and Thomas Nauss. 2018. “Improving Performance of Spatio-Temporal Machine Learning Models Using Forward Feature Selection and Target-Oriented Validation.” Environmental Modelling & Software 101 (March): 1–9. https://doi.org/10.1016/j.envsoft.2017.12.001.
Valavi, Roozbeh, Jane Elith, Jose J. Lahoz-Monfort, and Gurutzeta Guillera-Arroita. 2018. “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv, June. https://doi.org/10.1101/357798.
Zhao, Ying, and George Karypis. 2002. “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” 11th Conference of Information and Knowledge Management (CIKM), 515–24. http://glaros.dtc.umn.edu/gkhome/node/167.