Getting Started with twinsvm

twinsvm fits twin support vector machines (twin SVMs) and provides a standard C-SVC baseline for comparison. Binary fits expect a two-level factor: the first level is treated as class B and the second as class A. Multiclass fits use one-vs-one majority voting, with ties broken in favor of the first factor level.
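
Because the class roles follow level order, it can help to inspect and, if needed, set the levels before fitting. The following is a minimal base-R sketch; the labels "spam" and "ham" are made up for illustration and nothing here calls twinsvm.

```r
# Hypothetical labels; by default R sorts factor levels alphabetically,
# so "ham" would be level 1 and "spam" level 2.
y <- factor(c("spam", "ham", "ham", "spam"))

# Set the order explicitly so "spam" is level 1 and "ham" is level 2.
y2 <- factor(y, levels = c("spam", "ham"))
nlevels(y2)
#> [1] 2
as.integer(y2)
#> [1] 1 2 2 1
```

Passing `levels =` explicitly keeps the mapping independent of how the labels happen to sort.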

Generate data and fit a twin SVM

library(twinsvm)

set.seed(1)
dat <- gen_moons(100, noise = 0.12)
fit <- tsvm(dat$x, dat$y, kernel = "rbf", gamma = 2, c1 = 0.1, c2 = 0.1)
head(predict(fit, dat$x))
#> [1] B B B B B B
#> Levels: B A
mean(predict(fit, dat$x) == dat$y)
#> [1] 1

Plot the boundary

plot(fit)

For a linear twin SVM, the two fitted planes are drawn as dashed lines.

linear_fit <- tsvm(dat$x, dat$y, kernel = "linear")
plot(linear_fit)

Cross-validation

cv <- cv_tsvm(
  dat$x,
  dat$y,
  c1_grid = c(0.1, 1),
  c2_grid = c(0.1, 1),
  gamma_grid = c(1, 2),
  kernel = "rbf",
  k = 3
)
cv$best_params
#> $c1
#> [1] 1
#> 
#> $c2
#> [1] 1
#> 
#> $gamma
#> [1] 1
plot(cv)
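
The grid above contains 2 × 2 × 2 = 8 parameter combinations, so an exhaustive search with k = 3 would fit 24 models. The sketch below reproduces that bookkeeping in base R. Whether cv_tsvm() enumerates the grid and assigns folds in exactly this way is an assumption, and `fold_id` is an illustrative name.

```r
# Every combination of the three grids passed to cv_tsvm() above.
grid <- expand.grid(c1 = c(0.1, 1), c2 = c(0.1, 1), gamma = c(1, 2))
nrow(grid)
#> [1] 8

# Assumed fold assignment: balanced fold labels, shuffled across rows.
set.seed(1)
n <- 100
k <- 3
fold_id <- sample(rep_len(seq_len(k), n))
as.vector(table(fold_id))
#> [1] 34 33 33

# Total number of model fits for an exhaustive k-fold search.
n_fits <- nrow(grid) * k
n_fits
#> [1] 24
```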

Multiclass

set.seed(4)
x3 <- rbind(
  matrix(rnorm(30, -2, 0.25), ncol = 2),
  cbind(rnorm(15, 2, 0.25), rnorm(15, -2, 0.25)),
  matrix(rnorm(30, 2, 0.25), ncol = 2)
)
y3 <- factor(rep(c("alpha", "beta", "gamma"), each = 15))

multi <- tsvm(x3, y3, kernel = "linear")
head(predict(multi, x3))
#> [1] alpha alpha alpha alpha alpha alpha
#> Levels: alpha beta gamma
head(predict(multi, x3, type = "votes"))
#>      alpha beta gamma
#> [1,]     2    1     0
#> [2,]     2    1     0
#> [3,]     2    1     0
#> [4,]     2    1     0
#> [5,]     2    1     0
#> [6,]     2    1     0
confusion(multi, x3, y3)
#> $table
#>        predicted
#> truth   alpha beta gamma
#>   alpha    15    0     0
#>   beta      0   15     0
#>   gamma     0    0    15
#> 
#> $accuracy
#> [1] 1
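
The vote matrix maps to a predicted class by taking the column with the most votes in each row. Here is a base-R sketch of that step and of the accuracy computation, assuming a tie goes to the first of the tied levels. The vote rows and `truth` labels below are made up (the second row is a three-way tie) and are not package output.

```r
lv <- c("alpha", "beta", "gamma")

# One-vs-one on three classes trains choose(3, 2) = 3 pairwise models,
# so every row of votes sums to 3.
votes <- matrix(
  c(2, 1, 0,
    1, 1, 1,
    0, 2, 1),
  ncol = 3, byrow = TRUE, dimnames = list(NULL, lv)
)

# ties.method = "first" picks the earliest column among tied maxima.
winner <- factor(lv[max.col(votes, ties.method = "first")], levels = lv)
as.integer(winner)
#> [1] 1 1 2

# Confusion table and accuracy against hypothetical true labels.
truth <- factor(c("alpha", "beta", "beta"), levels = lv)
tab <- table(truth = truth, predicted = winner)
accuracy <- sum(diag(tab)) / sum(tab)
```

Keeping both factors on the same `levels` makes the table square, so `diag(tab)` lines up with the correct predictions.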

Compare with standard SVM

timing <- data.frame(
  n = c(40, 80, 120),
  tsvm_seconds = NA_real_,
  svms_seconds = NA_real_
)

for (i in seq_len(nrow(timing))) {
  set.seed(i)
  d <- gen_moons(timing$n[i], noise = 0.12)
  timing$tsvm_seconds[i] <- system.time(tsvm(d$x, d$y, kernel = "rbf", gamma = 2))[["elapsed"]]
  timing$svms_seconds[i] <- system.time(svms(d$x, d$y, kernel = "rbf", gamma = 2))[["elapsed"]]
}
timing
#>     n tsvm_seconds svms_seconds
#> 1  40            0         0.00
#> 2  80            0         0.00
#> 3 120            0         0.01
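
An elapsed time of 0 means the call finished below the timer's resolution. One way to get a usable per-call figure is to time many repetitions and divide. The helper name `time_per_call` and the stand-in workload below are hypothetical; a real comparison would pass a closure that calls tsvm() or svms().

```r
# Average elapsed seconds per call of f over reps repetitions.
time_per_call <- function(f, reps = 50) {
  elapsed <- system.time(for (i in seq_len(reps)) f())[["elapsed"]]
  elapsed / reps
}

# Stand-in workload: invert a well-conditioned 50 x 50 matrix.
work <- function() {
  m <- matrix(rnorm(2500), nrow = 50)
  solve(crossprod(m) + diag(50))
}

set.seed(1)
per_call <- time_per_call(work)
```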

The timing table is generated on the machine running this vignette, so the numbers will differ elsewhere. Kernel twin-SVM forms invert an (n + 1) × (n + 1) matrix, where n is the number of training rows, so they are meant for small to moderate data.
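
To make the matrix size concrete, the sketch below solves the two least-squares twin-SVM systems with an RBF kernel in base R, following the closed form of Kumar and Gopal (2009). It is an illustration only: the helper `rbf_kernel`, the ridge term `eps`, and the unnormalized distance rule are assumptions, and twinsvm's own implementation may differ.

```r
# RBF kernel matrix between the rows of X and the rows of Z.
rbf_kernel <- function(X, Z, gamma) {
  d2 <- outer(rowSums(X^2), rowSums(Z^2), "+") - 2 * X %*% t(Z)
  exp(-gamma * pmax(d2, 0))
}

set.seed(1)
A <- matrix(rnorm(40, mean =  1.5, sd = 0.5), ncol = 2)  # class A, 20 rows
B <- matrix(rnorm(40, mean = -1.5, sd = 0.5), ncol = 2)  # class B, 20 rows
C <- rbind(A, B)
n <- nrow(C)
gamma <- 2
c1 <- 0.1
c2 <- 0.1
eps <- 1e-4  # assumed ridge; without it the matrix has rank at most n

# Augmented kernel blocks [K(., C), 1]; each has n + 1 columns.
E <- cbind(rbf_kernel(A, C, gamma), 1)
G <- cbind(rbf_kernel(B, C, gamma), 1)
ridge <- eps * diag(n + 1)

# Surface 1 stays close to class A, surface 2 close to class B.
M1 <- crossprod(G) + crossprod(E) / c1 + ridge
M2 <- crossprod(E) + crossprod(G) / c2 + ridge
dim(M1)
#> [1] 41 41
u1 <- -solve(M1, crossprod(G, rep(1, nrow(B))))
u2 <-  solve(M2, crossprod(E, rep(1, nrow(A))))

# Assign each row to the class whose surface is closer
# (unnormalized distances, a simplification).
K_all <- cbind(rbf_kernel(C, C, gamma), 1)
pred <- ifelse(abs(K_all %*% u1) <= abs(K_all %*% u2), "A", "B")
train_acc <- mean(pred == rep(c("A", "B"), each = 20))
```

Each solve works on a dense (n + 1) × (n + 1) system, so memory grows quadratically and solve time roughly cubically with the number of training rows.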

Visualization

circles <- gen_circles(100, noise = 0.04)
lift_plot(circles$x, circles$y, gamma = 1)

compare_methods() draws the three fitted classifiers in one row for a single dataset, here a small two-moons sample.

set.seed(2)
small <- gen_moons(60, noise = 0.1)
compare_methods(small$x, small$y, gamma = 1, c1 = 0.2, c2 = 0.2, cost = 1)

morph_boundary() returns a gganimate object. Rendering is left to the user so package examples stay fast.

anim <- morph_boundary(dat$x, dat$y, param = "gamma", range = c(0.5, 2), kernel = "rbf", n = 5)
class(anim)
#> [1] "gganim"          "ggplot2::ggplot" "ggplot"          "ggplot2::gg"    
#> [5] "S7_object"       "gg"

Validation

The standard SVM baseline is tested against e1071, which is backed by LIBSVM. There is no existing R twin-SVM package to match against, so twin-SVM tests validate plane-distance behavior, nonlinear kernel improvement, and agreement between the least-squares and original QP formulations. The algorithms follow Jayadeva, Khemchandani, and Chandra (2007) and Kumar and Gopal (2009).