Contents

  1. Install
  2. Load a dataset
  3. First fit and prediction
  4. Compare kernels
  5. Compare decomposition strategies
  6. Cross-validation loop
  7. Imbalanced data and label noise

1. Install

Clone the repo and activate the package environment. WRSVM.jl is pure Julia and does not need Python.

using Pkg
Pkg.activate("path/to/WRSVM/WRSVM.jl")
Pkg.instantiate()

# alternatively, add as a dependency of your own project:
# Pkg.add(url = "https://github.com/annicenajafi/WRSVM", subdir = "WRSVM.jl")

2. Load a dataset

We use Fisher’s Iris dataset from RDatasets. Install it once with Pkg.add("RDatasets") if it isn’t already available.

using WRSVM
using RDatasets, Random, Statistics

Random.seed!(0)

iris = dataset("datasets", "iris")
X = Matrix{Float64}(iris[:, 1:4])
y_str = string.(iris.Species)
label_map = Dict(c => i for (i, c) in enumerate(sort(unique(y_str))))
y = [label_map[s] for s in y_str]

# stratified 70/30 split (manual)
idx_tr = Int[]; idx_te = Int[]
for c in unique(y)
    idx_c = findall(==(c), y)
    shuffle!(idx_c)
    n_tr = round(Int, 0.7 * length(idx_c))
    append!(idx_tr, idx_c[1:n_tr])
    append!(idx_te, idx_c[n_tr+1:end])
end
X_tr, y_tr = X[idx_tr, :], y[idx_tr]
X_te, y_te = X[idx_te, :], y[idx_te]

# standardize with training-set statistics only (avoids test-set leakage)
mu = mean(X_tr, dims = 1); sd = std(X_tr, dims = 1)
X_tr = (X_tr .- mu) ./ sd
X_te = (X_te .- mu) ./ sd

println("train N=", size(X_tr, 1), "  features=", size(X_tr, 2),
        "  classes=", sort(unique(y_tr)))
train N=105  features=4  classes=[1, 2, 3]

3. First fit and prediction

Fit the Crammer–Singer model with the RBF kernel, then predict on the test set.

# upsilon is the robustness parameter revisited in section 7
model = solve_crammer_singer(X_tr, y_tr;
                             C = 100.0, gamma = 0.1, upsilon = 0.2,
                             kernel = "rbf")
preds = predict_cs(model, X_te)

println("test accuracy: ", round(mean(preds .== y_te); digits = 4))
println("first 5 predictions: ", preds[1:5])
test accuracy: 0.9778
first 5 predictions: [2, 1, 3, 2, 2]

4. Compare kernels

Pass the kernel keyword to pick one of "rbf", "linear", "poly", "sigmoid", or "laplacian". The sigmoid kernel is not Mercer, so its Gram matrix is automatically projected onto the PSD cone before Clarabel sees it; a standalone sketch of that projection follows the comparison below.

for k in ["rbf", "linear", "poly", "sigmoid", "laplacian"]
    m = solve_crammer_singer(X_tr, y_tr;
                              C = 100.0, gamma = 0.1, upsilon = 0.2,
                              kernel = k, degree = 3, coef0 = 1.0)
    acc = mean(predict_cs(m, X_te) .== y_te)
    println(rpad("  $k", 14), "  test_acc = ", round(acc; digits = 3))
end
  rbf           test_acc = 0.978
  linear        test_acc = 0.956
  poly          test_acc = 0.933
  sigmoid       test_acc = 0.933
  laplacian     test_acc = 0.978
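
WRSVM.jl performs the PSD projection internally; purely for intuition, here is a minimal standalone sketch of the eigenvalue-clipping idea (the helper project_psd is ours, not part of the package API):

using LinearAlgebra

# Project a symmetric but indefinite Gram matrix onto the PSD cone:
# symmetrize, then clip the negative eigenvalues to zero (this is the
# nearest PSD matrix in Frobenius norm).
function project_psd(G::AbstractMatrix{<:Real})
    S = Symmetric((G .+ G') ./ 2)
    vals, vecs = eigen(S)
    return Symmetric(vecs * Diagonal(max.(vals, 0.0)) * vecs')
end

G = [1.0 0.9; 0.9 -0.2]                    # indefinite 2×2 toy example
println(minimum(eigvals(project_psd(G))))  # ≈ 0 up to floating-point error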

5. Compare decomposition strategies

WRSVM.jl implements two native multiclass strategies (solve_crammer_singer and solve_simmsvm). The SimMSVM solver works with an N-variable dual instead of an N × K one, which translates to a large speedup even at moderate K.
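
To make the dual sizes concrete for this split (plain arithmetic, nothing package-specific):

N, K = size(X_tr, 1), length(unique(y_tr))
println("Crammer–Singer dual variables: ", N * K)   # 105 × 3 = 315
println("SimMSVM dual variables:        ", N)       # 105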

for (name, solve_fn, predict_fn) in [
    ("cs",      solve_crammer_singer, predict_cs),
    ("simmsvm", solve_simmsvm,        predict_simmsvm),
]
    t0 = time_ns()
    m = solve_fn(X_tr, y_tr; C = 100.0, gamma = 0.1, upsilon = 0.2,
                   kernel = "rbf")
    dt = (time_ns() - t0) / 1e6
    acc = mean(predict_fn(m, X_te) .== y_te)
    println(rpad("  $name", 12),
            "  fit_time = ", lpad(round(dt; digits = 1), 6), " ms   ",
            "test_acc = ", round(acc; digits = 3))
end
  cs          fit_time =   84.2 ms   test_acc = 0.978
  simmsvm     fit_time =    9.1 ms   test_acc = 0.933

6. Cross-validation loop

A manual 5-fold grid search over C and gamma. Julia’s low-overhead loops make this cheap even without a dedicated ML framework.

Cs     = [10.0, 100.0, 1000.0]
gammas = [0.01, 0.1, 1.0]
folds  = mod.(shuffle(1:size(X_tr, 1)), 5) .+ 1   # balanced random fold ids in 1:5

best_acc, best_params = 0.0, (0.0, 0.0)
for C in Cs, g in gammas
    accs = Float64[]
    for k in 1:5
        tr = folds .!= k; va = folds .== k
        m = solve_crammer_singer(X_tr[tr, :], y_tr[tr];
                                  C = C, gamma = g, upsilon = 0.2,
                                  kernel = "rbf")
        push!(accs, mean(predict_cs(m, X_tr[va, :]) .== y_tr[va]))
    end
    a = mean(accs)
    if a > best_acc
        # `global` is needed so this assignment also works when run as a script
        global best_acc, best_params = a, (C, g)
    end
end

println("best: C=", best_params[1], "  gamma=", best_params[2],
        "  mean_cv_acc=", round(best_acc; digits = 3))
best: C=100.0  gamma=0.1  mean_cv_acc=0.962
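
Having selected (C, gamma), refit on the full training split using the calls from section 3; a minimal sketch (output omitted, since it depends on the shuffle):

best_C, best_gamma = best_params
final = solve_crammer_singer(X_tr, y_tr;
                             C = best_C, gamma = best_gamma, upsilon = 0.2,
                             kernel = "rbf")
println("refit test accuracy: ",
        round(mean(predict_cs(final, X_te) .== y_te); digits = 4))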

7. Imbalanced data and label noise

Corrupt a fraction of the minority-class labels with inject_outliers_minority and watch increasing upsilon absorb the corruption.

y_noisy = inject_outliers_minority(X_tr, y_tr; outlier_rate = 0.2, seed = 0)
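# Sanity check -- assumes inject_outliers_minority returns a modified copy
# of the label vector: measure the fraction of labels actually changed.
println("fraction of labels changed: ",
        round(mean(y_noisy .!= y_tr); digits = 3))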

for ups in [0.0, 0.1, 0.5, 2.0]
    m = solve_simmsvm(X_tr, y_noisy;
                       C = 100.0, gamma = 0.1, upsilon = ups,
                       kernel = "rbf")
    acc = mean(predict_simmsvm(m, X_te) .== y_te)
    println("  upsilon=", ups, "  clean_test_acc = ", round(acc; digits = 3))
end
  upsilon=0.0  clean_test_acc = 0.822
  upsilon=0.1  clean_test_acc = 0.889
  upsilon=0.5  clean_test_acc = 0.933
  upsilon=2.0  clean_test_acc = 0.956