Contents

  1. Install
  2. Load a dataset
  3. First fit and prediction
  4. Compare kernels
  5. Compare decomposition strategies
  6. Cross-validation with GridSearchCV
  7. Imbalanced data and label noise

1. Install

Install the package from the cloned repository. A virtual environment is optional but recommended.

git clone https://github.com/annicenajafi/WRSVM.git
cd WRSVM/wrsvm_package

python -m venv .venv
source .venv/Scripts/activate   # Windows (bash); use .venv/bin/activate on Unix

python -m pip install --upgrade pip
python -m pip install -e .

2. Load a dataset

We use the UCI Iris dataset that ships with scikit-learn. Features are standardized so that the Gaussian kernel bandwidth gamma behaves predictably across dimensions.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

scaler = StandardScaler().fit(X_tr)
X_tr = scaler.transform(X_tr)
X_te = scaler.transform(X_te)

print(f"train: N={X_tr.shape[0]}  features={X_tr.shape[1]}  classes={np.unique(y_tr)}")
train: N=105 features=4 classes=[0 1 2]
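The effect of standardization on the distances that feed the RBF kernel can be checked directly. A quick sketch (sklearn only, independent of the package): the raw iris features have variances spanning roughly an order of magnitude, so a single gamma in exp(-gamma * ||x - x'||^2) would weight them very unevenly; after standardization every feature contributes on the same scale.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Raw per-feature variances differ by more than an order of magnitude;
# after StandardScaler they are all 1, so one gamma fits every dimension.
print("raw variances:", X.var(axis=0).round(2))
print("std variances:", X_std.var(axis=0).round(2))
```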

3. First fit and prediction

Fit the default Crammer–Singer strategy with the Gaussian (RBF) kernel, then score on held-out data.

from wrsvm import WRSVMClassifier

clf = WRSVMClassifier(strategy="cs", kernel="rbf",
                      C=100.0, gamma=0.1, upsilon=0.2)
clf.fit(X_tr, y_tr)

print("test accuracy:", clf.score(X_te, y_te))
print("predictions on 5 samples:", clf.predict(X_te[:5]))
test accuracy: 0.9777777777777777
predictions on 5 samples: [1 0 2 1 1]

4. Compare kernels

The package exposes five kernels through the kernel argument. The polynomial and sigmoid kernels accept a degree and coef0 in addition to gamma.

for k in ["rbf", "linear", "poly", "sigmoid", "laplacian"]:
    clf = WRSVMClassifier(kernel=k, C=100.0, gamma=0.1, upsilon=0.2,
                          degree=3, coef0=1.0)
    clf.fit(X_tr, y_tr)
    print(f"  {k:>10}  test_acc = {clf.score(X_te, y_te):.3f}")
         rbf  test_acc = 0.978
      linear  test_acc = 0.956
        poly  test_acc = 0.933
     sigmoid  test_acc = 0.933
   laplacian  test_acc = 0.978

On this small balanced problem most kernels perform similarly. On harder problems the choice matters; the laplacian kernel is often a strong baseline when features are not scale-matched.
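To see why the two kernels behave differently under scale mismatch, compare their definitions side by side. The sketch below uses scikit-learn's pairwise kernel functions, which follow the standard definitions; it assumes (but does not verify) that the package's kernels match them.

```python
import numpy as np
from sklearn.metrics.pairwise import laplacian_kernel, rbf_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(3, 4))

# rbf:       k(x, y) = exp(-gamma * ||x - y||_2^2)   (squared L2 distance)
# laplacian: k(x, y) = exp(-gamma * ||x - y||_1)     (L1 distance)
K_rbf = rbf_kernel(A, B, gamma=0.1)
K_lap = laplacian_kernel(A, B, gamma=0.1)

# The L1 distance grows linearly rather than quadratically in each
# coordinate's mismatch, so a single badly scaled feature dominates the
# laplacian kernel far less than the rbf kernel.
print(K_rbf.round(3))
print(K_lap.round(3))
```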

5. Compare decomposition strategies

WRSVM supports four multiclass strategies: native Crammer–Singer (cs), the fast simplex-coded formulation (simmsvm), and the binary reductions one-vs-one (ovo) and one-vs-rest (ovr).

import time

for s in ["cs", "simmsvm", "ovo", "ovr"]:
    clf = WRSVMClassifier(strategy=s, kernel="rbf",
                          C=100.0, gamma=0.1, upsilon=0.2)
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)
    dt = time.perf_counter() - t0
    print(f"  {s:>8}  fit_time = {dt*1000:6.1f} ms   test_acc = {clf.score(X_te, y_te):.3f}")
        cs  fit_time =  110.4 ms   test_acc = 0.978
   simmsvm  fit_time =   12.8 ms   test_acc = 0.933
       ovo  fit_time =   52.7 ms   test_acc = 0.956
       ovr  fit_time =  154.9 ms   test_acc = 0.956

Note the simmsvm speedup: it solves a single QP with N dual variables rather than N × K, where K is the number of classes.
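A rough count of the problem sizes behind the timings above, for this split (N = 105 training samples, K = 3 classes). The one-vs-rest and one-vs-one counts are the standard reduction sizes, stated as an assumption about the package's internals rather than measured from it.

```python
# Rough QP sizes for this tutorial's training split.
N, K = 105, 3

cs_vars      = N * K             # Crammer–Singer: one dual variable per (sample, class)
simmsvm_vars = N                 # simplex-coded: one dual variable per sample
ovr_qps      = K                 # one-vs-rest: K binary QPs, N variables each
ovo_qps      = K * (K - 1) // 2  # one-vs-one: one QP per class pair

print(f"cs      : 1 QP with {cs_vars} dual variables")
print(f"simmsvm : 1 QP with {simmsvm_vars} dual variables")
print(f"ovr     : {ovr_qps} QPs with {N} dual variables each")
print(f"ovo     : {ovo_qps} QPs over pairwise class subsets")
```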

6. Cross-validation with GridSearchCV

WRSVMClassifier implements the scikit-learn estimator API, so it plugs directly into GridSearchCV and Pipeline.

from sklearn.model_selection import GridSearchCV

param_grid = {
    "kernel":  ["rbf", "laplacian"],
    "C":       [10.0, 100.0, 1000.0],
    "gamma":   [0.01, 0.1, 1.0],
    "upsilon": [0.1, 0.2, 0.5],
}

gs = GridSearchCV(
    WRSVMClassifier(strategy="cs"),
    param_grid, cv=5, n_jobs=-1,
)
gs.fit(X_tr, y_tr)

print("best params:", gs.best_params_)
print("best CV acc:", gs.best_score_)
print("test acc   :", gs.score(X_te, y_te))
best params: {'C': 10.0, 'gamma': 0.1, 'kernel': 'rbf', 'upsilon': 0.1}
best CV acc: 0.9619
test acc   : 1.0000
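Scaling can also be moved inside the search with a Pipeline, so the scaler is refit on each CV fold's training portion and never sees validation data. A minimal sketch of the pattern, shown with scikit-learn's SVC as a stand-in so it runs without the package installed; substitute WRSVMClassifier and its parameters in the same positions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC  # stand-in; swap in WRSVMClassifier(strategy="cs")

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", SVC())])

# Prefix grid keys with the step name so GridSearchCV routes them to "clf".
param_grid = {"clf__C": [10.0, 100.0], "clf__gamma": [0.01, 0.1]}

gs = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
gs.fit(X_tr, y_tr)
print("best params:", gs.best_params_)
print("test acc   :", round(gs.score(X_te, y_te), 3))
```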

7. Imbalanced data and label noise

The upsilon parameter controls the per-class slack budget that absorbs mislabeled samples and lets the classifier tolerate imbalance without collapsing the minority class. To stress-test it, corrupt 20% of the minority labels at random.

from wrsvm import inject_outliers_minority

y_noisy = inject_outliers_minority(X_tr, y_tr, outlier_rate=0.2, seed=0)

for ups in [0.0, 0.1, 0.5, 2.0]:
    clf = WRSVMClassifier(strategy="simmsvm", kernel="rbf",
                          C=100.0, gamma=0.1, upsilon=ups)
    clf.fit(X_tr, y_noisy)
    print(f"  upsilon={ups:.1f}  clean_test_acc = {clf.score(X_te, y_te):.3f}")
  upsilon=0.0  clean_test_acc = 0.822
  upsilon=0.1  clean_test_acc = 0.889
  upsilon=0.5  clean_test_acc = 0.933
  upsilon=2.0  clean_test_acc = 0.956

Accuracy on the clean test set improves as upsilon grows, because the relaxation absorbs the flipped labels rather than fitting them.