Problem description:

Two questions about using libsvm in python:

- How can I know if the problem is feasible or not?
- How can I get the primal variable (w and the offset b)?

I use a simple example with 4 training points (depicted by `*`) in a 2D space:

```
*----*
|    |
|    |
*----*
```

I train the SVM with the C_SVC formulation and a linear kernel, and classify the 4 points into two labels [-1, +1].

For example, when I label the training points like this, it should find a separating hyperplane:

```
{-1}----{+1}
 |        |
 |        |
{-1}----{+1}
```

But with this labeling the problem is not linearly separable, so it should not be able to find a separating hyperplane (because of the linear kernel):

```
{+1}----{-1}
 |        |
 |        |
{-1}----{+1}
```

And I would like to be able to detect this case.

Sample code for the 2nd example:

```python
from svmutil import *

y = [1, -1, 1, -1]
x = [{1: -1, 2: 1}, {1: -1, 2: -1}, {1: 1, 2: -1}, {1: 1, 2: 1}]

prob = svm_problem(y, x)
param = svm_parameter()
param.kernel_type = LINEAR
param.C = 10

m = svm_train(prob, param)
```

Sample output:

```
optimization finished, #iter = 21
nu = 1.000000
obj = -40.000000, rho = 0.000000
nSV = 4, nBSV = 4
Total nSV = 4
```
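Regarding the second question: for a linear kernel the primal variables can be recovered from the dual solution, since w = Σᵢ (αᵢ yᵢ) xᵢ over the support vectors and b = -rho (rho is the value printed in the training output). A minimal numpy sketch of that arithmetic; the `sv_coef` and `svs` values below are hypothetical placeholders standing in for what libsvm's python model exposes (`get_sv_coef()` and `get_SV()`), not numbers from the run above:

```python
import numpy as np

# Hypothetical alpha_i * y_i values, one per support vector
# (stand-in for model.get_sv_coef()).
sv_coef = np.array([0.5, -0.5])

# Hypothetical support vectors as dense rows (stand-in for model.get_SV()).
svs = np.array([[1.0, 1.0],
                [-1.0, 1.0]])

# w = sum_i (alpha_i * y_i) * x_i
w = sv_coef @ svs

# rho is read from the "rho = ..." line of the training output; b = -rho.
rho = 0.0
b = -rho

# The linear decision function is then f(x) = np.dot(w, x) + b.
```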

Run cross-validation over an exponential grid of C values, as explained in the libsvm guide, on a linear-kernel SVM. If the **training set accuracy** can never get close to 100%, the linear model is **too biased** for the data, which in turn means the linear assumption is false (the data is not linearly separable).
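A sketch of that grid, following the exponential range the libsvm guide suggests (C = 2⁻⁵, 2⁻³, ..., 2¹⁵); the commented `svm_train` call assumes libsvm's python interface is importable, where `-t 0` selects the linear kernel and `-v 5` requests 5-fold cross-validation:

```python
# Exponential grid of C values from the libsvm guide: 2^-5, 2^-3, ..., 2^15.
grid = [2.0 ** k for k in range(-5, 16, 2)]

for C in grid:
    # Hypothetical training call (requires libsvm's svmutil and the y, x
    # data from the sample code above):
    # acc = svm_train(y, x, '-t 0 -c %g -v 5' % C)
    pass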

BTW, the **testing set accuracy** is the real evaluation of the generalization ability of the model, but it measures the sum of **bias and variance** and hence cannot be used to measure the bias alone. The difference between the training and testing set accuracies measures the variance, i.e. the overfitting, of the model. More information on error analysis can be found in this blog post summarizing practical tips and tricks from the ml-class online class.
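As for detecting infeasibility directly: soft-margin C_SVC is always feasible (the slack variables absorb any labeling), so the real question is whether the data is linearly separable. One classic check is the perceptron, which converges if and only if the data is separable. A minimal numpy sketch; the function name and the epoch cap are my own, and hitting the cap only suggests, rather than proves, non-separability:

```python
import numpy as np

def perceptron_separable(X, y, max_epochs=1000):
    """Return (w, b) of a separating hyperplane, or None if the
    epoch cap is hit (which suggests the data is not separable)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified or on the boundary
                w += yi * xi
                b += yi
                updated = True
        if not updated:
            return w, b                   # a full clean pass: separated
    return None

X = [[-1, 1], [-1, -1], [1, -1], [1, 1]]
print(perceptron_separable(X, [-1, -1, 1, 1]))  # separable labeling: (w, b)
print(perceptron_separable(X, [1, -1, 1, -1]))  # XOR-style labeling: None
```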