Support Vector Machines

Our task is to predict whether a test sample belongs to one of two classes. We receive training examples of the form $\{x_i, y_i\}$, $i = 1, \ldots, n$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$. We call $\{x_i\}$ the covariates or input vectors and $\{y_i\}$ the response variables or labels.
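As a concrete illustration (a hypothetical toy dataset, not taken from the text), the training set can be stored as an $n \times d$ matrix of inputs together with a label vector in $\{-1, +1\}$; a minimal sketch in Python with NumPy:

```python
import numpy as np

# Hypothetical toy training set: n = 6 points in d = 2 dimensions.
# Rows of X are the covariates x_i; y holds the labels y_i in {-1, +1}.
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [2.5, 1.5],   # first three points: class -1
              [6.0, 5.0],
              [7.0, 6.5],
              [6.5, 4.5]])  # last three points: class +1
y = np.array([-1, -1, -1, +1, +1, +1])

print(X.shape)  # (6, 2): n = 6 examples, d = 2 features
print(y.shape)  # (6,)
```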

We consider a very simple example where the data are in fact linearly separable: i.e. I can draw a straight line $f(x) = w^T x - b$ such that all cases with $y_i = -1$ fall on one side and have $f(x_i) < 0$, while cases with $y_i = +1$ fall on the other side and have $f(x_i) > 0$. Given that we have achieved that, we could classify new test cases according to the rule $y_{\text{test}} = \mathrm{sign}(f(x_{\text{test}}))$.
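To make the decision rule concrete, here is a minimal sketch that classifies a test point by the sign of $f(x) = w^T x - b$; the particular $w$ and $b$ are hand-picked, hypothetical values chosen to separate the toy data above, not derived in the text:

```python
import numpy as np

def classify(w, b, x_test):
    """Assign +1 or -1 according to the sign of f(x) = w^T x - b."""
    f = w @ x_test - b
    return 1 if f > 0 else -1

# Hypothetical separating hyperplane for the toy data above.
w = np.array([1.0, 1.0])
b = 9.0

print(classify(w, b, np.array([2.0, 2.0])))  # f = 4 - 9 = -5  ->  -1
print(classify(w, b, np.array([7.0, 6.0])))  # f = 13 - 9 = +4 ->  +1
```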

However, typically there are infinitely many such hyperplanes, obtained by small perturbations of a given solution. How do we choose between all these hyperplanes, which all solve the separation problem for our training data but may have different performance on newly arriving test cases? For instance, we could choose to put the line very close to members of one particular class, say $y = -1$. Intuitively, when test cases arrive we will not make many mistakes on cases that should be classified with $y = +1$, but we will easily make mistakes on the cases with $y = -1$ (for instance, imagine that a new batch of test cases arrives which are small perturbations of the training data). A sensible choice thus seems to be to place the separating line as far away from both the $y = -1$ and $y = +1$ training cases as we can, i.e. right in the middle.
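One way to obtain this "middle" separating line in practice is a hard-margin linear SVM. The sketch below uses scikit-learn's `SVC` with a linear kernel and a large `C` to approximate the hard-margin setting; the library choice and the toy data are assumptions for illustration, not part of the text:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable toy data, same spirit as above.
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.5, 1.5],
              [6.0, 5.0], [7.0, 6.5], [6.5, 4.5]])
y = np.array([-1, -1, -1, +1, +1, +1])

# A large C approximates the hard-margin (perfectly separable) setting.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = -clf.intercept_[0]  # so that the decision function reads w^T x - b
print("w =", w, "b =", b)
print(clf.predict([[2.0, 2.0], [7.0, 6.0]]))  # expected: [-1, +1]
```

The maximum-margin line returned here is the one whose distance to the closest training point of either class is as large as possible, which is exactly the "right in the middle" choice described above.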

Geometrically, the vector $w$ is directed orthogonal to the line defined by $w^T x = b$. This can be understood as follows. First take $b = 0$. Now it is clear that all vectors $x$ with vanishing inner product with $w$ satisfy this equation, i.e. all vectors orthogonal to $w$ satisfy this equation. Now translate the hyperplane away from the origin over a vector $a$. The equation for the plane now becomes $(x - a)^T w = 0$, i.e. $w^T x = a^T w$, so we can identify $b = a^T w$.
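The orthogonality argument can be checked numerically: for any two points on the hyperplane $w^T x = b$, their difference lies within the plane and so has zero inner product with $w$, and any point $a$ on the plane satisfies $b = a^T w$. The numbers below are hypothetical, chosen only to illustrate this:

```python
import numpy as np

w = np.array([1.0, 1.0])
b = 9.0

# Two points lying on the hyperplane w^T x = b.
p1 = np.array([4.0, 5.0])   # 4 + 5 = 9
p2 = np.array([6.0, 3.0])   # 6 + 3 = 9

# Their difference lies within the plane, hence is orthogonal to w.
print(np.dot(w, p2 - p1))   # 0.0

# A point a on the plane recovers the offset: b = a^T w.
a = p1
print(np.dot(a, w))         # 9.0 = b
```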
