Kernel Ridge Regression
Possibly the most elementary algorithm that can be kernelized is ridge regression. Here our task is to find a linear function that models the dependencies between covariates $\{x_i\}$ and response variables $\{y_i\}$, both continuous. The classical way to do that is to minimize the quadratic cost,
C = \frac{1}{2}\sum_i \left(y_i - w^T x_i\right)^2 \qquad (10.1)
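As a concrete illustration (a minimal sketch, not part of the original notes), the minimizer of (10.1) can be computed by solving the normal equations; the small data matrix X (rows $x_i^T$) and response vector y below are made-up placeholders.

import numpy as np

# Assumed toy data: three 2-dimensional covariates x_i stacked as rows of X, responses y.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.9, 2.1, 2.9])

# Setting the gradient of C = 1/2 * sum_i (y_i - w^T x_i)^2 to zero gives the
# normal equations (sum_i x_i x_i^T) w = sum_i y_i x_i, i.e. (X^T X) w = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)
print("least-squares weights:", w)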
However, if we are going to work in feature space, where we replace $x_i \rightarrow \Phi(x_i)$, there is a clear danger that we will overfit. Hence we need to regularize. This is an important topic that will return in future classes.
A simple yet effective way to regularize is to penalize the norm of $w$. This is sometimes called “weight-decay”. It remains to be determined how to choose the regularization constant $\lambda$; the most common approach is to use cross-validation or leave-one-out estimates. The total cost function hence becomes,
C = \frac{1}{2}\sum_i \left(y_i - w^T x_i\right)^2 + \frac{1}{2}\lambda \|w\|^2 \qquad (10.2)
which needs to be minimized with respect to $w$. Taking derivatives and equating them to zero gives,

\sum_i \left(y_i - w^T x_i\right) x_i = \lambda w \quad \Longrightarrow \quad w = \left(\lambda I_d + \sum_i x_i x_i^T\right)^{-1} \left(\sum_j y_j x_j\right) \qquad (10.3)
We see that the regularization term helps to stabilize the inverse numerically by bounding the smallest eigenvalues away from zero.
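The closed-form solution (10.3) translates directly into a few lines of NumPy. The sketch below is illustrative rather than part of the original notes: the toy data, the helper name ridge_fit, the grid of candidate $\lambda$ values, and the simple hold-out split are all assumptions, standing in for the cross-validation or leave-one-out procedure mentioned above.

import numpy as np

# Assumed toy data: rows of X are the covariate vectors x_i, y holds the responses.
rng = np.random.default_rng(1)
n, d = 30, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (lam * I_d + X^T X)^{-1} X^T y, as in (10.3)."""
    d = X.shape[1]
    # Adding lam * I_d bounds the smallest eigenvalue of the matrix away from zero,
    # which keeps the linear solve numerically stable.
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

# lambda is typically chosen by cross-validation; a single hold-out split over a
# small grid of candidate values serves as a stand-in here.
X_tr, y_tr, X_val, y_val = X[:20], y[:20], X[20:], y[20:]
for lam in [1e-3, 1e-1, 1.0, 10.0]:
    w = ridge_fit(X_tr, y_tr, lam)
    err = np.mean((y_val - X_val @ w) ** 2)
    print(f"lambda={lam:g}  validation MSE={err:.3f}")

Note that every eigenvalue of $\lambda I_d + \sum_i x_i x_i^T$ is at least $\lambda$, which is exactly the numerical stabilization of the inverse described above.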