Kernel Principal Components Analysis
Let’s first see what PCA is when we do not worry about kernels and feature spaces. We will always assume that we have centered data, i.e. $\sum_i \mathbf{x}_i = 0$. This can always be achieved by a simple translation of the axes.
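As a concrete illustration, the following is a minimal NumPy sketch of this centering step; the data matrix `X` (one data point per row) is a hypothetical example variable, not something defined in the text.

```python
import numpy as np

# Toy data matrix: N points in d dimensions (one point per row); values are arbitrary.
X = np.random.default_rng(0).normal(size=(100, 3))

# Center the data so that the sample mean is zero, i.e. sum_i x_i = 0.
X_centered = X - X.mean(axis=0)
```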
Our aim is to find meaningful projections of the data. However, we are facing an unsupervised problem where we don’t have access to any labels; if we had, we should be doing Linear Discriminant Analysis. Due to this lack of labels, our aim will be to find the subspace of largest variance, where we choose the number of retained dimensions beforehand. This is clearly a strong assumption, because it may happen that there is interesting signal in the directions of small variance, in which case PCA is not a suitable technique (and we should perhaps use a technique called independent component analysis). However, it is usually true that the directions of smallest variance represent uninteresting noise.
To make progress, we start by writing down the sample-covariance matrix $C$,
\[
C = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i^T \qquad (12.1)
\]
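Under the same assumed setup as above, the covariance matrix of Eq. (12.1) can be formed directly from the centered data; this is just a sketch, with the $1/N$ normalization taken from the equation.

```python
N, d = X_centered.shape

# Sample covariance matrix C = (1/N) * sum_i x_i x_i^T  (Eq. 12.1),
# computed for all pairs of dimensions at once as X^T X / N.
C = X_centered.T @ X_centered / N
```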
The eigenvalues of this matrix represent the variance in the eigen-directions of data-space. The eigen-vector corresponding to the largest eigenvalue is the direction in which the data is most stretched out. The second direction is orthogonal to it and picks out the direction of largest variance within that orthogonal subspace, and so on.
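Continuing the sketch above (still with the hypothetical variables `X_centered` and `C`), the eigen-decomposition of $C$ can be computed and sorted by decreasing eigenvalue, so that the leading eigenvectors span the directions of largest variance.

```python
# Eigen-decomposition of the symmetric matrix C.
eigvals, eigvecs = np.linalg.eigh(C)

# eigh returns eigenvalues in ascending order; reverse so the largest-variance
# direction comes first. Columns of eigvecs are the corresponding unit eigenvectors.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The eigenvalue lambda_k equals the variance of the data projected onto eigenvector u_k.
projections = X_centered @ eigvecs     # coordinates in the eigen-basis
print(projections.var(axis=0))         # matches eigvals up to floating-point error
```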
Thus, to reduce the dimensionality of the data, we project the data onto the re-