
We retained the eigen-directions of largest variance from the eigendecomposition of the sample covariance C,

C = U Λ U^T, with λ_1 ≥ λ_2 ≥ ... ≥ λ_d (12.2)

and the projection is given by,

y_i = U_k^T x_i (12.3)

where U_k denotes the d × k sub-matrix containing the first k eigenvectors as columns. As a side effect, we can now show that the projected data are de-correlated in this new basis:

(1/N) Σ_i y_i y_i^T = U_k^T C U_k = Λ_k (12.4)

where Λ_k is the (diagonal) k × k sub-matrix corresponding to the largest eigenvalues.
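To make (12.2)-(12.4) concrete, here is a minimal NumPy sketch (the names X, U_k, Y and the toy data are our own, not from the text) that diagonalizes the sample covariance of centered data, projects onto the top-k eigenvectors, and checks that the projected coordinates are de-correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 100, 5, 2

# Toy data, one data-case x_i per row; center so that C below is the covariance.
X = rng.standard_normal((N, d)) @ rng.standard_normal((d, d))
X -= X.mean(axis=0)

# Sample covariance C = (1/N) sum_i x_i x_i^T and its eigendecomposition (12.2).
C = X.T @ X / N
lam, U = np.linalg.eigh(C)          # eigh returns eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]      # reorder so that lambda_1 >= lambda_2 >= ...

# Projection y_i = U_k^T x_i (12.3); the rows of Y are the y_i.
U_k = U[:, :k]
Y = X @ U_k

# De-correlation (12.4): (1/N) sum_i y_i y_i^T equals the diagonal Lambda_k.
print(np.allclose(Y.T @ Y / N, np.diag(lam[:k])))   # True
```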

Another convenient property of this procedure is that the reconstruction error in L_2-norm between y and x is minimal, i.e.

Σ_i ||x_i − P_k x_i||^2 (12.5)

is minimal over all k-dimensional subspaces, where P_k = U_k U_k^T is the projection onto the subspace spanned by the columns of U_k.
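As a quick numerical check of (12.5), continuing the sketch above (the random comparison projector is our own illustration): projecting onto the top-k eigenvectors gives a smaller L_2 reconstruction error than a random k-dimensional projection,

```python
# Reconstruction error (12.5): sum_i ||x_i - P_k x_i||^2 with P_k = U_k U_k^T.
P_k = U_k @ U_k.T
err_pca = np.sum((X - X @ P_k) ** 2)

# Compare against a random rank-k orthogonal projector Q = V V^T.
V, _ = np.linalg.qr(rng.standard_normal((d, k)))
err_rand = np.sum((X - X @ (V @ V.T)) ** 2)

print(err_pca <= err_rand)   # True: the PCA subspace minimizes the error
```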

Now imagine that there are more dimensions than data-cases, i.e. some dimensions remain unoccupied by the data. In this case it is not hard to show that the eigen-vectors that span the projection space must lie in the subspace spanned by the data-cases. This can be seen as follows,

u_a = (1/λ_a) C u_a = (1/(N λ_a)) Σ_i x_i (x_i^T u_a) = Σ_i α_i^a x_i, with α_i^a = x_i^T u_a / (N λ_a) (12.6)

where u_a is some arbitrary eigen-vector of C. The last expression can be interpreted as: “every eigen-vector can be written exactly (i.e. losslessly) as some linear combination of the data-vectors, and hence it must lie in their span”. This also implies that instead of the eigenvalue equations Cu = λu we may consider the N projected equations x_i^T C u_a = λ_a x_i^T u_a. From these equations the coefficients
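A quick numerical illustration of the span argument (a sketch on toy data of our own): when N < d, an eigen-vector of C with non-zero eigenvalue is reproduced exactly by a least-squares combination of the data-cases,

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 10, 50                        # more dimensions than data-cases
X = rng.standard_normal((N, d))
X -= X.mean(axis=0)

C = X.T @ X / N
lam, U = np.linalg.eigh(C)           # ascending eigenvalues
u = U[:, -1]                         # eigen-vector of the largest eigenvalue

# Solve u = sum_i alpha_i x_i in the least-squares sense; a zero residual
# means u lies exactly in the span of the data-cases, as in (12.6).
alpha, *_ = np.linalg.lstsq(X.T, u, rcond=None)
print(np.allclose(X.T @ alpha, u))   # True
```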


α_i^a can be computed efficiently in a space of dimension N (and not d) as follows,

x_i^T C Σ_j α_j^a x_j = λ_a x_i^T Σ_j α_j^a x_j

(1/N) Σ_{j,k} α_j^a [x_i^T x_k][x_k^T x_j] = λ_a Σ_j α_j^a x_i^T x_j (12.7)

We now rename the matrix [x_i^T x_j] = K_ij to arrive at,

K^2 α^a = λ̃_a K α^a ⇒ K α^a = λ̃_a α^a, with λ̃_a = N λ_a (12.8)
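Continuing the sketch above, the dual route of (12.6)-(12.8) in code: we solve the eigenvalue problem of the N × N Gram matrix instead of the d × d covariance, and check that λ̃_a = N λ_a,

```python
# Gram matrix K_ij = x_i^T x_j; the eigenvalue problem (12.8) now lives in
# dimension N instead of d.
K = X @ X.T
lam_tilde, A = np.linalg.eigh(K)               # columns of A are the alpha^a
lam_tilde, A = lam_tilde[::-1], A[:, ::-1]     # largest eigenvalues first

# lam_tilde_a = N * lambda_a: Gram and covariance spectra agree up to rank N.
print(np.allclose(lam_tilde, N * lam[::-1][:N], atol=1e-8))   # True
```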

So we have derived an eigenvalue equation for α, which in turn completely determines the eigen-vectors u. By requiring that u is normalized we find,

1 = u_a^T u_a = Σ_{i,j} α_i^a α_j^a x_i^T x_j = α^{a T} K α^a = λ̃_a α^{a T} α^a ⇒ ||α^a||^2 = 1/λ̃_a (12.9)

Finally, when we receive a new data-case t and we would like to compute its projection onto the new reduced space, we compute,

y_a = u_a^T t = Σ_i α_i^a x_i^T t (12.10)

This equation should look familiar: it is central to most kernel methods.
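Continuing in code, the normalization (12.9) and the projection (12.10) of a new data-case need only inner products with the training cases (t and y are our own names):

```python
k = 2
A_k = A[:, :k] / np.sqrt(lam_tilde[:k])   # (12.9): ||alpha^a||^2 = 1 / lam_tilde_a

# Projection (12.10): y_a = u_a^T t = sum_i alpha_i^a (x_i^T t); only the inner
# products x_i^T t between t and the training cases are needed.
t = rng.standard_normal(d)
y = A_k.T @ (X @ t)

# Sanity check against the explicit eigen-vectors u_a = sum_i alpha_i^a x_i.
U_k = X.T @ A_k
print(np.allclose(y, U_k.T @ t))          # True
```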

Obviously, the whole exposition was set up so that in the end we only need the matrix K to do our calculations. This implies that we are now ready to kernelize the procedure by replacing x_i → Φ(x_i) and defining K_ij = Φ(x_i) Φ(x_j)^T, where Φ(x_i) is the i-th row Φ_ia of the feature matrix Φ.
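Putting the pieces together, a minimal kernel PCA sketch under stated assumptions: we substitute a Gaussian kernel K_ij = exp(−||x_i − x_j||^2 / 2σ^2) for the inner products (the kernel choice and σ are our own; centering Φ in feature space, the topic of Section 12.1, is omitted here),

```python
import numpy as np

def kernel_pca(X, X_new, k=2, sigma=1.0):
    """Project the rows of X_new onto the top-k kernel principal components of X.

    A Gaussian kernel stands in for Phi(x_i)^T Phi(x_j); feature-space
    centering (Section 12.1) is omitted in this sketch.
    """
    def gram(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    K = gram(X, X)
    lam_tilde, A = np.linalg.eigh(K)
    lam_tilde, A = lam_tilde[::-1][:k], A[:, ::-1][:, :k]
    A = A / np.sqrt(lam_tilde)          # normalization (12.9)
    return gram(X_new, X) @ A           # projections (12.10), one row per case

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
print(kernel_pca(X, X[:5]).shape)       # (5, 2)
```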
