B.3 The Gaussian Kernel
This is given by
\[
k(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right) \tag{B.8}
\]
where σ controls the flexibility of the kernel: for very small σ the Gram matrix becomes the identity and every point is highly dissimilar to every other point. On the other hand, for very large σ we obtain the constant kernel, with all entries equal to 1, so all points look completely similar. This underscores the need for regularization in kernel methods: it is easy to perform perfectly on the training data, but that does not imply you will do well on new test data.
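The two limits are easy to verify numerically. The sketch below (NumPy; the data and σ values are illustrative choices, not from the text) builds the Gram matrix of Eq. (B.8) and shows it collapsing to the identity for tiny σ and to the all-ones matrix for huge σ:

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.array([[0.0], [1.0], [3.0]])  # three 1-d data-cases

K_small = gaussian_gram(X, sigma=0.01)   # off-diagonals vanish: ~identity
K_large = gaussian_gram(X, sigma=100.0)  # all entries near 1: ~constant kernel

print(np.round(K_small, 3))
print(np.round(K_large, 3))
```

With σ = 0.01 every off-diagonal entry is exp(−5000) ≈ 0, while with σ = 100 even the most distant pair has similarity exp(−0.00045) ≈ 1.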
In the RKHS construction the features corresponding to the Gaussian kernel are Gaussians centered on the data-cases, i.e. smoothed versions of the data-cases,
\[
\phi_x(y) = k(x, y) = \exp\!\left(-\frac{\|y - x\|^2}{2\sigma^2}\right) \tag{B.9}
\]
and thus every new direction added to the feature space is orthogonal to the directions of data-cases farther away than the width of the Gaussian, and somewhat aligned with those of nearby points.
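Since the RKHS inner product of two features reproduces the kernel, ⟨φ_x, φ_y⟩ = k(x, y), this alignment can be checked directly. A minimal sketch (the points and σ = 1 are illustrative assumptions):

```python
import numpy as np

def k(x, y, sigma=1.0):
    """Gaussian kernel; equals the RKHS inner product <phi_x, phi_y>."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

x = np.array([0.0])
near = np.array([0.5])   # within one kernel width of x
far = np.array([10.0])   # many kernel widths away

align_near = k(x, near)  # exp(-0.125) ~ 0.88: substantially aligned
align_far = k(x, far)    # exp(-50) ~ 2e-22: effectively orthogonal
print(align_near, align_far)
```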
Since the inner product of any feature vector with itself is 1, all feature vectors have length 1. Moreover, the inner product between any two different feature vectors is positive, implying that all feature vectors can be represented in the positive orthant (or any other single orthant); i.e., they lie on a sphere of radius 1 within a single orthant.
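Both properties follow from Eq. (B.8): the diagonal entries are exp(0) = 1, and the exponential is strictly positive. A quick numerical check on random data (the sample size and σ = 1 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))  # five random 2-d data-cases

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)  # Gram matrix with sigma = 1

norms = np.sqrt(np.diag(K))  # RKHS norms of the feature vectors: all 1
print(norms)
print(K.min() > 0)           # every pairwise inner product is positive
```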
80 APPENDIX B. KERNEL DESIGN