The Perceptron

We will now describe one of the simplest parametric classifiers: the _perceptron_, and its cousin, the _logistic regression_ classifier. Despite its simplicity, it should not be underestimated! It is the workhorse for most companies involved in some form of machine learning (perhaps tying with the _decision tree_ classifier). One could say that it represents the canonical parametric approach to classification, in which we believe that a straight line is sufficient to separate the two classes of interest. An example of this is given in Figure ??, where the assumption that the two classes can be separated by a line is clearly valid.
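To make the idea of a linear decision boundary concrete, here is a minimal sketch of the classic perceptron learning rule in Python (assuming numpy; the function names `perceptron_train` and `perceptron_predict` are ours, and labels are assumed to be in {-1, +1}). This illustrates the standard algorithm, not necessarily the exact formulation developed later in the text.

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Perceptron learning rule: X is (n_samples, n_features), y holds labels in {-1, +1}."""
    w = np.zeros(X.shape[1])   # weight vector defining the separating line
    b = 0.0                    # bias (offset of the line from the origin)
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # A point is misclassified when its label disagrees with the
            # side of the line the point falls on.
            if y_i * (np.dot(w, x_i) + b) <= 0:
                w += y_i * x_i  # nudge the boundary toward the mistake
                b += y_i
                mistakes += 1
        if mistakes == 0:       # converged: training data is linearly separated
            break
    return w, b

def perceptron_predict(X, w, b):
    """Classify each row of X by which side of the line it falls on."""
    return np.sign(X @ w + b)
```

On linearly separable data, as in the figure referenced above, this loop is guaranteed to terminate; on inseparable data it simply stops after `epochs` passes.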

However, this assumption need not always be true. Looking at Figure ??, we clearly observe that there is no straight line that will do the job for us. What can we do? Our first inclination is probably to try to fit a more complicated separation boundary. However, there is another trick that we will be using often in this book. Instead, we can increase the dimensionality of the space by “measuring” more things about the data. Call $\phi_k(X)$ feature $k$ that was measured from the data. The features can be highly nonlinear functions. The simplest choice may be to also measure $\phi_k(X) = X_k^2,\ \forall k$, for each attribute $X_k$. But we may also measure cross-products such as $\phi_{ij}(X) = X_i X_j,\ \forall i, j$. The latter allows you to explicitly model correlations between attributes. For example, if $X_i$ represents the presence (1) or absence (0) of the word “viagra”, and similarly $X_j$ the presence/absence of the word “dysfunction”, then the cross-product feature $X_i X_j$ lets you model the presence of both words simultaneously (which should be helpful in trying to find out what this document is about). We can add as many features as we like, adding another dimension for every new feature. In this higher-dimensional space we can now be more confident in assuming that the data can be separated by a line.
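As a sketch of this feature construction, the helper below (the name `expand_features` is ours, again assuming numpy) augments each data vector with the squared terms $X_k^2$ and all cross-products $X_i X_j$:

```python
from itertools import combinations
import numpy as np

def expand_features(X):
    """Augment each row of X with squares X_k**2 and cross-products X_i * X_j (i < j)."""
    n, d = X.shape
    squares = X ** 2
    pairs = list(combinations(range(d), 2))
    cross = (np.stack([X[:, i] * X[:, j] for i, j in pairs], axis=1)
             if pairs else np.empty((n, 0)))
    return np.hstack([X, squares, cross])

# XOR-style data: no straight line separates the two classes in the
# original two dimensions ...
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([-1, -1, 1, 1])
# ... but in the expanded 5-D space (x1, x2, x1^2, x2^2, x1*x2) a
# separating hyperplane does exist, e.g. the one found by the
# perceptron sketch above when trained on expand_features(X).
print(expand_features(X))
```

Feeding these expanded vectors into the perceptron sketch above separates the XOR-style data, even though no line in the original two dimensions could.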
