6.5 Remarks

One of the main limitations of the NB classifier is that it assumes independence between attributes (this is presumably why we call it the _naive_ Bayesian classifier). This is reflected in the fact that each attribute has an independent vote in the final score. However, imagine that I measure the words “home” and “mortgage”. Observing “mortgage” certainly raises the probability of observing “home”. We say that they are positively correlated. It would therefore be fairer to give a smaller weight to “home” if we have already observed “mortgage”, because they convey the same thing: this email is about mortgages for your home. One way to obtain a fairer voting scheme is to model these dependencies explicitly. However, this comes at a computational cost (a longer time before you receive your email in your inbox), which may not always be worth the additional accuracy. One should also note that more parameters do not necessarily improve accuracy, because too many parameters may lead to overfitting.
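
The double-counting effect can be made concrete with a small sketch. The probabilities below are invented for illustration (they are not estimated from any data), and the helper name `log_odds_vote` is hypothetical. The sketch takes the extreme case where “home” appears exactly when “mortgage” does, so that “home” carries no new evidence once “mortgage” has been seen.

```python
import math

def log_odds_vote(p_given_spam, p_given_ham):
    """Log-likelihood ratio contributed by one observed word (hypothetical helper)."""
    return math.log(p_given_spam / p_given_ham)

# Assumed, made-up class-conditional probabilities of seeing each word.
p_mortgage = {"spam": 0.2, "ham": 0.05}
p_home = {"spam": 0.2, "ham": 0.05}

vote_mortgage = log_odds_vote(p_mortgage["spam"], p_mortgage["ham"])
vote_home = log_odds_vote(p_home["spam"], p_home["ham"])

# Naive Bayes: every attribute votes independently, so the votes simply add up.
nb_score = vote_mortgage + vote_home

# Modelling the dependence explicitly. Assumed extreme case: "home" always
# accompanies "mortgage", so P(home | mortgage, class) = 1 in both classes
# and the second word contributes a vote of log(1/1) = 0.
p_home_given_mortgage = {"spam": 1.0, "ham": 1.0}
dependent_score = vote_mortgage + log_odds_vote(
    p_home_given_mortgage["spam"], p_home_given_mortgage["ham"]
)

print(f"naive Bayes evidence for spam:      {nb_score:.3f}")
print(f"dependency-aware evidence for spam: {dependent_score:.3f}")
```

Under these assumptions the naive score is exactly twice the dependency-aware one: the same piece of evidence has been counted twice. Real words are rarely perfectly correlated, so in practice the overcounting is smaller, but it points in the same direction.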
