Naive Bayes
Synopsis
This Operator generates a Naive Bayes classification model.
Description
Naive Bayes is a high-bias, low-variance classifier, and it can build a good model even with a small data set. It is simple to use and computationally inexpensive. Typical use cases involve text categorization, including spam detection, sentiment analysis, and recommender systems.
The fundamental assumption of Naive Bayes is that, given the value of the label (the class), the value of any Attribute is independent of the value of any other Attribute. Strictly speaking, this assumption is rarely true (it's "naive"!), but experience shows that the Naive Bayes classifier often works well. The independence assumption vastly simplifies the calculations needed to build the Naive Bayes probability model.
To complete the probability model, it is necessary to make some assumption about the conditional probability distributions for the individual Attributes, given the class. This Operator uses Gaussian probability densities to model the Attribute data.
Differentiation
Naive Bayes (Kernel)
The alternative Operator Naive Bayes (Kernel) is a variant of Naive Bayes where multiple Gaussians are combined, to create a kernel density.
Input
training set
The input port expects an ExampleSet.
Output
model
The Naive Bayes classification model is delivered from this output port. The model can now be applied to unlabelled data to generate predictions.
example set
The ExampleSet that was given as input is passed through without changes.
Parameters
Laplace correction
简单的天真英航yes includes a weakness: if within the training data a given Attribute value never occurs in the context of a given class, then the conditional probability is set to zero. When this zero value is multiplied together with other probabilities, those values are also set to zero, and the results will be misleading. Laplace correction is a simple trick to avoid this problem, adding one to each count to avoid the occurrence of zero values. For most training sets, adding one to each count has only a negligible effect on the estimated probabilities.