
NAIVE BAYES

OVERVIEW

 

Naive Bayes (NB) is a family of classification algorithms based on Bayes' theorem. It is called "naive" because it assumes that the presence of a feature in a class is independent of every other feature. Despite this simplifying assumption, Naive Bayes can outperform more sophisticated classification methods. Naive Bayes classifiers are among the simplest Bayesian network models, but they rest on the strong assumption that all features are independent of each other given the class label.
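The independence assumption can be sketched in a few lines: the posterior for each class is proportional to the class prior multiplied by the per-feature likelihoods. The probabilities below are made-up illustrative numbers, not values from any real dataset.

```python
# Minimal sketch of the naive independence assumption.
# All probabilities here are invented for illustration.

priors = {"spam": 0.4, "ham": 0.6}
# P(feature | class), assumed independent given the class
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham":  {"free": 0.02, "meeting": 0.20},
}

def posterior(features):
    """Return P(class | features) for each class via Bayes' theorem."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for f in features:
            score *= likelihoods[c][f]  # naive step: multiply independent likelihoods
        scores[c] = score
    total = sum(scores.values())        # normalize by the evidence P(features)
    return {c: s / total for c, s in scores.items()}

probs = posterior(["free"])
```

Here a message containing only "free" gets a much higher posterior for the spam class, because that word is far more likely under the spam likelihood table.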


TYPES OF NAIVE BAYES

 

Multinomial Naive Bayes:
The Multinomial Naive Bayes algorithm is particularly suited for classification tasks where features represent the frequencies with which certain events have been generated by a multinomial distribution (such as the frequencies of words in a document). It is widely used in text classification, including spam filtering, sentiment analysis, and topic categorization.

The training process involves calculating the prior probability of each class (the frequency of each class in the training set) and the likelihood of each feature given a class (the frequency of each feature in samples of a specific class). Predictions are made by applying Bayes' theorem to calculate the posterior probability of each class given an observation and choosing the class with the highest probability.
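This train-then-predict flow can be sketched with scikit-learn's `MultinomialNB` on word-count features. This assumes scikit-learn is installed; the corpus and labels are made up for illustration.

```python
# Sketch of Multinomial Naive Bayes on word counts (assumes scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "free prize now", "free money offer", "win money now",         # spam
    "meeting at noon", "project status meeting", "lunch at noon",  # ham
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham

vectorizer = CountVectorizer()          # word-count features (multinomial events)
X = vectorizer.fit_transform(train_texts)

clf = MultinomialNB()                   # estimates class priors and per-class word frequencies
clf.fit(X, train_labels)

# Posterior is computed per class; predict() returns the highest-probability class.
pred = clf.predict(vectorizer.transform(["free money now"]))
```

Fitting estimates the priors and per-word likelihoods described above; prediction scores each class and picks the maximum posterior.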
Smoothing: Multinomial NB includes a smoothing parameter to handle cases where a given word (or feature) has not been observed with a training class. Without smoothing, such cases would result in a probability of zero, which would nullify all other evidence and prevent the model from making a sensible prediction. Smoothing assigns every feature a nonzero probability in every class, even if its frequency in the training set is zero.
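A small numeric sketch of add-one (Laplace) smoothing, using invented word counts for a single "spam" class, shows the zero-probability problem and its fix:

```python
# Laplace (add-alpha) smoothing with made-up counts.
# counts[w] = times word w appeared in the "spam" training documents.
counts = {"free": 3, "offer": 2, "meeting": 0}   # "meeting" never seen in spam
total = sum(counts.values())                     # 5 total word occurrences
vocab_size = len(counts)                         # 3 vocabulary words

def likelihood(word, alpha):
    """P(word | spam) with add-alpha smoothing; alpha=0 means no smoothing."""
    return (counts[word] + alpha) / (total + alpha * vocab_size)

p_unsmoothed = likelihood("meeting", alpha=0)    # 0.0 -> zeroes out the whole product
p_smoothed = likelihood("meeting", alpha=1)      # (0 + 1) / (5 + 3) = 0.125
```

With `alpha=0` the unseen word wipes out every other term in the likelihood product; with `alpha=1` it merely contributes a small, nonzero factor.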

Bernoulli Naive Bayes:
Bernoulli Naive Bayes is similar to Multinomial Naive Bayes but is designed for binary/boolean features. It models the presence or absence of features using a Bernoulli distribution. This model is appropriate for tasks where features are binary (such as text classification with a 'bag of words' model where the vocabulary only records whether a word appears in a document, not how often).
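The presence/absence encoding can be sketched with scikit-learn's `BernoulliNB` (again assuming scikit-learn is installed; the feature matrix is invented for illustration):

```python
# Sketch of Bernoulli Naive Bayes on binary features (assumes scikit-learn).
from sklearn.naive_bayes import BernoulliNB

# Columns: ["free", "money", "meeting"]; each row records only
# presence (1) or absence (0) of a word, never its count.
X = [
    [1, 1, 0],   # spam: mentions "free" and "money"
    [1, 0, 0],   # spam
    [0, 0, 1],   # ham: mentions "meeting"
    [0, 1, 1],   # ham
]
y = [1, 1, 0, 0]  # 1 = spam, 0 = ham

clf = BernoulliNB()
clf.fit(X, y)

# Note: absences also carry evidence, via the (1 - P(word | class)) factors.
pred = clf.predict([[1, 1, 0]])
```

Unlike the multinomial model, Bernoulli NB explicitly penalizes classes for words that are *absent* from a document, which is why it is a natural fit for short binary-feature documents.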

Both of these models rest on the same fundamental assumption: that the presence (or absence) of a particular feature is unrelated to the presence (or absence) of any other feature, given the class variable.
