Better Naïve Bayes: 12 Tips to Get the Most from the Naïve Bayes Algorithm
Naïve Bayes is a simple and powerful technique that you should be testing and using on your classification problems.
It is easy to understand, gives good results, and is fast to train a model and make predictions. For those reasons alone, it deserves a closer look.
In a previous blog post, you learned how to implement the Naïve Bayes algorithm from scratch in Python.
In this post, you will learn tips and tricks to get the most from the Naïve Bayes algorithm.
1] Missing Data
Naïve Bayes can handle missing data.
Attributes are handled separately by the algorithm, both when the model is built and when a prediction is made.
As such, if a data instance has a missing value for an attribute, that attribute can be skipped while preparing the model, and skipped again when a probability is calculated for a class value.
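As a sketch of what this looks like in code, assuming a small categorical model with two attributes (the probability tables and numbers below are purely illustrative): a missing value, represented as `None`, is simply skipped when scoring each class.

```python
import math

# Hypothetical conditional probability tables P(value | class) for two
# attributes; the numbers are illustrative, not from a real dataset.
cond_probs = {
    "yes": [{"sunny": 0.2, "rainy": 0.8}, {"high": 0.3, "low": 0.7}],
    "no":  [{"sunny": 0.6, "rainy": 0.4}, {"high": 0.9, "low": 0.1}],
}
priors = {"yes": 0.5, "no": 0.5}

def class_scores(instance):
    """Score each class, skipping attributes whose value is None (missing)."""
    scores = {}
    for label, tables in cond_probs.items():
        log_p = math.log(priors[label])
        for value, table in zip(instance, tables):
            if value is None:          # missing attribute: just ignore it
                continue
            log_p += math.log(table[value])
        scores[label] = log_p
    return scores

scores = class_scores(["sunny", None])  # second attribute is missing
```

The prediction is still well defined because each attribute contributes its factor independently; dropping one factor leaves a valid comparison between classes.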
2] Use Log Probabilities
Probabilities are often small numbers. To calculate joint probabilities, you need to multiply probabilities together. When you multiply one small number by another, you get a very small number.
It is possible to run into problems with the precision of your floating point values, such as underflow. To avoid this problem, work in the log probability space (take the logarithm of your probabilities).
This works because to make a prediction in Naïve Bayes we only need to know which class has the larger probability (the rank), not what the specific probability was.
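A quick illustration of the underflow problem and the log-space fix (the probability values are made up for demonstration):

```python
import math

# Multiplying many small probabilities underflows to 0.0 in floating point;
# summing their logarithms stays finite.
probs = [1e-5] * 100

product = 1.0
for p in probs:
    product *= p          # 1e-500 is below the smallest double: underflows to 0.0

log_sum = sum(math.log(p) for p in probs)   # 100 * log(1e-5), a finite number
```

Because the logarithm is monotonic, the class with the largest log-probability sum is also the class with the largest probability, so the ranking is preserved.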
3] Use Other Distributions
To use Naïve Bayes with categorical attributes, you calculate a frequency for each observation.
To use Naïve Bayes with real-valued attributes, you can summarize the density of the attribute using a Gaussian distribution. Alternatively, you can use another functional form that better describes the distribution of the data, such as an exponential.
Don't restrict yourself to the distributions used in standard presentations of the Naïve Bayes algorithm. Choose distributions that best characterize your data and prediction problem.
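As a sketch, here are two candidate density functions you might plug into a Naïve Bayes model for a real-valued attribute; which one fits is an empirical question about the shape of your data, and the example value below is invented:

```python
import math

def gaussian_pdf(x, mean, std):
    """Gaussian density: the usual default for real-valued attributes."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def exponential_pdf(x, rate):
    """Exponential density: often a better fit for skewed, non-negative data."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

# Illustrative: a skewed attribute such as a duration or a count may be
# described better by an exponential than by a Gaussian.
density = exponential_pdf(0.5, rate=2.0)
```

Swapping the density function is the only change needed; the rest of the Naïve Bayes machinery (priors, per-attribute independence, class scoring) stays the same.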
4] Use Probabilities for Feature Selection
Feature selection is the selection of those data attributes that best characterize a predicted variable.
In Naïve Bayes, the probabilities for each attribute are calculated independently from the training dataset. You can use a search algorithm to explore combinations of the probabilities of different attributes and evaluate how well they predict the output variable.
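One possible sketch of such a search, assuming a tiny made-up categorical dataset and a minimal count-based Naïve Bayes: exhaustively score every attribute subset by training accuracy and keep the best one.

```python
from itertools import combinations

# Tiny illustrative dataset: attribute 0 is predictive, attribute 1 is noise.
data = [
    (["a", "x"], 1), (["a", "y"], 1), (["a", "x"], 1),
    (["b", "y"], 0), (["b", "x"], 0), (["b", "y"], 0),
]

def train(rows, attrs):
    """Count-based conditional probabilities for the chosen attribute subset."""
    counts, totals = {}, {}
    for values, label in rows:
        totals[label] = totals.get(label, 0) + 1
        for i in attrs:
            counts[(label, i, values[i])] = counts.get((label, i, values[i]), 0) + 1
    return counts, totals

def predict(model, attrs, values):
    counts, totals = model
    n = sum(totals.values())
    best, best_score = None, -1.0
    for label, total in totals.items():
        score = total / n                 # class prior
        for i in attrs:
            score *= counts.get((label, i, values[i]), 0) / total
        if score > best_score:
            best, best_score = label, score
    return best

def accuracy(attrs):
    model = train(data, attrs)
    return sum(predict(model, attrs, v) == y for v, y in data) / len(data)

# Exhaustively search attribute subsets and keep the best-scoring one.
subsets = [c for r in (1, 2) for c in combinations(range(2), r)]
best_subset = max(subsets, key=accuracy)
```

With more than a handful of attributes an exhaustive search becomes infeasible, and a greedy forward or backward search over subsets is the usual substitute.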
5] Segment the Data
Is there a well-defined subset of your data that responds well to the probabilistic approach of Naïve Bayes?
Identifying and separating out segments that are easily handled by a simple probabilistic method like Naïve Bayes can give you better performance and let you focus on the elements of the problem that are harder to model.
Explore different subsets, such as the average or common cases, that are very likely to be handled well by Naïve Bayes.
6] Re-compute Probabilities
Calculating the probabilities for each attribute is very fast.
This benefit of Naïve Bayes means that you can re-calculate the probabilities as the data changes. This may be monthly, daily, or even hourly.
This is something that may be unthinkable for other algorithms, but should be tested when using Naïve Bayes if there is some temporal drift in the problem being modeled.
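A minimal sketch of why refreshing is so cheap: when the model is just counts, folding in new observations is an increment, not a retrain. The `CountModel` class and its data are hypothetical.

```python
class CountModel:
    """Count-based Naïve Bayes summaries that are cheap to keep fresh."""

    def __init__(self):
        self.class_counts = {}   # label -> number of instances seen
        self.value_counts = {}   # (label, attr index, value) -> count

    def update(self, values, label):
        """Fold one new observation into the counts."""
        self.class_counts[label] = self.class_counts.get(label, 0) + 1
        for i, v in enumerate(values):
            key = (label, i, v)
            self.value_counts[key] = self.value_counts.get(key, 0) + 1

    def prob(self, label, attr, value):
        """P(attribute value | class) from the current counts."""
        return self.value_counts.get((label, attr, value), 0) / self.class_counts[label]

model = CountModel()
model.update(["sunny"], "yes")
model.update(["rainy"], "yes")
model.update(["rainy"], "yes")   # data arriving later: just call update again
```

Each hourly or daily batch only touches the counters it affects, so the refreshed probabilities are available immediately after the last `update` call.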
7] Use as a Generative Model
The Naïve Bayes method characterizes the problem, and that characterization can in turn be used to make predictions about unseen data.
This probabilistic characterization can also be used to generate instances of the problem.
In the case of a numeric vector, the probability distributions can be sampled to create new synthetic vectors.
In the case of text (a very common application of Naïve Bayes), the model can be used to create synthetic input documents.
How might this be useful in your problem?
At the very least, you can use the generative approach to help provide context for what the model has characterized.
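A sketch of the generative use, assuming per-class Gaussian summaries of the kind a Gaussian Naïve Bayes model learns (the class names and numbers in `summaries` are invented for illustration): sample each attribute's distribution to produce synthetic vectors.

```python
import random

random.seed(0)  # reproducible sampling for the demonstration

# Hypothetical per-class (mean, std) summaries for two numeric attributes.
summaries = {
    "spam": [(5.0, 1.0), (0.2, 0.05)],
    "ham":  [(1.0, 0.5), (0.8, 0.10)],
}

def generate(label):
    """Sample a synthetic attribute vector for the given class."""
    return [random.gauss(mean, std) for mean, std in summaries[label]]

synthetic = [generate("spam") for _ in range(100)]
```

Inspecting such samples is a quick sanity check: if the generated vectors look nothing like real instances of the class, the chosen distributions are probably a poor fit.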
8] Remove Redundant Features
The performance of Naïve Bayes can degrade if the data contains highly correlated features.
This is because the highly correlated features are effectively voted for twice in the model, inflating their importance.
Evaluate the pairwise correlation of attributes using a correlation matrix and remove those features that are most highly correlated with each other.
Nevertheless, always test your problem before and after such a change and stick with the form of the problem that gives the better results.
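A sketch of the pairwise check using Pearson correlation; the `features` columns and the 0.95 threshold below are illustrative assumptions, not recommendations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Columns of a tiny made-up dataset: feature "b" is a near-copy of "a".
features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.1, 2.0, 2.9, 4.1],
    "c": [4.0, 1.0, 3.0, 2.0],
}

def drop_correlated(features, threshold=0.95):
    """Keep features in order, dropping any too correlated with a kept one."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

kept = drop_correlated(features)
```

In practice libraries such as pandas or NumPy compute the full correlation matrix in one call; the loop above just makes the drop rule explicit.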
9] Parallelize Probability Calculation
The probabilities for each attribute are calculated independently. This is the independence assumption in the method, and the reason the algorithm has the name "naïve".
You can exploit this assumption to further speed up the execution of the algorithm by calculating attribute probabilities in parallel.
Depending on the size of your dataset and your resources, you could do this using different CPUs, different machines, or different clusters.
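A minimal sketch using Python's standard `concurrent.futures` pool to summarize attribute columns in parallel (the column data is invented; for CPU-bound work on large datasets a `ProcessPoolExecutor` could be substituted for the thread pool):

```python
from concurrent.futures import ThreadPoolExecutor
import statistics

# Each column's summary is independent of every other column, so the
# per-attribute work can be farmed out to a pool of workers.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 10.0, 12.0, 12.0],
    [0.1, 0.2, 0.3, 0.4],
]

def summarize(column):
    """Per-attribute (mean, stdev) for a Gaussian Naïve Bayes model."""
    return statistics.mean(column), statistics.stdev(column)

with ThreadPoolExecutor(max_workers=4) as pool:
    summaries = list(pool.map(summarize, columns))
```

Because no worker needs another worker's result, there is no coordination cost beyond distributing the columns and collecting the summaries.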
10] Less Data Than You Think
Naïve Bayes does not require a lot of data to perform well.
It needs enough data to understand the probabilistic relationship of each attribute in isolation with the output variable.
Given that interactions between attributes are ignored in the model, we do not need examples of those interactions, and therefore generally less data than other algorithms, such as logistic regression.
Further, it is less likely to overfit the training data with a small sample size.
Try Naïve Bayes if you do not have much training data.
11] Zero Observations Problem
Naïve Bayes will not be reliable if there are significant differences in the attribute distributions compared to the training dataset.
An important example of this is the case where a categorical attribute has a value that was not observed during training. In this case, the model will assign a zero probability and be unable to make a prediction.
These cases should be checked for and handled differently. Once such cases have been resolved (an answer is known), the probabilities should be re-calculated and the model updated.
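One common remedy, not named above, is Laplace (add-one) smoothing: add a small pseudo-count to every value so that nothing ever gets exactly zero probability. The counts below are illustrative.

```python
def smoothed_prob(count, class_total, n_distinct_values, alpha=1.0):
    """Laplace-smoothed estimate of P(value | class).

    alpha is the pseudo-count added to every distinct value; alpha=1
    gives classic add-one smoothing.
    """
    return (count + alpha) / (class_total + alpha * n_distinct_values)

# A value never seen with this class (count == 0) no longer zeroes
# out the whole product of probabilities.
p_unseen = smoothed_prob(0, class_total=10, n_distinct_values=3)
p_seen = smoothed_prob(7, class_total=10, n_distinct_values=3)
```

Smoothing trades a small bias on the seen values for robustness to the unseen ones, which is usually a good exchange when the training set is modest.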
12] It Works Anyway
An interesting point about Naïve Bayes is that even when the independence assumption is violated and there are clear known relationships between attributes, it works anyway. Importantly, this is one of the reasons why you need to spot check a variety of algorithms on a given problem: the results can easily surprise you.
In this post, you discovered tips and tricks for using and getting more out of the Naïve Bayes algorithm.