Classification Modeling with Naive Bayes

Naive Bayes is a classification modeling method, as are Logistic Regression and Decision Tree models.

Important: The database version of this operator was deprecated and has been removed as of version 6.1. It is replaced by a new, improved Naive Bayes (DB) operator. To use the new operator, remove the old Naive Bayes (database) operator from your workflow and add the Naive Bayes (DB) operator in its place. Support for Naive Bayes on Hadoop is unchanged; for more information, see Naive Bayes (HD).

A Naive Bayes model predicts a categorical or binary outcome, such as "yes" or "no." Specifically, the Team Studio Naive Bayes operator calculates the probability of a particular event occurring, so it can be used to estimate the probability that a given data point belongs to a particular class.
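The class probability described above follows from Bayes' theorem. As a general sketch (the notation is ours, not Team Studio's), for a class C_k and a data point with predictor values x = (x_1, ..., x_n), the posterior probability is shown below; under the independence assumption described in the next paragraphs, the likelihood factorizes into one term per predictor.

```latex
P(C_k \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_k)\, P(C_k)}{\sum_{j} P(\mathbf{x} \mid C_j)\, P(C_j)},
\qquad
P(\mathbf{x} \mid C_k) \approx \prod_{i=1}^{n} P(x_i \mid C_k)
```

The predicted class is the one with the largest posterior. The denominator is the same for every class, so it only rescales the scores so that the class probabilities sum to 1.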

For an example use case, see Naive Bayes Use Case.

Naive Bayes applies Bayes' theorem under the "naive" assumption that the predictors, or variables, are independent of one another given the outcome class. For example, the model can reflect the probability of a customer buying a computer based on the customer's age, treated independently of the customer's income, sex, or other attributes.
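The following minimal Python sketch shows how the independence assumption is used for categorical predictors: each class's score is its prior multiplied by one conditional probability per predictor, with no terms that link predictors to each other. The dataset is a hypothetical toy example, the add-one (Laplace) smoothing parameter alpha is an assumption not described in this document, and the function names are illustrative; this is not the Team Studio operator's implementation.

```python
from collections import Counter, defaultdict

def train_categorical_nb(rows, labels, alpha=1.0):
    """Count class frequencies and per-class value frequencies for each predictor."""
    class_counts = Counter(labels)
    # counts[i][label][value]: how often predictor i took this value within this class
    counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed for each predictor
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[i][label][value] += 1
            values[i].add(value)
    return class_counts, counts, values, alpha

def predict_proba(model, row):
    """Return normalized class probabilities P(class | row)."""
    class_counts, counts, values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / total  # prior P(class)
        for i, value in enumerate(row):
            # Independence assumption: multiply one smoothed P(x_i | class) per predictor
            score *= (counts[i][label][value] + alpha) / (n_c + alpha * len(values[i]))
        scores[label] = score
    norm = sum(scores.values())  # plays the role of P(x) in Bayes' theorem
    return {label: s / norm for label, s in scores.items()}

# Hypothetical toy data: (age group, income level) -> buys a computer?
rows = [
    ("youth", "high"), ("youth", "high"), ("middle", "high"), ("senior", "medium"),
    ("senior", "low"), ("senior", "low"), ("middle", "low"), ("youth", "medium"),
    ("youth", "low"), ("senior", "medium"),
]
labels = ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"]

model = train_categorical_nb(rows, labels)
probs = predict_proba(model, ("youth", "medium"))
print(max(probs, key=probs.get), round(probs["yes"], 3))  # prints: no 0.405
```

Training is a single counting pass over the data, which is one reason the method is fast. The smoothing term keeps a value that never appears with a class from forcing that class's whole product to zero.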

  • Naive Bayes is often a surprisingly accurate classifier, even though the independence assumption is rarely true in practice.
  • The independence assumption also makes the Naive Bayes classifier computationally lightweight: it needs only small training sets to estimate means and variances, rather than a more complex covariance matrix.
  • Compared with other classification methods, Naive Bayes is particularly useful when facing the 'curse of dimensionality', that is, when the number of predictors, or independent variables, is very high.
  • Typical applications of Naive Bayes modeling include spam detection, biological classification, and financial loan forecasting.
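The "means and variances" point above can be illustrated with Gaussian Naive Bayes, a common variant for numeric predictors (assumed here; this document does not state which distribution the operator uses). Each class needs only a prior plus one mean and one variance per predictor, that is, 2d numbers for d predictors, whereas a full covariance matrix would add d(d+1)/2 entries per class. That gap widens quickly as the number of predictors grows. The data and function names below are hypothetical, and the sketch is not the Team Studio implementation.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Return {class: (prior, means, variances)}; no covariance matrix is computed."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n_features = len(X[0])
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(r[j] for r in rows) / n for j in range(n_features)]
        # Per-predictor (population) variance; var_floor avoids division by zero
        variances = [
            max(sum((r[j] - means[j]) ** 2 for r in rows) / n, var_floor)
            for j in range(n_features)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, row):
    """Pick the class with the highest log prior plus summed per-predictor log-likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        # Independence assumption: the joint log-likelihood is a sum of 1-D terms
        score = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical data: (age, income in thousands) -> buys a computer?
X = [[25, 30], [30, 35], [28, 32], [45, 80], [50, 90], [48, 85]]
y = ["no", "no", "no", "yes", "yes", "yes"]
model = fit_gaussian_nb(X, y)
print(predict(model, [47, 82]), predict(model, [27, 31]))  # prints: yes no
```

Working with log-probabilities turns the product of per-predictor terms into a sum, which avoids numeric underflow when there are many predictors.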

In summary, the Team Studio Naive Bayes operator is a fast, effective classification tool whose results are easy to interpret.
