1.9. Naive Bayes - Scikit-learn

1.9.3. Complement Naive Bayes#

ComplementNB implements the complement naive Bayes (CNB) algorithm. CNB is an adaptation of the standard multinomial naive Bayes (MNB) algorithm that is particularly suited for imbalanced data sets. Specifically, CNB uses statistics from the complement of each class to compute the model’s weights. The inventors of CNB show empirically that the parameter estimates for CNB are more stable than those for MNB. Further, CNB regularly outperforms MNB (often by a considerable margin) on text classification tasks.

Weights calculation#

The procedure for calculating the weights is as follows:

\[ \begin{align}\begin{aligned}\hat{\theta}_{ci} = \frac{\alpha_i + \sum_{j:y_j \neq c} d_{ij}} {\alpha + \sum_{j:y_j \neq c} \sum_{k} d_{kj}}\\w_{ci} = \log \hat{\theta}_{ci}\\w_{ci} = \frac{w_{ci}}{\sum_{j} |w_{cj}|}\end{aligned}\end{align} \]

where the summations are over all documents \(j\) not in class \(c\), \(d_{ij}\) is either the count or tf-idf value of term \(i\) in document \(j\), \(\alpha_i\) is a smoothing hyperparameter like that found in MNB, and \(\alpha = \sum_{i} \alpha_i\). The second normalization addresses the tendency for longer documents to dominate parameter estimates in MNB. The classification rule is:

\[\hat{c} = \arg\min_c \sum_{i} t_i w_{ci}\]

i.e., a document is assigned to the class that is the poorest complement match.

References#
  • Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In ICML (Vol. 3, pp. 616-623).

Từ khóa » Công Thức Naive Bayes