influenceAUC is an R package for identifying influential observations in binary classification from the perspective of model diagnostics. Most related research assumes that positive instances tend to receive higher scores than negative ones. Under that assumption, influential cases come from two sources:

+ negative cases with comparatively higher scores
+ positive cases with relatively lower scores
The proposed methods rely on the area under the receiver operating characteristic curve (AUC) and the cumulative lift chart (CLC), which makes them applicable to any classifier that outputs continuous scores. The theoretical approaches evaluate the influence of each observation on the overall AUC, while the adjusted CLCs reveal the existence and approximate locations of influential cases through data visualization. Because each method has its own pros and cons, we suggest that end users apply all of them together to reach reliable results. See the reference below for more information.
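To make the idea of an observation's influence on the overall AUC concrete, here is a minimal sketch. It is written in Python rather than R and does not use or reproduce the influenceAUC API. It assumes a simple case-deletion measure (the change in empirical AUC when one observation is removed), which is only a stand-in for the theoretical approaches described in the reference. The function names `auc` and `deletion_influence` and the toy data are hypothetical.

```python
def auc(scores, labels):
    """Empirical AUC: fraction of (positive, negative) pairs in which the
    positive has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def deletion_influence(scores, labels):
    """Change in AUC when each observation is left out in turn.
    NOTE: an illustrative case-deletion measure, not the package's method."""
    full = auc(scores, labels)
    changes = []
    for i in range(len(scores)):
        rest_scores = scores[:i] + scores[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        changes.append(auc(rest_scores, rest_labels) - full)
    return changes


# Hypothetical toy data: index 3 is a low-scoring positive,
# index 4 is a high-scoring negative.
scores = [0.9, 0.8, 0.7, 0.2, 0.85, 0.4, 0.3, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

full_auc = auc(scores, labels)
influence = deletion_influence(scores, labels)

# Indices of the two observations whose removal raises the AUC the most.
top_two = sorted(range(len(influence)),
                 key=lambda i: influence[i], reverse=True)[:2]
print(full_auc, top_two)
```

In this toy data, the two observations whose removal raises the AUC the most are the low-scoring positive (index 3) and the high-scoring negative (index 4), which matches the two sources of influential cases listed above.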
The modified CLCs reveal influential observations without being affected by masking or class imbalance, but they do not provide quantitative values for further comparison.
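For readers unfamiliar with cumulative lift charts, the sketch below computes the points of a plain cumulative lift curve. It is in Python, is independent of the influenceAUC API, and shows the standard chart only, not the modified CLCs proposed in the reference. It assumes one common definition of cumulative lift (positive rate among the top-ranked cases divided by the overall positive rate); the function name `cumulative_lift` and the toy data are hypothetical.

```python
def cumulative_lift(scores, labels):
    """Return (fraction of sample, cumulative lift) points, ranking cases
    from highest to lowest score.
    Lift at depth k = (positive rate in the top k) / (overall positive rate).
    NOTE: the standard chart, not the package's modified CLCs."""
    n = len(scores)
    total_pos = sum(labels)
    if n == 0 or total_pos == 0:
        raise ValueError("need a non-empty sample with at least one positive")
    base_rate = total_pos / n
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    curve = []
    hits = 0
    for k, i in enumerate(order, start=1):
        hits += labels[i]
        curve.append((k / n, (hits / k) / base_rate))
    return curve


# Hypothetical toy data: the second-highest score belongs to a negative case.
scores = [0.9, 0.8, 0.7, 0.2, 0.85, 0.4, 0.3, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

curve = cumulative_lift(scores, labels)
for depth, lift in curve:
    print(f"{depth:.3f}  {lift:.3f}")
```

In this toy data the lift falls from 2.0 to 1.0 at the second-ranked observation, which is the high-scoring negative. A dip like this near the top of the ranking is the kind of visual cue a lift chart can give about where an unusual case sits, although by itself it attaches no influence value to that case.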
Ke, B. S., Chiang, A. J., & Chang, Y. C. I. (2018). Influence Analysis for the Area Under the Receiver Operating Characteristic Curve. Journal of Biopharmaceutical Statistics, 28(4), 722-734.
To cite influenceAUC in publications, use:

Ke B, Chang Y, Wang W (2020). _influenceAUC: Identify Influential Observations in Binary Classification_. R package version 0.1.2, <https://CRAN.R-project.org/package=influenceAUC>.

@Manual{,
title = {influenceAUC: Identify Influential Observations in Binary Classification},
author = {Bo-Shiang Ke and Yuan-chin Ivan Chang and Wen-Ting Wang},
year = {2020},
note = {R package version 0.1.2},
url = {https://CRAN.R-project.org/package=influenceAUC},
}