PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, AND MODEL ENSEMBLING ALGORITHMS
Yan Wang1, Xuelei Sherry Ni2
1Graduate College, Kennesaw State University, Kennesaw, USA
2Department of Statistics and Analytical Sciences, Kennesaw State University, Kennesaw, USA
ABSTRACT
We aim to improve business risk modeling on class-imbalanced data by jointly using appropriate
evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques.
The area under the receiver operating characteristic curve (AUC) is used for model comparison
based on 10-fold cross-validation. Two undersampling strategies, random undersampling (RUS)
and cluster centroid undersampling (CCUS), and two oversampling methods, random
oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly
interpretable classifiers are implemented: logistic regression without regularization (LR), L1-regularized LR
(L1LR), and decision tree (DT). Two ensembling techniques, Bagging and
Boosting, are applied to the DT classifier for further model improvement. The results show that Boosting
on DT, trained on SMOTE-oversampled data containing 50% positives, is the best-performing model, achieving
an AUC, recall, and F1 score of 0.8633, 0.9260, and 0.8907, respectively.
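
As an illustration of two ideas named in the abstract, the sketch below shows SMOTE-style oversampling up to 50% positives and AUC computed as a pairwise ranking probability. It is a minimal standard-library sketch under stated assumptions, not the authors' implementation: the function names `smote_oversample` and `auc`, the toy data, and the choice of `k` are all hypothetical.

```python
import random

def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (SMOTE-style sketch).

    Each synthetic point is interpolated between a randomly chosen minority
    point and one of its k nearest minority neighbors.
    Assumes at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 4 positives and 12 negatives, two features each.
rng = random.Random(1)
positives = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(4)]
negatives = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(12)]

# Add synthetic positives until the classes are balanced (50% positives).
new_points = smote_oversample(positives, len(negatives) - len(positives), k=3)
balanced_pos = positives + new_points
```

In a cross-validated comparison, resampling of this kind would normally be applied to the training folds only, so that the AUC is measured on data with the original class ratio.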
KEYWORDS
Imbalance, resampling, regularization, ensemble, risk modeling
ORIGINAL SOURCE URL: http://aircconline.com/ijmit/V11N1/11119ijmit01.pdf