Trends in Financial Risk Management Systems in 2020 - International Journal of Managing Information Technology (IJMIT)
ISSN: 0975-5586 (Online); 0975-5926 (Print)
Predicting Class-Imbalanced Business Risk Using Resampling, Regularization, and Model
Ensembling Algorithms
Yan Wang and Xuelei Sherry Ni, Kennesaw State University, USA
Abstract
We aim to develop and improve class-imbalanced business risk modeling by jointly
using proper evaluation criteria, resampling, cross-validation, classifier
regularization, and ensembling techniques. Models are compared by the area under
the receiver operating characteristic curve (AUC of ROC), estimated with 10-fold
cross-validation. Two undersampling strategies, random undersampling (RUS) and
cluster centroid undersampling (CCUS), and two oversampling methods, random
oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE),
are applied. Three highly interpretable classifiers are implemented: logistic
regression without regularization (LR), L1-regularized LR (L1LR), and decision
tree (DT). Two ensembling techniques, Bagging and Boosting, are applied to the
DT classifier for further model improvement. The results show that Boosting on
DT, trained on data oversampled via SMOTE to contain 50% positives, is the
optimal model, achieving an AUC, recall, and F1 score of 0.8633, 0.9260, and
0.8907, respectively.
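To make the resampling step concrete, here is a minimal sketch of three of the four strategies named in the abstract (RUS, ROS, and a SMOTE-style interpolation), each rebalancing a toy dataset to 50% positives. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `pos_frac`, `k`, and `seed` parameters, and the synthetic two-feature data are all hypothetical, and CCUS is omitted because it needs a clustering step.

```python
import random


def random_undersample(X, y, pos_frac=0.5, seed=0):
    """RUS: keep every positive, randomly drop negatives until positives make up pos_frac."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n_neg = min(len(neg), round(len(pos) * (1 - pos_frac) / pos_frac))
    keep = pos + rng.sample(neg, n_neg)
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, pos_frac=0.5, seed=0):
    """ROS: keep every row, duplicate random positives until positives make up pos_frac."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    n_neg = len(y) - len(pos)
    n_target = round(n_neg * pos_frac / (1 - pos_frac))
    extra = [rng.choice(pos) for _ in range(max(0, n_target - len(pos)))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def smote_oversample(X, y, pos_frac=0.5, k=3, seed=0):
    """SMOTE-style: create synthetic positives on the segment between a positive
    and one of its k nearest positive neighbours (needs at least two positives)."""
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    n_neg = len(y) - len(pos)
    n_new = max(0, round(n_neg * pos_frac / (1 - pos_frac)) - len(pos))

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pos)
        others = [p for p in pos if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([u + gap * (v - u) for u, v in zip(base, mate)])
    return X + synthetic, y + [1] * n_new


# Hypothetical toy data: 90 negatives and 10 positives with two features each.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_rus, y_rus = random_undersample(X, y)
X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote_oversample(X, y)

for name, labels in [("RUS", y_rus), ("ROS", y_ros), ("SMOTE", y_sm)]:
    print(name, "rows:", len(labels), "positives:", sum(labels))
```

When resampling is combined with cross-validation, a common precaution is to resample only the training folds so that each held-out fold keeps the original class ratio; the abstract does not say how the two steps were ordered in the paper.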
Keywords
Imbalance, resampling, regularization, ensemble, risk modeling
Authors
Yan Wang is a Ph.D. candidate in Analytics and Data Science at Kennesaw State
University. Her research interests include algorithms and applications of data
mining and machine learning techniques in finance. She has been a summer Data
Scientist intern at Ernst & Young, focusing on fraud detection using machine
learning techniques. Her current research explores new algorithms and models
that integrate new machine learning tools into traditional statistical methods,
with the aim of helping financial institutions develop better strategies.
Yan received her M.S. in Statistics from the University of Georgia.
Dr. Xuelei Sherry Ni is currently a Professor of Statistics and Interim Chair of
the Department of Statistics and Analytical Sciences at Kennesaw State
University, where she has been teaching since 2006. She served as the program
director for the Master of Science in Applied Statistics program from 2014 to
2018, where she focused on providing students an applied learning experience
using real-world problems. Her articles have appeared in the Annals of
Statistics, the Journal of Statistical Planning and Inference, and Statistica
Sinica, among others. She is also the author of several book chapters on
modeling and forecasting. Dr. Ni received her M.S. and Ph.D. in Applied
Statistics from the Georgia Institute of Technology.