A Hybrid Machine Learning Model with Cost-Function Based Outlier Removal and Its Application on Credit Rating

EasyChair Preprint 1760 • 13 pages • Date: October 24, 2019

Abstract

With the rapid growth of the credit industry, the ability to allocate capital efficiently and profitably is of great significance to financial institutions. Banks and credit companies often hold sizeable loan portfolios, making it necessary to develop accurate credit scoring models. Even a slight improvement in credit scoring accuracy can reduce lenders' risk and translate into significant future savings. Machine learning techniques such as support vector machines, neural networks, and logistic regression are widely explored and utilized. In this paper, using Lending Club borrower information as the dataset and credit rating as the subject, we explore a hybrid machine learning methodology that combines different algorithms at different stages of data processing, training, and prediction. In the data preprocessing stage, we introduce a cost-based outlier removal technique that can be generalized to all types of machine learning algorithms. In our experiment, we apply logistic regression during feature treatment to reduce feature dimensions, and the per-sample cost of a particular machine learning algorithm is calculated as the basis for outlier detection and removal. We create three traditional models, support vector machine (SVM), decision tree (DT), and logistic regression (LR), and three hybrid models that incorporate our new ideas into SVM, DT, and LR. The traditional and hybrid models are compared by efficiency, F1 score, accuracy, recall, AUC, and precision. The results demonstrate a performance improvement of the hybrid models.

Keyphrases: Hybrid, credit score, learning, machine, outlier removal
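The abstract describes computing a per-sample cost under a chosen learning algorithm and using it as the basis for outlier removal, but it does not give the cost function or the removal rule. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes logistic regression as the algorithm, cross-entropy as the per-sample cost, and a fixed cost threshold as the removal rule. The function names (`fit_logistic`, `sample_costs`, `remove_high_cost`), the toy data, and the threshold value are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def sample_costs(X, y, w, b, eps=1e-12):
    """Per-sample cross-entropy cost under the fitted model (assumed cost function)."""
    costs = []
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        costs.append(-(yi * math.log(p) + (1 - yi) * math.log(1.0 - p)))
    return costs

def remove_high_cost(X, y, costs, threshold):
    """Keep only samples whose cost does not exceed the (assumed) threshold."""
    kept = [i for i, c in enumerate(costs) if c <= threshold]
    return [X[i] for i in kept], [y[i] for i in kept], kept

# Hypothetical toy data: one feature; the last sample is deliberately mislabeled.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1, 0]

w, b = fit_logistic(X, y)
costs = sample_costs(X, y, w, b)
X_clean, y_clean, kept = remove_high_cost(X, y, costs, threshold=1.0)
```

In this sketch the cleaned set `X_clean`, `y_clean` would then be used to train the downstream model (SVM, DT, or LR in the abstract's terms). A quantile of the cost distribution could replace the fixed threshold; which rule the paper uses is not stated in the abstract.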