GIVE ME SOME CREDIT

Goal: Predicting loan default

Tools: R Studio

Team and models:

Wenjie Xu: QDA & LDA

Michel Zou: Logistic Regression

Stephanie Wang: KNN

Keng-chu Lin: Decision Tree & Random Forest

Luou Meng: ROC Curve

#To build a model that predicts the probability of default, helping banks determine whether or not a loan should be granted

Banks play a crucial role in market economies. They decide who can get finance and on what terms, and they can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. Credit scoring algorithms, which estimate the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years.

The goal of this project is to build a model that borrowers can use to help make the best financial decisions.

This is a Kaggle competition: kaggle.com/c/GiveMeSomeCredit

#ANALYSIS METHODOLOGY

#DATA SET ANALYSIS

Total Observations: 150,000

Training Data: 60,000

Testing Data: 90,000

Data Source: Kaggle.com

% of defaulter instances: 6.68%
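
Because defaulters are rare, a plain accuracy or misclassification figure needs a baseline to be meaningful. The short Python sketch below (Python is used only for illustration; the project itself lists R Studio as its tool) computes the accuracy of a trivial classifier that always predicts "no default". It assumes the 6.68% rate above applies to the data being evaluated; the function name is hypothetical.

```python
def majority_baseline_accuracy(positive_rate: float) -> float:
    """Accuracy of a classifier that always predicts the majority class."""
    if not 0.0 <= positive_rate <= 1.0:
        raise ValueError("positive_rate must be between 0 and 1")
    return max(positive_rate, 1.0 - positive_rate)


# Share of defaulters reported for this data set.
default_rate = 0.0668

# Assumption: the same rate holds in whichever split is being scored.
baseline = majority_baseline_accuracy(default_rate)
print(f"Always predicting 'no default' is right {baseline:.2%} of the time")
```

Any model compared by misclassification rate should therefore be judged against this baseline rather than against 50%.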

#DEPENDENT VARIABLE

1. Serious Delinquency in 2 Years

#INDEPENDENT VARIABLES

Debt Ratio
Monthly Income
Revolving Utilization
Borrower’s Age
Number of Dependents
Number of Open Credit Loans
Number of Real Estate Loans
Number of Times 30-59 Days Past Due (Not Worse) in the Past 2 Years
Number of Times 60-89 Days Past Due (Not Worse) in the Past 2 Years
Number of Times 90 Days Late

#STATISTICAL MODELS & DATA MINING

1. KNN

2. Logistic Regression

3. Decision Tree

4. Random Forest

5. Model Comparison

6. ROC Curve

Based on the ROC curve, Random Forest is the better model.
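
For readers unfamiliar with how an ROC comparison is made, here is a minimal Python sketch (illustration only; the project's tool was R Studio, and this is not the team's code). It sweeps the decision threshold over a model's predicted default probabilities, records the false positive rate and true positive rate at each step, and summarizes the curve by its area (AUC). The labels and scores are made-up toy values, not project results, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_curve_points(labels: Sequence[int],
                     scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank observations from the highest predicted probability to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: 1 = defaulted, 0 = did not default (invented for illustration).
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

toy_points = roc_curve_points(toy_labels, toy_scores)
toy_auc = area_under_curve(toy_points)
print(f"Toy AUC: {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the model whose curve encloses the larger area is preferred under this criterion.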

#CONCLUSION

By misclassification rate, Random Forest is better.
By number of true positives, Decision Tree is better.
By ROC, Random Forest is better.

In practice, choosing the best model depends on several considerations:

❏ business goal of the bank
❏ true positives (correctly rejecting clients who are likely to default)
❏ false positives (rejecting quality clients, resulting in lost business)
❏ true negatives (accepting quality clients)
❏ false negatives (accepting clients who go on to default, resulting in loss)
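
The trade-off in the list above can be made concrete with a small Python sketch (illustration only; the project's tool was R Studio). It counts the four outcomes for two hypothetical models and weighs the two error types with assumed costs. The predictions, the cost values (a missed default costing five times a wrongly rejected client), and all names are assumptions for the example, not figures from the project.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionCounts:
    tp: int  # defaulters correctly rejected
    fp: int  # quality clients wrongly rejected
    tn: int  # quality clients correctly accepted
    fn: int  # defaulters wrongly accepted


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> ConfusionCounts:
    """Count outcomes, where 1 means default and 0 means no default."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = ConfusionCounts(tp=0, fp=0, tn=0, fn=0)
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts.tp += 1
        elif a == 0 and p == 1:
            counts.fp += 1
        elif a == 0 and p == 0:
            counts.tn += 1
        else:
            counts.fn += 1
    return counts


def misclassification_rate(c: ConfusionCounts) -> float:
    return (c.fp + c.fn) / (c.tp + c.fp + c.tn + c.fn)


def total_cost(c: ConfusionCounts, cost_fp: float, cost_fn: float) -> float:
    """Business cost of errors under assumed per-error costs."""
    return c.fp * cost_fp + c.fn * cost_fn


# Toy data (invented): four defaulters followed by six quality clients.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious_model = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]    # fewer errors overall
aggressive_model = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # catches more defaulters

cautious = confusion_counts(actual, cautious_model)
aggressive = confusion_counts(actual, aggressive_model)

# Hypothetical costs: a missed default is five times as costly as a lost client.
COST_FP, COST_FN = 1.0, 5.0
for name, c in [("cautious", cautious), ("aggressive", aggressive)]:
    print(name, misclassification_rate(c), c.tp, total_cost(c, COST_FP, COST_FN))
```

In this toy setup the cautious model has the lower misclassification rate, while the aggressive model finds more true positives and has the lower total cost, which is why the bank's business goal decides which model is "best".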
