Credit Scoring in R 101 (2024)

[This article was first published on jkunst.com: Entries for category R, and kindly contributed to R-bloggers].


In this post we'll fit some predictive models on two well-known datasets and evaluate the performance of each model. Disclaimer: for simplicity, the predictive variables are used as they are, without applying any transformation to improve performance, stability, etc. Both datasets have categorical and continuous variables, and we'll use a 50-50 split to obtain a train sample and a test sample. The datasets are:

  • German Credit: These are data on 1000 clients of a South German bank, 700 good payers and 300 bad payers, i.e. a bad rate of 30%. The data have 20 predictive variables and are used to construct credit scoring methods. After the split, the distribution of the response (as a share of all observations) is:
##         sample
## response  test train
##     bad  14.9% 15.1%
##     good 38.1% 31.9%
  • Bankloan Binning: This is a hypothetical data file containing financial and demographic information on past customers. Here is the source. These data have 8 predictive variables and 5000 observations, with a bad rate of 25.1%. After the split, the distribution of the response is:
##         sample
## response  test train
##     bad  12.7% 12.4%
##     good 37.3% 37.6%
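The post does not show how the split was made. Here is a minimal sketch of a random 50-50 split and of the response-by-sample table above. It assumes a simulated response vector `gb` with a 30% bad rate, so it is hypothetical and not the actual data. A random split puts roughly, not exactly, half of the rows in each sample. In the German Credit table, for instance, the test sample holds 53% of the rows (14.9% + 38.1%).

```r
set.seed(2024)
n <- 1000
# Hypothetical response with a 30% bad rate, mimicking German Credit
gb <- sample(c("bad", "good"), n, replace = TRUE, prob = c(0.3, 0.7))
# Random 50-50 assignment: each row goes to train or test with probability 1/2
sample_flag <- sample(c("train", "test"), n, replace = TRUE)
# Share of all observations in each response-by-sample cell (cells sum to 100%)
tab <- prop.table(table(response = gb, sample = sample_flag))
round(100 * tab, 1)
```

Use `sample(rep(c("train", "test"), length.out = n))` instead if you want an exact half-and-half split.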

The models to compare are logistic regression, conditional inference trees (party package), a single-hidden-layer neural network (nnet package) and linear discriminant analysis. Several indicators can be used to evaluate performance, such as the KS statistic and the area under the ROC curve, among others. If you are not familiar with these terms, check this link. Now, let's look at the results.
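The post does not show its fitting code. The sketch below shows one way to obtain one score column per model with the packages named above: `glm()` for logistic regression, `party::ctree()`, `nnet::nnet()` and `MASS::lda()`. It runs on simulated data, and the predictors `income` (standardized) and `housing` are hypothetical. The network size, decay and iteration count are arbitrary assumptions, not the author's settings.

```r
library(party)  # ctree(), treeresponse()
library(nnet)   # nnet()
library(MASS)   # lda()

set.seed(123)
n <- 1000
# Hypothetical simulated applicants: one continuous and one categorical predictor
dat <- data.frame(
  income  = rnorm(n),  # standardized income
  housing = factor(sample(c("own", "rent", "free"), n, replace = TRUE))
)
# Simulated log-odds of being bad: lower income and renting raise the risk
log_odds <- -1 - 0.5 * dat$income + 0.8 * (dat$housing == "rent")
# "good" is the first level, so every model below estimates P(bad)
dat$GB <- factor(ifelse(runif(n) < plogis(log_odds), "bad", "good"),
                 levels = c("good", "bad"))
dat$SAMPLE <- sample(c("train", "test"), n, replace = TRUE)
train <- subset(dat, SAMPLE == "train")

f <- GB ~ income + housing
fit_log   <- glm(f, data = train, family = binomial)
fit_ctree <- ctree(f, data = train)
fit_nnet  <- nnet(f, data = train, size = 3, decay = 0.01, maxit = 500, trace = FALSE)
fit_lda   <- lda(f, data = train)

# Score both samples with each model; each score is an estimated P(bad)
dat$SCORE_Logistic <- predict(fit_log, newdata = dat, type = "response")
dat$SCORE_CTree    <- do.call(rbind, treeresponse(fit_ctree, newdata = dat))[, 2]
dat$SCORE_SLNNET   <- as.vector(predict(fit_nnet, newdata = dat, type = "raw"))
dat$SCORE_LDA      <- predict(fit_lda, newdata = dat)$posterior[, "bad"]
```

For a two-level factor response, `nnet()` fits a single logistic output unit. That output, like the `glm()` prediction, is the probability of the second level (`"bad"`).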

German Credit Data

##             SCORE SAMPLE   BR   KS AUCROC Gain10 Gain20 Gain30 Gain40 Gain50
## 2  SCORE_Logistic  train 0.32 0.54   0.83   0.25   0.48   0.64   0.74   0.82
## 4     SCORE_CTree  train 0.32 0.40   0.76   0.53   0.53   0.61   0.87   0.87
## 6    SCORE_SLNNET  train 0.32 0.64   0.90   0.30   0.54   0.73   0.82   0.89
## 8       SCORE_LDA  train 0.32 0.54   0.83   0.24   0.47   0.63   0.76   0.82
## 1  SCORE_Logistic   test 0.28 0.46   0.78   0.23   0.43   0.58   0.72   0.82
## 3     SCORE_CTree   test 0.28 0.37   0.74   0.50   0.50   0.57   0.87   0.87
## 5    SCORE_SLNNET   test 0.28 0.42   0.77   0.21   0.40   0.58   0.68   0.79
## 7       SCORE_LDA   test 0.28 0.47   0.79   0.23   0.46   0.57   0.73   0.82

library(plyr)          # ldply()
library(ROCR)          # prediction(), performance()
library(ggplot2)
library(RColorBrewer)  # brewer.pal()
library(scales)        # percent_format()

# data1 holds the response GB, the SAMPLE flag and one SCORE_* column per model
daux <- subset(data1, SAMPLE == "test")
# grep() replaces the original helper str_pattern(), which is not defined here
score_cols <- grep("SCORE", names(daux), value = TRUE)

# One ROC curve (TPR vs FPR) per model, stacked into a single data frame
daux_roc <- ldply(score_cols, function(score) {
  perf <- performance(prediction(daux[[score]], daux$GB), "tpr", "fpr")
  df <- data.frame(x = unlist(perf@x.values), y = unlist(perf@y.values))
  df$score <- score
  df
})

ggplot(daux_roc) +
  geom_line(aes(x, y, color = score), linewidth = 1.2) +
  scale_color_manual("", values = brewer.pal(length(unique(daux_roc$score)), "RdBu")) +
  # Diagonal: ROC curve of a random classifier
  geom_path(data = data.frame(x = c(0, 1), y = c(0, 1)), aes(x, y), colour = "gray", linewidth = 1) +
  scale_x_continuous("False Positive Rate (1 - Specificity)", labels = percent_format(), limits = c(0, 1)) +
  scale_y_continuous("True Positive Rate (Sensitivity or Recall)", labels = percent_format(), limits = c(0, 1)) +
  theme(legend.position = "top") +
  ggtitle("ROC Curves for German Credit Data (validation)")

[Figure 1: ROC Curves for German Credit Data (validation)]
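The post does not show how the KS and AUCROC columns were computed. The sketch below uses the standard definitions, written in base R, and assumes that a higher score means a higher risk of being bad. KS is the largest gap between the cumulative score distributions of bads and goods. AUC is the probability that a randomly chosen bad scores higher than a randomly chosen good. The function names `ks_stat` and `auc_stat` are hypothetical.

```r
# KS: maximum distance between the cumulative score distributions of bads and goods
ks_stat <- function(score, is_bad) {
  cuts <- sort(unique(score))  # both ECDFs only jump at observed scores
  cdf_bad  <- ecdf(score[is_bad])(cuts)
  cdf_good <- ecdf(score[!is_bad])(cuts)
  max(abs(cdf_bad - cdf_good))
}

# AUC via the Mann-Whitney statistic: probability that a random bad scores
# higher than a random good (average ranks count ties as one half)
auc_stat <- function(score, is_bad) {
  r <- rank(score)
  n_bad  <- sum(is_bad)
  n_good <- sum(!is_bad)
  (sum(r[is_bad]) - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)
}

# Toy example with hypothetical scores (higher = riskier)
score  <- c(0.9, 0.4, 0.6, 0.2)
is_bad <- c(TRUE, TRUE, FALSE, FALSE)
ks_stat(score, is_bad)   # 0.5
auc_stat(score, is_bad)  # 0.75
```

ROCR, which the ROC code already uses, also reports the AUC through `performance(pred, "auc")`.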

Now we can plot the distributions of goods and bads for each model. We'll reshape the data with the melt function and then plot, faceting by score.

library(reshape2)      # melt()
library(ggplot2)
library(RColorBrewer)  # brewer.pal()
library(grid)          # unit()

# Test sample: the response and the four model scores
daux <- subset(data1, SAMPLE == "test",
               select = c("GB", "SCORE_Logistic", "SCORE_CTree", "SCORE_SLNNET", "SCORE_LDA"))
# Wide to long: one row per (client, model) with columns GB, variable, value
daux <- melt(daux, id.vars = "GB")

ggplot(daux, aes(x = value, fill = factor(GB))) +
  geom_density(alpha = 0.6, linewidth = 0.75) +
  facet_wrap(~variable, ncol = 2) +
  scale_fill_manual(values = brewer.pal(3, "Dark2")) +
  theme(legend.position = "none",
        axis.ticks = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        plot.margin = unit(rep(0.5, 4), "lines"),
        title = element_text(size = 9))

[Figure 2: score densities of goods and bads for each model, German Credit test sample]
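Here is a small illustration of what `melt()` does in the density plot code. It uses a hypothetical wide data frame with two clients and two score columns. With `id.vars = "GB"`, each client contributes one row per score column. The long result therefore has the `variable` column that `facet_wrap(~variable)` splits on.

```r
library(reshape2)  # melt()

# Toy wide data: one row per client, one column per model score (hypothetical values)
wide <- data.frame(GB = c("good", "bad"),
                   SCORE_Logistic = c(0.1, 0.7),
                   SCORE_LDA      = c(0.2, 0.6))
# Long format: columns GB, variable (the score name) and value (the score)
long <- melt(wide, id.vars = "GB")
long
```

The result has four rows: both clients under `SCORE_Logistic`, then both under `SCORE_LDA`.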

Bankloan Binning Data

##             SCORE SAMPLE   BR   KS AUCROC Gain10 Gain20 Gain30 Gain40 Gain50
## 2  SCORE_Logistic  train 0.25 0.54   0.84   0.31   0.51   0.67   0.79   0.89
## 4     SCORE_CTree  train 0.25 0.52   0.84   0.34   0.50   0.82   0.82   0.90
## 6    SCORE_SLNNET  train 0.25 0.56   0.86   0.32   0.52   0.68   0.82   0.89
## 8       SCORE_LDA  train 0.25 0.52   0.84   0.31   0.51   0.66   0.78   0.88
## 1  SCORE_Logistic   test 0.25 0.53   0.84   0.31   0.51   0.68   0.78   0.87
## 3     SCORE_CTree   test 0.25 0.45   0.79   0.30   0.51   0.76   0.76   0.84
## 5    SCORE_SLNNET   test 0.25 0.51   0.83   0.30   0.51   0.66   0.77   0.85
## 7       SCORE_LDA   test 0.25 0.50   0.83   0.31   0.51   0.67   0.77   0.85
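The tables report Gain10 to Gain50 but do not define them. This sketch assumes the common cumulative-gains definition: the share of all bads captured among the riskiest 10%, 20%, ... of the sample, ranked by score (higher = riskier). The function `gain_at` and the toy data are hypothetical. Tree scores take few distinct values, and `order()` breaks ties among them arbitrarily here, so the author's version may handle ties differently.

```r
# Share of all bads found in the top pct% of cases ranked by score
gain_at <- function(score, is_bad, pct) {
  ord <- order(score, decreasing = TRUE)       # riskiest first
  n_top <- ceiling(length(score) * pct / 100)  # size of the top pct% group
  sum(is_bad[ord][seq_len(n_top)]) / sum(is_bad)
}

# Toy example: 10 cases, 3 bads
score  <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05)
is_bad <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
gains <- sapply(c(10, 20, 30, 40, 50), function(p) gain_at(score, is_bad, p))
round(gains, 2)  # 0.33 0.33 0.67 0.67 0.67
```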

[Figures 3 and 4: plots for the Bankloan Binning data]

Do you want to comment on the results? If you are interested in this topic, try to reproduce this example. And if you have questions or improvements, or want to know more details about the code, please comment.

References

  1. ggplot2
  2. RStudio
  3. knitr
  4. Guide to credit scoring in R


FAQs

What is the math of credit scoring?

FICO Scores are calculated using many different pieces of credit data in your credit report. This data is grouped into five categories: payment history (35%), amounts owed (30%), length of credit history (15%), new credit (10%) and credit mix (10%).

Why do I only have one credit score?

Your credit scores may vary according to the credit scoring model used, and may also vary based on which credit bureau furnishes the credit report used for the data. That's because not all lenders and creditors report to all three nationwide credit bureaus. Some may report to only two, one or none at all.

What is the credit scoring model for loans?

Credit scoring models leverage statistical algorithms and historical credit data to evaluate the likelihood of a borrower defaulting on a loan or credit obligation. The primary objective is to provide lenders with a quantitative measure that helps them make informed decisions about extending credit.

Is there a formula for a credit score?

Your credit score, which commonly refers to your FICO score, is calculated based on five factors: payment history, amount owed, length of credit history, new credit, and credit mix. Although FICO does not reveal its specific calculation, it does report the main factors used to calculate its credit scores.

What are the 4 R's of credit scoring?

As [1] summarised, credit scoring is useful in four scenarios, denoted by the acronym 4R: Risk, Response, Revenue and Retention.

What is the most commonly used credit scoring model?

The most recognized credit score is the FICO score, which comes from the Fair Isaac Corporation. FICO has more than 50 different versions of your score that it sends to lenders. The score may change depending on which company asks and what was important to that company in calculating your score.

What are the 5 C's of credit?

The five Cs of credit are capacity, capital, conditions, character, and collateral. No regulatory standard requires their use, but the majority of lenders review most of this information before allowing a borrower to take on debt.

What is the credit score algorithm?

It is calculated using an algorithm—a mathematical formula that credit bureaus and other organizations have created. Credit score algorithms consider multiple aspects of your history, such as your total amount of debt, any derogatory items on your report, and the types of credit accounts you have—both present and past.

What is the most used credit score model?

For other types of credit, such as personal loans, student loans and retail credit, you'll likely want to know your FICO® Score 8, which is the score most widely used by lenders.

What is the new credit scoring method?

The new model combines alternate open banking data with traditional credit data, which it claims gives lenders a “substantial predictive lift of up to 10% compared to the industry leading VantageScore 4.0 credit score, which itself has up to an 8% lift over conventional scoring models.”

What is a credit scoring algorithm?

A credit scoring model is a mathematical algorithm that uses various factors to assess the creditworthiness of a borrower. This algorithm takes into account factors such as payment history, credit utilization, length of credit history, types of credit used, and recent credit inquiries.

How is a credit score calculated?

Credit scoring models generally look at how late your payments were, how much was owed, and how recently and how often you missed a payment. Your credit history will also detail how many of your credit accounts have been delinquent in relation to all of your accounts on file.

How do you calculate your credit utilization ratio?

First, add up all the outstanding balances, then add up the credit limits. Take the total balances, divide them by the total credit limit, and then multiply by 100 to find your credit utilization ratio as a percentage amount.
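The steps above translate directly into a one-line R function. The name `credit_utilization` and the balances and limits in the example are hypothetical.

```r
# Credit utilization ratio (%) = total outstanding balances / total credit limits * 100
credit_utilization <- function(balances, limits) {
  100 * sum(balances) / sum(limits)
}

# Two hypothetical cards: balances 500 and 1500, limits 2000 and 3000
credit_utilization(c(500, 1500), c(2000, 3000))  # 40
```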

Article information

Author: Nathanial Hackett
