Credit Scoring in R 101 (2024)

[This article was first published on jkunst.com: Entries for category R, and kindly contributed to R-bloggers].

In this post we’ll fit some predictive models on well-known data sets and evaluate the performance of each model. Disclaimer: for simplicity, the predictor variables are used as they are, without any transformation that might improve performance or stability. We’ll use two data sets to evaluate the models. Both have categorical and continuous variables, and we’ll use a 50-50 split to get train and test samples. The data sets are:

German Credit Data
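Before the results, here is a minimal sketch of the 50-50 split and a logistic fit as described in the introduction. It is not the code behind the tables in this post: the data are simulated, and the predictor names `amount` and `purpose` are assumptions for illustration. Only `GB`, `SAMPLE` and `SCORE_Logistic` mirror names used in the post, and coding `GB` as 1 = bad is also an assumption.

```r
# Simulated stand-in data (NOT the German Credit Data).
# `amount` and `purpose` are hypothetical predictors; GB = 1 is assumed to mean "bad".
set.seed(123)
n <- 1000
df <- data.frame(
  amount  = rexp(n, rate = 1 / 3000),
  purpose = factor(sample(c("car", "furniture", "other"), n, replace = TRUE))
)

# The simulated bad rate grows with the amount through a logistic link
p_bad <- plogis(-1.5 + 0.0003 * df$amount)
df$GB <- rbinom(n, size = 1, prob = p_bad)

# 50-50 split: draw half of the row indices for training
train_idx <- sample(seq_len(n), size = n / 2)
df$SAMPLE <- ifelse(seq_len(n) %in% train_idx, "train", "test")

# Fit on the train sample only, then score every row
fit <- glm(GB ~ amount + purpose,
           data = subset(df, SAMPLE == "train"),
           family = binomial)
df$SCORE_Logistic <- predict(fit, newdata = df, type = "response")

# Sample sizes per split
table(df$SAMPLE)
```

Keeping one `SAMPLE` column instead of two separate data frames makes it easy to compute the same indicators for train and test side by side, which is the layout of the result tables in this post.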

    ##            SCORE SAMPLE   BR   KS AUCROC Gain10 Gain20 Gain30 Gain40 Gain50
    ## 2 SCORE_Logistic  train 0.32 0.54   0.83   0.25   0.48   0.64   0.74   0.82
    ## 4    SCORE_CTree  train 0.32 0.40   0.76   0.53   0.53   0.61   0.87   0.87
    ## 6   SCORE_SLNNET  train 0.32 0.64   0.90   0.30   0.54   0.73   0.82   0.89
    ## 8      SCORE_LDA  train 0.32 0.54   0.83   0.24   0.47   0.63   0.76   0.82
    ## 1 SCORE_Logistic   test 0.28 0.46   0.78   0.23   0.43   0.58   0.72   0.82
    ## 3    SCORE_CTree   test 0.28 0.37   0.74   0.50   0.50   0.57   0.87   0.87
    ## 5   SCORE_SLNNET   test 0.28 0.42   0.77   0.21   0.40   0.58   0.68   0.79
    ## 7      SCORE_LDA   test 0.28 0.47   0.79   0.23   0.46   0.57   0.73   0.82

    library(plyr)          # ldply
    library(ROCR)          # prediction, performance
    library(ggplot2)
    library(RColorBrewer)  # brewer.pal
    library(scales)        # percent_format

    # Keep only the test sample
    daux <- subset(data1, SAMPLE == "test")

    # One ROC curve per score column.
    # str_pattern() is not defined in this excerpt; it appears to return
    # the names that match the pattern "SCORE".
    daux_roc <- ldply(str_pattern(names(daux), "SCORE"), function(score) {
      perf <- performance(prediction(daux[[score]], daux$GB), "tpr", "fpr")
      df <- data.frame(x = unlist(perf@x.values), y = unlist(perf@y.values))
      df$score <- score
      df
    })

    ggplot(daux_roc) +
      geom_line(aes(x, y, color = score), size = 1.2) +
      scale_color_manual('', values = brewer.pal(length(unique(daux_roc$score)), "RdBu")) +
      # Diagonal reference line (a random score)
      geom_path(data = data.frame(x = c(0, 1), y = c(0, 1)), aes(x, y), colour = "gray", size = 1) +
      scale_x_continuous("False Positive Rate (1 - Specificity)", labels = percent_format(), limits = c(0, 1)) +
      scale_y_continuous("True Positive Rate (Sensitivity or Recall)", labels = percent_format(), limits = c(0, 1)) +
      theme(legend.position = "top") +
      ggtitle("ROC Curves for German Credit Data (validation)")
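For readers who want to see what the KS and AUCROC columns measure, the following base R sketch computes both from a score vector and a good/bad flag. The helper names `ks_stat` and `auc_roc` are hypothetical and are not the functions used to build the table above. The sketch assumes `gb == 1` marks bads and that higher scores point to bads.

```r
# Hypothetical helpers, base R only.
# Assumption: gb == 1 is "bad" and higher scores indicate bads.

ks_stat <- function(score, gb) {
  # Largest gap between the empirical score CDFs of bads and goods
  cdf_bad  <- ecdf(score[gb == 1])
  cdf_good <- ecdf(score[gb == 0])
  grid <- sort(unique(score))
  max(abs(cdf_bad(grid) - cdf_good(grid)))
}

auc_roc <- function(score, gb) {
  # Mann-Whitney form of the area under the ROC curve:
  # the probability that a random bad gets a higher score than a random good
  r <- rank(score)  # average ranks, so ties count as one half
  n_bad  <- sum(gb == 1)
  n_good <- sum(gb == 0)
  (sum(r[gb == 1]) - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)
}

# Small hand-checkable example: 4 goods and 4 bads
score <- c(0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9)
gb    <- c(0,   0,   0,   1,   0,   1,   1,   1)

ks_stat(score, gb)  # 0.75
auc_roc(score, gb)  # 0.9375
```

In the example, 15 of the 16 possible (bad, good) pairs are ordered correctly, which gives the AUC of 0.9375. The KS of 0.75 is reached at a score of 0.6, where all goods but only one of the four bads lie at or below the cut.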

Now we can plot the distributions of goods and bads for each model. We'll reshape the data with the melt function and then plot it, faceting by score.

    library(reshape2)  # melt (assumed to come from reshape2)
    library(grid)      # unit

    # Test sample only: the good/bad flag plus the four score columns
    daux <- subset(data1, SAMPLE == "test",
                   select = c("GB", "SCORE_Logistic", "SCORE_CTree", "SCORE_SLNNET", "SCORE_LDA"))

    # Wide to long: one row per (observation, score)
    daux <- melt(daux, id = "GB")

    # Score densities by good/bad, one panel per model
    ggplot(daux, aes(x = value, fill = factor(GB))) +
      geom_density(alpha = 0.6, size = .75) +
      facet_wrap(~variable, ncol = 2) +
      scale_fill_manual(values = brewer.pal(3, "Dark2")) +
      theme(legend.position = "none",
            axis.ticks = element_blank(),
            axis.text = element_blank(),
            axis.title = element_blank(),
            plot.margin = unit(rep(0.5, 4), "lines"),
            title = element_text(size = 9))

Bankloan Binning data

    ##            SCORE SAMPLE   BR   KS AUCROC Gain10 Gain20 Gain30 Gain40 Gain50
    ## 2 SCORE_Logistic  train 0.25 0.54   0.84   0.31   0.51   0.67   0.79   0.89
    ## 4    SCORE_CTree  train 0.25 0.52   0.84   0.34   0.50   0.82   0.82   0.90
    ## 6   SCORE_SLNNET  train 0.25 0.56   0.86   0.32   0.52   0.68   0.82   0.89
    ## 8      SCORE_LDA  train 0.25 0.52   0.84   0.31   0.51   0.66   0.78   0.88
    ## 1 SCORE_Logistic   test 0.25 0.53   0.84   0.31   0.51   0.68   0.78   0.87
    ## 3    SCORE_CTree   test 0.25 0.45   0.79   0.30   0.51   0.76   0.76   0.84
    ## 5   SCORE_SLNNET   test 0.25 0.51   0.83   0.30   0.51   0.66   0.77   0.85
    ## 7      SCORE_LDA   test 0.25 0.50   0.83   0.31   0.51   0.67   0.77   0.85
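The post does not define the Gain10 to Gain50 columns. One plausible reading, which is an assumption here, is the cumulative share of all bads captured among the 10%, 20%, ... 50% of accounts with the highest scores. Under that reading, a minimal base R sketch looks like this. The helper name `gain_at` is hypothetical, and `gb == 1` is again assumed to mark bads.

```r
# Hypothetical helper: cumulative gain at a given depth.
# Assumption: gb == 1 is "bad" and higher scores indicate bads.
gain_at <- function(score, gb, depth) {
  # Sort accounts from highest to lowest score
  ord <- order(score, decreasing = TRUE)
  # Number of accounts in the top `depth` fraction
  n_top <- round(depth * length(score))
  # Share of all bads that fall inside that top group
  sum(gb[ord][seq_len(n_top)] == 1) / sum(gb == 1)
}

# Ten accounts, four of them bad
score <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05)
gb    <- c(1,   1,   0,   1,   0,   0,   1,   0,   0,   0)

gains <- sapply(c(0.1, 0.2, 0.3, 0.4, 0.5), function(d) gain_at(score, gb, d))
gains  # 0.25 0.50 0.50 0.75 0.75
```

This helper breaks ties between equal scores by row order. A model with few distinct score values, such as a small tree, may need ties handled as a group, so the sketch may not reproduce the CTree rows above.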

Do you want to comment on the results? If you are interested in this topic, try reproducing this example. If you have questions, improvements, or want more details about the code, please leave a comment.

FAQs

What is the math of credit scoring?

FICO Scores are calculated using many different pieces of credit data in your credit report. This data is grouped into five categories: payment history (35%), amounts owed (30%), length of credit history (15%), new credit (10%) and credit mix (10%).

Why do I only have one credit score?

Your credit scores may vary according to the credit scoring model used, and may also vary based on which credit bureau furnishes the credit report used for the data. That's because not all lenders and creditors report to all three nationwide credit bureaus. Some may report to only two, one or none at all.

What is the credit scoring model for loans?

Credit scoring models leverage statistical algorithms and historical credit data to evaluate the likelihood of a borrower defaulting on a loan or credit obligation. The primary objective is to provide lenders with a quantitative measure that helps them make informed decisions about extending credit.

Is there a formula for credit score?

Your credit score, which commonly refers to your FICO score, is calculated based on five factors: payment history, amount owed, length of credit history, new credit, and credit mix. Although FICO does not reveal its specific calculation, it does report the main factors used to calculate its credit scores.

What are the 4 R's of credit scoring?

As [1] summarised, credit scoring is functional in four scenarios denoted by the acronym 4R, namely Risk, Response, Revenue and Retention.

What is the most commonly used credit scoring model?

The most recognized credit score is the FICO score, which comes from the Fair Isaac Corporation. FICO has more than 50 different versions of your score that it sends to lenders. The score may change depending on which company asks and what was important to that company in calculating your score.

What are the 5 C's of credit?

Called the five Cs of credit, they include capacity, capital, conditions, character, and collateral. There is no regulatory standard that requires the use of the five Cs of credit, but the majority of lenders review most of this information prior to allowing a borrower to take on debt.

What is the credit score algorithm?

It is calculated using an algorithm—a mathematical formula that credit bureaus and other organizations have created. Credit score algorithms consider multiple aspects of your history, such as your total amount of debt, any derogatory items on your report, and the types of credit accounts you have—both present and past.

What is the most used credit score model?

For other types of credit, such as personal loans, student loans and retail credit, you'll likely want to know your FICO® Score 8, which is the score most widely used by lenders.

What is the new credit scoring method?

The new model combines alternate open banking data with traditional credit data, which it claims gives lenders a “substantial predictive lift of up to 10% compared to the industry leading VantageScore 4.0 credit score, which itself has up to an 8% lift over conventional scoring models.”

What is a credit scoring algorithm?

A credit scoring model is a mathematical algorithm that uses various factors to assess the creditworthiness of a borrower. This algorithm takes into account factors such as payment history, credit utilization, length of credit history, types of credit used, and recent credit inquiries.

How is a credit score calculated?

Credit scoring models generally look at how late your payments were, how much was owed, and how recently and how often you missed a payment. Your credit history will also detail how many of your credit accounts have been delinquent in relation to all of your accounts on file.