Credit Scoring Tools: the scoringTools package (2024)

This package has been developed as part of a CIFRE PhD, a special PhD contract in France which is for the most part financed by a company. This company subsequently gets to choose which subject(s) are tackled.

This research has been financed by Crédit Agricole Consumer Finance (CA CF), a subsidiary of the Crédit Agricole Group, which provides all kinds of banking and insurance services. CA CF focuses on consumer loans, ranging from luxury cars to small electronics.

In order to accept or reject loan applications more efficiently (both more quickly and with a better selection of applicants), most financial institutions resort to Credit Scoring: given the applicant’s characteristics, he/she is assigned a Credit Score, which has been statistically designed using previously accepted applicants, and which partly decides whether the financial institution will grant the loan or not.

Context

In practice, the statistical modeler has historical data about each customer’s characteristics. For obvious reasons, only data available at the time of inquiry must be used to build a future application scorecard. Those data often take the form of a well-structured table with one line per client alongside their performance (did they pay back their loan or not?), as can be seen in the following table:

Job	Habitation	Time_in_job	Children	Family_status	Default
Craftsman	Owner	10	0	Divorced	No
Technician	Renter	20	1	Widower	No
Executive	Starter	5	2	Single	Yes
Office employee	By family	2	3	Married	No
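As a rough illustration, the table above can be held in an R data frame, with the performance column turned into the 0/1 response used in the rest of this vignette. The object names `clients` and `y` are chosen for this sketch only and do not come from the package.

```r
# The four example clients from the table above, one row per client
clients <- data.frame(
  Job = c("Craftsman", "Technician", "Executive", "Office employee"),
  Habitation = c("Owner", "Renter", "Starter", "By family"),
  Time_in_job = c(10, 20, 5, 2),
  Children = c(0, 1, 2, 3),
  Family_status = c("Divorced", "Widower", "Single", "Married"),
  Default = factor(c("No", "No", "Yes", "No"), levels = c("No", "Yes")),
  stringsAsFactors = TRUE
)

# Binary response: 1 if the client defaulted, 0 otherwise
y <- as.integer(clients$Default == "Yes")
```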

Formulation

How to define the variable to predict, here denoted by Default, is an active research field and we will not discuss it here. We suppose we already have a binary random variable $Y$ from which we have $n$ observations $\mathbf{y} = (y_i)_1^n$.

The $d$ predictive features, here for example the job, habitation situation, etc., are usually socio-demographic features asked by the financial institutions at the time of application. They are denoted by the random vector $\boldsymbol{X} = (X_j)_1^d$ and, as for $Y$, we have $n$ observations $\mathbf{x} = (\boldsymbol{x}_i)_1^n$.

We suppose that observations $(\mathbf{x},\mathbf{y})$ come from an unknown distribution $p(x,y)$ which is not directly of interest. Our interest lies in the conditional probability of a client with characteristics $\boldsymbol{x}$ of paying back their loan, i.e. $p(y|\boldsymbol{x})$, also unknown.

In the context of Credit Scoring, we historically stick to logistic regression, for various reasons out of the scope of this vignette. The logistic regression model assumes the following relation between $\boldsymbol{X}$ (supposed continuous here) and $Y$: \[\ln \left(\frac{p_{\boldsymbol{\theta}}(Y=1|\boldsymbol{x})}{p_{\boldsymbol{\theta}}(Y=0|\boldsymbol{x})}\right) = (1, \boldsymbol{x})'{\boldsymbol{\theta}}\]
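To make the relation above concrete, the following base R sketch turns a linear predictor $(1, \boldsymbol{x})'\boldsymbol{\theta}$ into $p_{\boldsymbol{\theta}}(Y=1|\boldsymbol{x})$ with `plogis` and checks that taking the log-odds gives the linear predictor back. The coefficient and feature values are made up for the example.

```r
# Hypothetical coefficients (intercept first) and one applicant's features
theta <- c(-1.0, 0.5, -0.25)
x <- c(2.0, 4.0)

# Linear predictor (1, x)' theta
eta <- sum(c(1, x) * theta)

# Inverse logit: p_theta(Y = 1 | x)
p1 <- plogis(eta)

# Log-odds ln(p1 / (1 - p1)), which should equal the linear predictor
log_odds <- log(p1 / (1 - p1))
```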

We would like to have the “best” model compared to the true $p(y|\boldsymbol{x})$, from which we only have samples. Had we access to the true underlying model, we would like to maximize, w.r.t. $\boldsymbol{\theta}$, $H_{\boldsymbol{\theta}} = \mathbb{E}_{(\boldsymbol{X},Y) \sim p}[\ln(p_{\boldsymbol{\theta}}(Y|\boldsymbol{X}))]$. Since this is not possible, we approximate this criterion by maximizing, w.r.t. $\boldsymbol{\theta}$, the log-likelihood $\ell(\boldsymbol{\theta};\mathbf{x},\mathbf{y}) = \sum_{i=1}^n \ln(p_{\boldsymbol{\theta}}(y_i|\boldsymbol{x}_i))$.
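The sample criterion $\ell(\boldsymbol{\theta};\mathbf{x},\mathbf{y})$ can be maximized directly with a general-purpose optimizer. The sketch below does so on simulated data with a single feature, using `optim` from base R; the data, the "true" coefficients and the function name `neg_log_lik` are assumptions of this example, not part of the package.

```r
set.seed(42)
n <- 500
x <- rnorm(n)
theta_true <- c(-0.5, 1.2)  # hypothetical intercept and slope
y <- rbinom(n, size = 1, prob = plogis(theta_true[1] + theta_true[2] * x))

# Negative log-likelihood of the logistic model (optim minimizes by default)
neg_log_lik <- function(theta) {
  eta <- theta[1] + theta[2] * x
  # ln p_theta(y_i | x_i) = y_i * eta_i - ln(1 + exp(eta_i))
  -sum(y * eta - log1p(exp(eta)))
}

# Quasi-Newton maximization of the log-likelihood, starting from theta = 0
opt <- optim(par = c(0, 0), fn = neg_log_lik, method = "BFGS")
theta_hat <- opt$par
```

Up to numerical tolerance, `theta_hat` should agree with the coefficients returned by `glm` with a binomial family on the same data.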

In R, this is done by fitting a model to the data:

# Load scoringTools
library(scoringTools)
# Logistic regression of Default on all other columns of lendingClub
scoring_model <- glm(Default ~ ., data = lendingClub, family = binomial(link = "logit"))

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

We can now focus on the regression coefficients $\boldsymbol{\theta}$:

## (Intercept)                     Amount_Requested
## 5.446254e-01                    5.198134e-06
## Loan_Purposecredit_card         Loan_Purposedebt_consolidation
## -2.161336e-01                   -4.537949e-01
## Loan_Purposeeducational         Loan_Purposehome_improvement
## 1.858680e-01                    -6.656963e-01
## Loan_Purposehouse               Loan_Purposemajor_purchase
## -1.278938e+00                   -1.726151e+00
## Loan_Purposemedical             Loan_Purposemoving
## -7.204768e-01                   -4.125148e-01
## Loan_Purposeother               Loan_Purposerenewable_energy
## -1.044591e-01                   -1.902471e+01
## Loan_Purposesmall_business      Loan_Purposevacation
## -7.710864e-01                   -8.271925e-01
## Loan_Purposewedding             Loan_Length
## -4.670372e-01                   -8.072343e-03
## Debt_To_Income_Ratio            Home_OwnershipMORTAGE
## 4.673087e-04                    -9.385231e-01
## Home_OwnershipMORTGAE           Home_OwnershipMORTGAG
## -4.822148e-02                   -1.559521e+00
## Home_OwnershipMORTGAGE          Home_OwnershipMORTGGE
## -9.395086e-01                   -1.053046e+00
## Home_OwnershipMOTGAGE           Home_OwnershipMRTGAGE
## -1.320655e-01                   7.277748e-02
## Home_OwnershipORTGAGE           Home_OwnershipOTHER
## -4.190851e-01                   -2.067367e+01
## Home_OwnershipOWN               Home_OwnershipRENT
## -1.052071e+00                   -7.275610e-01
## Open_CREDIT_Lines               Revolving_CREDIT_Balance
## -1.537008e-02                   5.409573e-06
## Inquiries_in_the_Last_6_Months  Monthly_Income
## -5.478806e-02                   1.679455e-05
## Employment_Length               StateAL
## 2.190715e-02                    -9.302735e-01
## StateAR                         StateAZ
## -2.001419e+01                   -8.498573e-01
## StateCA                         StateCO
## -1.324136e+00                   -8.503792e-01
## StateCT                         StateDC
## -1.006930e+00                   -8.278092e-01
## StateDE                         StateFL
## -1.813406e+01                   -7.749171e-01
## StateGA                         StateHI
## -1.658919e+00                   -8.162135e-01
## StateIA                         StateIL
## -2.456610e+00                   -8.436800e-01
## StateIN                         StateKS
## -1.028000e+00                   -1.184442e+00
## StateKY                         StateLA
## -2.489800e+00                   -1.522087e+00
## StateMA                         StateMD
## -2.233885e+00                   -6.024556e-01
## StateMI                         StateMN
## -4.866130e-02                   -1.755743e+00
## StateMO                         StateMS
## -1.490269e+00                   -1.272611e+00
## StateMT                         StateNC
## -2.953696e-01                   -1.296718e+00
## StateNH                         StateNJ
## -9.519204e-01                   -1.139183e+00
## StateNM                         StateNV
## -5.655698e-01                   -1.188136e+00
## StateNY                         StateOH
## -6.806718e-01                   -7.549876e-01
## StateOK                         StateOR
## -2.235123e+00                   -1.849957e+00
## StatePA                         StateRI
## -8.662468e-01                   -1.162867e-01
## StateSC                         StateSD
## -1.620496e+00                   1.488559e+01
## StateTX                         StateUT
## -1.195268e+00                   -1.546485e+00
## StateVA                         StateVT
## -8.237064e-01                   3.419687e-01
## StateWA                         StateWI
## -1.026220e+00                   -5.139766e-01
## StateWV                         StateWY
## -8.625548e-01                   -1.224568e+00
## Interest_Rate                   FICO_Range645-649
## 3.282162e-02                    -1.933897e+01
## FICO_Range650-654               FICO_Range655-659
## 2.331851e+01                    2.633920e+00
## FICO_Range660-664               FICO_Range665-669
## 9.115198e-01                    5.946684e-01
## FICO_Range670-674               FICO_Range675-679
## 1.004201e+00                    9.227298e-01
## FICO_Range680-684               FICO_Range685-689
## 7.418759e-01                    1.059893e+00
## FICO_Range690-694               FICO_Range695-699
## 6.573794e-01                    -1.914143e+01
## FICO_Range700-704               FICO_Range705-709
## -1.916224e+01                   -1.916976e+01
## FICO_Range710-714               FICO_Range715-719
## -1.907579e+01                   -3.016283e+01
## FICO_Range720-724               FICO_Range725-729
## -1.890553e+01                   -3.031735e+01
## FICO_Range730-734               FICO_Range735-739
## -1.903904e+01                   -1.898787e+01
## FICO_Range740-744               FICO_Range745-749
## -1.901698e+01                   -1.899618e+01
## FICO_Range750-754               FICO_Range755-759
## -1.904364e+01                   -1.898244e+01
## FICO_Range760-764               FICO_Range765-769
## -1.883702e+01                   -1.871614e+01
## FICO_Range770-774               FICO_Range775-779
## -1.888484e+01                   -1.870410e+01
## FICO_Range780-784               FICO_Range785-789
## -1.877299e+01                   -1.889972e+01
## FICO_Range790-794               FICO_Range795-799
## -1.882304e+01                   -1.947507e+01
## FICO_Range800-804               FICO_Range805-809
## -1.877842e+01                   -3.202823e+01
## FICO_Range810-814               FICO_Range815-819
## -1.877796e+01                   -1.899147e+01
## FICO_Range820-824               FICO_Range830-834
## -1.776919e+01                   -1.902438e+01
## Age
## -3.512107e-03

and the deviance at this estimate of $\boldsymbol{\theta}$:

## [1] 1103.43
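For a 0/1 response, the deviance reported by `glm` is minus twice the maximized log-likelihood, because the saturated model has log-likelihood zero. The sketch below checks this on simulated data rather than on `lendingClub`; the data-generating coefficients are arbitrary choices for the example.

```r
set.seed(7)
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 - 0.8 * x))

fit <- glm(y ~ x, family = binomial(link = "logit"))

# Log-likelihood computed by hand from the fitted probabilities
p_hat <- fitted(fit)
ll_hand <- sum(y * log(p_hat) + (1 - y) * log(1 - p_hat))

# Deviance for binary data: -2 times the maximized log-likelihood
dev_hand <- -2 * ll_hand
```

Here `dev_hand` should match `deviance(fit)`, and `ll_hand` should match `logLik(fit)`.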

From this, it seems that Credit Scoring is pretty straightforward when the data is at hand.

Conceptual problems of current approaches to Credit Scoring

Nevertheless, there are a few theoretical limitations of the current approach, e.g.:

We don’t observe rejected applicants’ performance, i.e. we don’t have observations $y_i$ for previously rejected applicants;
The performance variable $Y$ must be constructed using historical data, but we can’t wait for all current contracts to end. That’s why financial institutions usually consider a defaulting client to be someone failing to pay two consecutive installments;
Credit risk modelers often “discretize” the input data $\boldsymbol{X}$, that is to say continuous variables are transformed into categorical variables corresponding to intervals of the support of $\boldsymbol{X}$, and categorical variables might see their values regrouped to form a categorical variable with fewer values (but whose coefficients are “easier” to estimate). Up to now, there were no theoretical grounds to do so and no uniformly better method;
Credit risk modelers have always stuck to logistic regression without knowing whether it is somewhat “close” to the true underlying model.
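The discretization and grouping mentioned in the list above can be sketched in base R as follows. The cut points and the merged job groups are hand-picked for illustration only; choosing them well is the hard part, and this sketch does not reproduce the method of any package.

```r
set.seed(3)
n <- 1000
age <- runif(n, min = 18, max = 80)
job <- factor(sample(
  c("Craftsman", "Technician", "Executive", "Office employee"),
  size = n, replace = TRUE
))

# Discretization: continuous age becomes a factor of intervals
# (hypothetical cut points)
age_q <- cut(age, breaks = c(18, 30, 45, 60, 80), include.lowest = TRUE)

# Grouping: four job levels are merged into two (hypothetical groups)
job_q <- job
levels(job_q) <- list(
  Manual = c("Craftsman", "Technician"),
  Office = c("Executive", "Office employee")
)

# Number of logistic regression coefficients with the quantized features:
# intercept + 3 age dummies + 1 job dummy
n_coef <- ncol(model.matrix(~ age_q + job_q))
```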

Problems tackled in this package

Two problems have been tackled so far in the Credit Scoring framework:

Reject Inference,
“Quantization” of continuous (discretization) and qualitative (grouping) features.
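To see the data structure behind the Reject Inference problem, the simulation below hides the outcome of applicants rejected by a hypothetical past acceptance rule and fits the scorecard on accepted applicants only. It only illustrates the setting (missing $y_i$ for rejected applicants); it is not one of the reject inference methods of this package, and the acceptance threshold and coefficients are invented.

```r
set.seed(11)
n <- 2000
x <- rnorm(n)
# Simulated performance: 1 = default (hypothetical coefficients)
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * x))

# Hypothetical past acceptance rule based on the feature x
accepted <- x < 0.5

# The outcome is only observed for accepted applicants
y_obs <- ifelse(accepted, y, NA)

# With the default na.action (na.omit), glm silently drops the
# rejected applicants, so the model only sees accepted ones
fit_accepted <- glm(y_obs ~ x, family = binomial(link = "logit"))
n_used <- nobs(fit_accepted)
```

`n_used` equals the number of accepted applicants, which shows that the rejected ones contribute nothing to the fit unless a dedicated method is used.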

Other packages

We released two other packages:

Package glmdisc for “Quantization” of continuous (discretization) and qualitative (grouping) features and interactions among covariates,
Package glmtree for “Segmentation” of clients into subpopulations with different scorecards: logistic regression trees.

Other packages focus on Credit Scoring, see e.g. this review paper.
