Predictive Modeling – Important Questions that CUs Need to Ask

by Courtney Collier

As you know, the new CECL guidance calls for estimating life-of-loan losses for the ALLL reserve.  “Life of Loan” — just three simple words — but its implementation is anything but simple. Many factors within your portfolio can, and often do, affect loss estimates.  For example, the age of the loan, the borrower’s credit rating, current collateral value, and forecasted economic conditions are just a few of the many possible influences on the likelihood that a loan will result in a loss. The new CECL guidance calls out Probability of Default/Loss Given Default (or PD/LGD) as one of several possible methodologies for estimating losses for the reserve. Given the complexities involved in producing an appropriate reserve estimate, many financial institutions are looking to predictive loss modeling as their preferred methodology.
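To make the PD/LGD idea concrete, a minimal sketch follows. The core relationship is that each loan’s expected loss is its probability of default times its loss given default times its exposure at default, summed across the portfolio. Every loan and parameter value below is hypothetical and purely illustrative; this is not any vendor’s model or a prescribed CECL methodology.

```python
# Illustrative PD/LGD expected-loss calculation.
# All loans and parameter values are hypothetical.

def expected_loss(pd_, lgd, ead):
    """Expected loss = probability of default x loss given default x exposure at default."""
    return pd_ * lgd * ead

# Hypothetical portfolio: (probability of default, loss given default, balance)
portfolio = [
    (0.02, 0.40, 25_000),   # lower-risk auto loan
    (0.08, 0.55, 180_000),  # higher-risk mortgage
    (0.15, 0.90, 7_500),    # unsecured personal loan
]

# A naive reserve estimate is the sum of per-loan expected losses.
reserve = sum(expected_loss(p, l, e) for p, l, e in portfolio)
print(round(reserve, 2))  # 9132.5
```

In practice, the hard part is not this arithmetic but estimating defensible PD and LGD values for each loan segment over its remaining life — which is exactly where the modeling questions discussed in this article come in.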

Statistical modeling for credit risk assessment has been used by financial institutions for over half a century, and it varies in sophistication — from a simple Excel spreadsheet that merely projects historical losses onto the current portfolio, to more advanced statistical hazard models that estimate when a loss event is likely to occur and what the remaining balance will be when it does. One of the most familiar examples of a statistical predictive model is the credit score modeling provided by the three credit bureaus. The bureaus have developed models that combine collected borrower data with certain assumptions in order to predict a borrower’s propensity to default over the next twenty-four months.

Did Predictive Modeling Fail Us in 2008?

While it is widely accepted that any loss prediction is inherently imprecise, some insist that predictive modeling is ineffective altogether and argue that the financial meltdown of 2008 substantiates this position.  Others counter that the models themselves were not exclusively culpable. Sharp increases in high-risk lending and loose underwriting practices contributed to deteriorating loan quality, and the models being used to predict losses were not calibrated for this level of change.  The result was inappropriate data pumped into insufficient models, producing inaccurate risk assessments — and, ultimately, reserve levels too low to withstand the losses that were emerging.

At the time, most did not know to ask: “What are you factoring for in your models?  Have the current lending conditions changed from the historical reference period? What assumptions are you making in your models, and are they still valid? How long will these loans be active in the portfolio, and what kind of economic season will they have to survive?”  The list could go on and on.

Learning From the Past

Under CECL, FASB calls for answers to those very questions. NCUA examiners and auditors will expect credit unions to maintain adequate documentation of all areas of the model and to be able to defend its outputs, including those of outsourced third-party risk models.  This makes it extremely important for credit unions that are shopping risk models from third-party vendors to ask the right questions and perform due diligence on the models they use to produce these estimates.  A revealing survey performed by CU Direct in 2016 showed that 50% of responding credit unions plan to outsource the PD/LGD calculation to help meet the challenges of the new guidance.

Cox Regression, Collinearity, Oh My…

If your credit union is looking to work with a vendor to address these challenges, you may be disoriented by the dizzying array of statistical terms and theories thrown at you in a demo or sales presentation — Cox regression, collinearity, conditional distribution, stochastic process, and so on.  In most cases, this is not an attempt by the vendors to sound smart or impressive.  Financial statistical modeling is complicated and requires an understanding of some fairly advanced concepts.  After listening to two or more presentations on this subject, it may become difficult to tell one product from another. They all attempt to do the same thing; they just go about it in slightly different ways.  However, there is one important question you should ask that will likely draw the distinction:

Do you validate the data?

The model can only be as good as the data being processed through it.  All too often, statisticians build their models against data that has not been validated or normalized.  This inevitably leads to inaccurate data being fed into the model, resulting in faulty outputs on which reserve estimates are based.  As if that weren’t bad enough, poorly derived reserve levels can lead to a host of additional problems, affecting solvency assessments, compliance status, and resource allocation.

Having worked with over 100 credit unions and their data over the course of my career, I know how common data inaccuracies are when the data has not been thoroughly validated.  Processing “bad data” through an advanced PD/LGD model could go undiscovered for years.  Think about the last five years at your credit union.  How many data issues have you run into during this time?  Things like the wrong origination date, or charge-offs not being recorded properly. How difficult were they to fix, and how long did it take? Now, imagine this erroneous data has been provided to your PD/LGD vendor on a monthly or quarterly basis for the last two years.  Will this vendor assist you by creating and executing scripts to replace the bad data with good?  Will they enforce referential integrity on prior loan records when new, updated data is introduced?  Or will you be on your own — left to decide between living with the bad data or purging it all and starting from scratch?  These are all questions that should be considered when shopping for a product like this.
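The kinds of checks described above can be sketched in just a few lines. The field names, rules, and records below are assumptions chosen for illustration — not a prescribed validation program — but they show how systematically screening incoming loan records can catch problems like future-dated originations or improperly recorded charge-offs before they reach a model.

```python
from datetime import date

# Hypothetical loan records; field names are illustrative only.
loans = [
    {"id": 1, "orig_date": date(2015, 3, 1),  "balance": 12_000, "charge_off": 0},
    {"id": 2, "orig_date": date(2030, 1, 1),  "balance": 8_500,  "charge_off": 0},     # future origination date
    {"id": 3, "orig_date": date(2014, 6, 15), "balance": 0,      "charge_off": -500},  # negative charge-off
]

def validate(loan, as_of=date(2017, 1, 1)):
    """Return a list of validation errors for one loan record."""
    errors = []
    if loan["orig_date"] > as_of:
        errors.append("origination date in the future")
    if loan["balance"] < 0:
        errors.append("negative balance")
    if loan["charge_off"] < 0:
        errors.append("negative charge-off amount")
    return errors

# Collect only the records that fail at least one check.
bad = {loan["id"]: validate(loan) for loan in loans if validate(loan)}
print(bad)  # {2: ['origination date in the future'], 3: ['negative charge-off amount']}
```

A real validation program would cover far more rules — cross-field consistency, referential integrity against prior extracts, and reconciliation to the general ledger — but the principle is the same: flag and fix bad records before they are fed to the model, not after.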

Additionally, it may be more than just your own data that you should be concerned about.  Statisticians often use aggregated, lender-agnostic loan performance data to produce larger sample sizes when establishing the assumptions and correlations to loss events that will ultimately be baked into the model.  This is an appropriate technique if applied correctly.  However, if the aggregated data used in this initial analysis is not validated, every subsequent output could be significantly flawed.  It will be important to ask the vendor, “Have you validated the data you used to determine the parameters of your model?”

Credit unions will be expected to document, validate, defend, and support the reserve recorded on the balance sheet.  They must hold their vendors to that same standard for the output values those vendors provide.  Credit unions should request and review the vendor’s data validation and model management program and policies, to ensure that adequate support is available and that the policies are sufficient to meet the new guidance.

Discover how Lending Insights can help your credit union meet the new CECL guidelines.

About the Author

Courtney Collier
Courtney Collier, Product Manager, Analytic Products, CU Direct