At Nyca, we get quite excited about credit. Extension of credit and the magic of leverage helped catapult the American economy to be the largest in the world, and the U.S. system has been quite effective for decades. However, like many areas of financial services, credit is undergoing a disruptive period. The availability of new and diverse data coupled with evolving modeling techniques and new forms of distribution have opened up an exciting new era. The following is our third and final installment of a three-part credit-focused series. If this is your first read in the series, I would start by reading FICO’s Ubiquity and Lender Complacency, followed by Leading with Context.
As alluded to in my previous post on Leading with Context, there are two major components to building a modern credit approach that do not exist today. One is a clean and reliable real-time credit data set (the “Fourth Bureau”) and the second is a data and modeling platform allowing for efficient implementation of ever-evolving modern data sources.
The problems in credit data and modeling are vast and complex. I can’t say I have a specific solution in mind, and the answer may well combine multiple components. Rather than being prescriptive, I will focus on pain points and let the entrepreneur community figure out the best solutions!
The Fourth Bureau
We’re at a unique moment in consumer credit where the “alternative data” world is set to merge with conventional bureau data. Alternative data, broadly meaning non-credit bureau data, is viewed with skepticism, having not been tested over economic cycles. Bureau data, by contrast, has been tested through well-documented economic swings since the FICO score’s debut in 1989, giving it an aura of reliability. However, with the continued degradation of bureau components, especially those heavily weighted in FICO Score 8, there is real reason to believe that new data sources will become mainstream. This includes information and signals not currently captured, like those derived from bank account transactions, payroll, rent, utilities, credit builder services, BNPL, marketplace platforms, etc.
The mainstreaming of alternative data will reach the entire credit spectrum. Subprime lenders have been actively incorporating data from alternative bureaus such as Innovis, LexisNexis, CoreLogic, etc. for many years. But to my knowledge, no prime lender is investing heavily in alternative data to materially revamp its underwriting approach. The prevailing stance is that bureau data has produced consistent historical results, and there is no catalyst for change. The old adage, “if it ain’t broke, don’t fix it,” rings true until it doesn’t.
Near-prime and subprime lenders have already experienced increased losses resulting from degrading traditional bureau data. Credit builder services that skew key variables like credit line utilization, combined with bureau-invisible credit extension like BNPL (not currently furnished), lead to a confusing and incomplete picture of an individual’s creditworthiness.
The gossip among risk experts is that 700-720 FICO (low prime) isn’t as creditworthy now as it was in the past, and there could be real weakness in borderline credit through a period of economic stress. This weakness in low prime could partly be due to “grade inflation” as the economy hummed along, but it is also tied to the degradation of credit bureau data.
Thus, many are trusting bureau files a lot less and seeking an undoctored source of truth. One key benefit to most alternative data is that it is verifiable and therefore more reliable for underwriting purposes. These direct sources also provide as close to a current view of financial health as possible, adding to their attractiveness.
At this point, lenders realize that at least some bureau data needs to be complemented by “alternative” feeds. However, current infrastructure for the vast majority of lenders is built exclusively around bureau data, and even making a change to bureau-derived signals can take at least a year. Back-testing models is an extremely time-consuming exercise, especially without modern tools. What does a lender do in the meantime? The most likely reaction is material shrinking of the credit box if losses start to tick up, hurting credit availability and business profitability.
Components for Success
A standardized means to access a broad swath of alternative data is needed to bring underwriting into the modern age. This “fourth bureau” has a daunting challenge: earning the confidence of lenders and borrowers. Here are some of the critical issues that need to be solved:
Normalized data: Today, through the FCRA, there is a standardized way to furnish credit information. Lenders are required to package issuance and performance data into a Metro 2 file (as discussed in this post). While the standard is somewhat difficult to maintain, the format serves its purpose. Alternative data has no equivalent standardized schema; even basic checking accounts are reported in non-conforming ways, with each institution following its own conventions. While all bank accounts show debits and credits, Chase and Wells Fargo may use different terms or categories for a payroll deposit, Venmo transfer, mortgage payment, etc. Plaid, Finicity, MX, Yodlee, and others have come a long way in understanding data from disparate financial institutions, but conformity is still not there. Open Banking could bring standardization, but the U.S. still lags behind the rest of the world in instituting such structure. These problems exist in other key forms of data access as well, including payroll, rent (only positive furnishing, meaning bad behavior is not available), BNPL, etc. This is a gargantuan task; if credit reporting agencies (CRAs) do not step up to solve it, an expansion of the FCRA may be necessary. However, I won’t hold my breath for a legislative change to drive the mainstreaming of alternative data.
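To make the normalization problem concrete, here is a minimal sketch in Python of mapping institution-specific transaction descriptors to one canonical category. Every category name and descriptor pattern below is a hypothetical stand-in; real normalization engines involve thousands of such rules plus learned classifiers.

```python
# A minimal sketch of descriptor normalization: mapping institution-specific
# transaction labels to one canonical category. All category names and
# descriptor patterns here are hypothetical, for illustration only.
import re

# Per-institution regex rules; in practice these run to thousands of patterns.
RULES = {
    "chase": [
        (re.compile(r"DIRECT DEP|PAYROLL", re.I), "PAYROLL_DEPOSIT"),
        (re.compile(r"VENMO", re.I), "P2P_TRANSFER"),
        (re.compile(r"MTG|MORTGAGE", re.I), "MORTGAGE_PAYMENT"),
    ],
    "wells_fargo": [
        (re.compile(r"PAYROLL|DIR DEP", re.I), "PAYROLL_DEPOSIT"),
        (re.compile(r"VENMO PAYMENT", re.I), "P2P_TRANSFER"),
        (re.compile(r"HOME MTG", re.I), "MORTGAGE_PAYMENT"),
    ],
}

def normalize(institution: str, raw_descriptor: str) -> str:
    """Map a raw bank transaction descriptor to a canonical category."""
    for pattern, category in RULES.get(institution, []):
        if pattern.search(raw_descriptor):
            return category
    return "UNKNOWN"

print(normalize("chase", "ACME CORP DIRECT DEP"))       # PAYROLL_DEPOSIT
print(normalize("wells_fargo", "WF HOME MTG PAYMENT"))  # MORTGAGE_PAYMENT
```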
Pull versus push / consumer permissioning: Traditional bureau data reporting is not consumer-permissioned today. Lenders furnish information by law, and bureaus maintain a record. However, because alternative data is definitionally outside the mandated scope of credit reporting, many valuable data sources require consumer permission to share this information with lenders. This “pull” activity serves as a barrier to widespread market adoption. Many a product manager cringes at the idea of requesting bank account, payroll, or other credentials from borrowers. Application funnel fallout can be severe, increasing a lender’s CAC. Open Banking once again could be a big help here, bringing in passwordless access to transaction information. Yet today, much of this information can only be accessed through a friction-filled experience.
Privacy: In conjunction with consumer permissioning, consumer privacy and use of consumer data continues to be a hot-button issue. Alternative data is more intrusive to a consumer’s behaviors; will borrowers accept and tolerate such transparency in order to access more efficient credit? Can a more modern permissioning system be designed to provide consumers more control over use of their financial data?
Real-time: Accessing “real time” data on consumer financial health would be a big step forward. Bureau data can lag 45 days or more depending on the bureau report being accessed and lenders’ furnishing timelines. Receiving an up-to-date snapshot of financial activity can help drive confidence in credit decisions, demonstrate positive behavior, and avoid first-party fraud like loan stacking. This all sounds great, but there are a few major challenges. How can there be consistency / ubiquity in real-time access? How does consumer permissioning impact such access?
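As an illustration of the fraud angle, here is a toy sketch of loan-stacking detection over a shared real-time application feed. The feed itself, the 48-hour window, and the application threshold are all assumptions made for the example.

```python
# A toy sketch of how real-time access could catch loan stacking: flag a
# borrower who files several applications across lenders within a short
# window, before any tradeline reaches a traditional bureau file.
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)   # hypothetical look-back window
MAX_APPS = 2                   # hypothetical tolerated application count

applications = defaultdict(list)  # consumer_id -> recent application timestamps

def record_application(consumer_id: str, ts: datetime) -> bool:
    """Record an application; return True if stacking is suspected."""
    recent = [t for t in applications[consumer_id] if ts - t <= WINDOW]
    recent.append(ts)
    applications[consumer_id] = recent
    return len(recent) > MAX_APPS

now = datetime.now()
for offset_hours in (0, 3, 7):
    flagged = record_application("consumer-123", now + timedelta(hours=offset_hours))
    print(f"application at +{offset_hours}h -> stacking suspected: {flagged}")
```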
Data infrastructure: Bureau data today is provided at a moment in time. The future of credit data is likely much more of a time-based exercise, focused on consumer behavior over a defined period. Lenders today do ingest and maintain credit files over time (periodic pulls), but this is a far cry from the quantum of alternative data that could be utilized in modern underwriting. Thus, there are two major challenges here: how does a fourth bureau absorb and maintain such a large data set, and how does a lender adapt its own infrastructure to manage much more dynamic data? More on the latter is described below.
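To show what a “time-based exercise” means in practice, here is a brief sketch of deriving behavioral features from a normalized transaction history rather than reading a point-in-time attribute. The field names and the trailing 90-day window are illustrative assumptions, not an established standard.

```python
# A minimal sketch of the shift from point-in-time attributes to behavior
# over a window: rolling features computed from a transaction time series.
import pandas as pd

# Hypothetical normalized transaction feed for one consumer.
txns = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-05", "2024-03-05"]),
        "amount": [2500.0, -1800.0, 2500.0, 2500.0],  # +credits / -debits
        "category": ["PAYROLL_DEPOSIT", "MORTGAGE_PAYMENT",
                     "PAYROLL_DEPOSIT", "PAYROLL_DEPOSIT"],
    }
).set_index("date")

# Trailing 90-day window ending at the most recent transaction.
cutoff = txns.index.max() - pd.Timedelta(days=90)
window = txns[txns.index > cutoff]

payroll = window.loc[window["category"] == "PAYROLL_DEPOSIT", "amount"]
features = {
    "payroll_deposits_90d": int(payroll.count()),
    "avg_payroll_deposit_90d": float(payroll.mean()),
    "net_cash_flow_90d": float(window["amount"].sum()),
}
print(features)
```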
A New Stack
A more modern and diverse set of underwriting data is both a great opportunity and a daunting challenge. Most lenders today have developed their own models and underwriting technology in-house. However, the scale of the challenge in incorporating diverse, less consistent time-series data is not something most institutions will be capable of tackling efficiently in-house.
Data infrastructure for risk functions in general could use an upgrade. I have yet to speak to a risk leader who claims their team can integrate new data sources in days. More typically, I hear horror stories about 6-12 month timelines to complete access, normalization, model signal development, backtesting, and implementation of these sources. These leaders all understand how vulnerable such a drawn-out timeline for critical data leaves them, but their hands are tied. Systems are typically layered over time, built for immediate needs, with little regard for future infrastructure requirements.
I am not an engineer, so I will not claim deep expertise in how modern risk infrastructure should be designed, but the following issues will need to be addressed:
Absorb disparate data sources: Setting up the proper cloud-based infrastructure for new data ingestion, new source integration, and expedient data recall is table stakes for all modern data platforms, and risk is no exception. Borrowing norms from other big data realms like e-commerce, supply chain, digital marketing, etc. should help accelerate infrastructure building.
Data normalization: While the dream would be for industry players to standardize data schemas, there will be a real need to build a normalization engine for nearly every category of alternative data. For example, a lender’s understanding of what constitutes base salary versus overtime versus incentive bonuses is critical to more precisely evaluating ability to pay. A Fourth Bureau has a real opportunity to establish and enforce a common data model.
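As a concrete illustration of the salary example above, here is a small sketch that classifies hypothetical payroll line items and weights them into a stable-income estimate. The labels and weights are assumptions, and the weighting itself is an underwriting policy choice rather than a universal standard.

```python
# A sketch of one normalization problem called out above: separating base
# salary from overtime and bonuses to estimate repeatable monthly income.

# Hypothetical payroll line items pulled via a payroll-data provider.
pay_stub = [
    {"label": "REGULAR", "amount": 4000.0},
    {"label": "OT 1.5X", "amount": 600.0},
    {"label": "ANNUAL BONUS", "amount": 2000.0},
]

# How much of each component counts toward stable income
# (an underwriting policy choice, assumed here for illustration).
STABILITY_WEIGHTS = {"BASE": 1.0, "OVERTIME": 0.5, "BONUS": 0.0}

def classify(label: str) -> str:
    """Map a raw payroll line-item label to a canonical component."""
    label = label.upper()
    if "OT" in label or "OVERTIME" in label:
        return "OVERTIME"
    if "BONUS" in label or "INCENTIVE" in label:
        return "BONUS"
    return "BASE"

stable_income = sum(
    item["amount"] * STABILITY_WEIGHTS[classify(item["label"])] for item in pay_stub
)
print(f"stable monthly income estimate: ${stable_income:,.2f}")  # $4,300.00
```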
Model development and testing: Underwriting has been and continues to be a space where lenders think they have an edge on competitors who take a purely algorithmic/data science approach. Years of experience combined with complex market niches create real barriers to entry; even the world’s greatest data science team cannot learn the nuances of equipment finance or subprime auto lending in a weekend, nor are they likely to appreciate how much the macroeconomic environment influences lending decisions. Thus, modern underwriting cannot be a pure data solution that proclaims it can outperform market intelligence. It must be a complementary tool for risk professionals to accelerate model development. My own view is that lowering the barriers to building machine learning models and ability to iterate on performance while keeping the keys to the car with subject-matter experts is likely the most desirable outcome. Better analytics and monitoring will be critical to convincing established lenders to make the transition.
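One concrete pattern for keeping experts in control is constraint-based ML: domain knowledge is encoded as monotonicity constraints the model cannot violate, for instance forcing risk to rise with utilization and fall with income. Below is a minimal sketch using scikit-learn’s HistGradientBoostingClassifier on synthetic data; the features, constraints, and labels are illustrative, not a production underwriting model.

```python
# A toy sketch of expert-constrained ML: a gradient-boosted model forced to
# treat higher utilization as higher-risk and higher income as lower-risk.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
utilization = rng.uniform(0, 1, n)          # revolving utilization
income = rng.uniform(20_000, 200_000, n)    # annual income
X = np.column_stack([utilization, income])

# Synthetic default labels correlated with both features.
p_default = np.clip(0.05 + 0.4 * utilization - 1e-6 * income, 0, 1)
y = rng.uniform(0, 1, n) < p_default

# Expert-imposed monotonicity: +1 = risk rises with utilization,
# -1 = risk falls with income.
model = HistGradientBoostingClassifier(monotonic_cst=[1, -1], random_state=0)
model.fit(X, y)

# Probability of default for a stretched vs. a comfortable applicant.
print(model.predict_proba([[0.9, 40_000], [0.1, 150_000]])[:, 1])
```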
Multiple model hosting: The basis for our thesis around contextual credit is that matching relevant data to a specific offering or circumstance can add significant value. The resulting implication is that more models need to be built and monitored. With today’s home-built infrastructure, this is nearly impossible for risk teams with limited resources to accomplish. Modern architecture can assist with faster generation, vetting, implementation, monitoring, and optimization.
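For a sense of what multi-model hosting involves, here is a minimal registry sketch that routes each application to the model registered for its context and logs every call for monitoring. The contexts, versions, and scoring interface are all hypothetical.

```python
# A minimal sketch of multi-model hosting for contextual credit: a registry
# that routes applications by (product, segment) and records calls so that
# drift and performance can be monitored per model version.
from typing import Callable, Dict, Tuple

class ModelRegistry:
    def __init__(self) -> None:
        # (product, segment) -> (version, scoring function)
        self._models: Dict[Tuple[str, str], Tuple[str, Callable[[dict], float]]] = {}
        self.call_log: list = []  # feeds monitoring / drift dashboards

    def register(self, product: str, segment: str, version: str,
                 scorer: Callable[[dict], float]) -> None:
        self._models[(product, segment)] = (version, scorer)

    def score(self, product: str, segment: str, application: dict) -> float:
        version, scorer = self._models[(product, segment)]
        result = scorer(application)
        self.call_log.append((product, segment, version, result))
        return result

registry = ModelRegistry()
registry.register("bnpl", "thin_file", "v3", lambda app: 0.12)  # stub scorers
registry.register("auto", "subprime", "v7", lambda app: 0.31)

print(registry.score("bnpl", "thin_file", {"income": 48_000}))
print(registry.call_log)
```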
Searching for Founders
I’m incredibly excited about the evolution of data and risk systems over the coming years. To be clear, I do not agree with the view that a new generation of ML models will materially increase credit approvals across the board. Credit models today are already thoughtful and accurate for specific purposes. However, current models lack the flexibility and agility required to compete in a modern digital world. It’s time for great entrepreneurs and companies to solve these problems in a scaled manner.
If you have comments or ideas on how the fourth bureau emerges alongside modern data infrastructure, please reach out. I would love to collaborate!