How Can Fintech Lending Companies Detect Possible Defaulters?

Machine learning is changing the perspective of credit risk analysis, making it possible to predict default probability more accurately

New ideas nearly always evolve from new needs: in prehistory, the discovery of fire was the outcome of the need for light and heat, while the creation of bread was the product of hunger. The same holds if we carry this analogy over to financial markets. The creation of money emerged from the need to find a medium of exchange other than the bartered goods themselves. Moreover, access to finance is a necessity that, as a service, has changed throughout history.

Let’s fast-forward to the 2007 mortgage crisis, which generated a global chain reaction that triggered the Great Recession, a period of high economic uncertainty and significant restrictions on financing given the tightening of credit assessment criteria. Faced with this situation, and making use of the available technology, several agents set out to find a solution to a need that is so basic and yet so complex at the same time: access to money.

The emergence of fintech companies

Just as e-commerce was created and disseminated from an activity as primitive as buying and selling items, fintech emerged to popularize digital financial services. Since 2015, fintech companies have captured the attention of industry and academia owing to the level of interaction in this ecosystem.

Fintech encompasses many types of services: payment apps, loans, financial education, trading, and crowdfunding, among others. Given the breadth of the term, my research focuses on credit access services, specifically the dynamics of peer-to-peer (P2P) lending.

P2P, as it is known in modern jargon, is a dynamic based on trust. In the case of LendingClub, one of the representative platforms of this dynamic, its success has been driven by mistrust in the formal financial system and the need to generate returns or find liquidity. Ingenuity brought together financial services and Internet-based technology, creating a market for individuals who had been excluded from traditional banking.

Although this dynamic is extremely interesting and provides insight into the social and economic context, for my research it is more important to understand how these new business models work and what it is that they do so well to have made the fintech ecosystem grow explosively.

The propagation of the fintech ecosystem has generated numerous benefits for the economically active unbanked population, particularly considering that it facilitated many payment systems and access to cash. From a credit perspective, it has allowed financing for young people with no credit history and even people with bad credit histories. This premise brings us back to the research questions that shape my doctoral thesis. What are the criteria for an acceptable level of credit risk in individuals who are not attractive for traditional banking? How are participants in the P2P dynamic screened? Does the interest rate match the level of risk assumed by investors and the platform’s reputational risk?

All these concerns have led to the use of various tools for identifying every detail that gives shape and meaning to this dynamic of access to finance. Credit risk analysis and assessment models are a formal requirement for financial institutions, but what purpose do they serve in a fintech that does not raise funds from the public and is not required to maintain contingency reserves?

Understanding credit risk

Traditionally, risk analysis has relied on logistic regressions, probit models, and linear discriminant analysis (LDA). These models can be adapted to the needs of each institution, taking into account its Value at Risk (VaR) policies, a concept that quantifies the loss exposure an institution assumes in its operations.

Regression models seek to quantify the relationship between dependent and independent variables and, in their simplest structure, identify the characteristics of the group of customers who defaulted, so as to build a profile of what could turn out to be a “bad prospective customer.” The data in these models are generally drawn from credit histories and other banking-system information. However, this raises the question of what to do when a business accepts customers whose information does not meet bank requirements, and how to model their probability of default.
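To make the traditional approach concrete, here is a minimal sketch of a logistic regression scoring default probability on synthetic data. The features (income, debt-to-income ratio, prior delinquencies) and the coefficients that generate the labels are illustrative assumptions, not LendingClub's actual fields or the thesis model:

```python
# Minimal sketch: default probability via logistic regression.
# All features and coefficients below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 1000
income = rng.normal(50, 15, n)   # annual income, thousands (synthetic)
dti = rng.uniform(0, 0.6, n)     # debt-to-income ratio (synthetic)
delinq = rng.poisson(0.5, n)     # prior delinquencies (synthetic)

# Synthetic rule: higher DTI and more delinquencies raise default odds.
logit = -3 + 5 * dti + 0.8 * delinq - 0.01 * income
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # 1 = default, 0 = repaid

X = np.column_stack([income, dti, delinq])
model = LogisticRegression(max_iter=1000).fit(X, y)

# Estimated default probability for a hypothetical applicant
applicant = np.array([[40.0, 0.45, 2.0]])
print(model.predict_proba(applicant)[0, 1])
```

The fitted coefficients can then be read as the direction and strength of each characteristic's association with default, which is precisely the "profile of a bad prospective customer" described above.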

Several research undertakings have addressed these questions, and, although there is no conclusive answer, there are multiple proposals that would generate value for any startup looking for its spot in the P2P ecosystem. My doctoral research, with the preliminary title “Expert Systems for Credit Risk in Fintech Lending,” focuses on answering these questions.

The power of algorithms

I first evaluated the LendingClub database using traditional models, but the results were unsatisfactory. This preliminary analysis was nonetheless indispensable when proposing other approaches, since it offered an initial perspective on what we might find later with more specialized tools. As a result, I turned to machine learning, specifically Gradient Boosting, a family of algorithms that has become quite popular in data science and has been applied to diverse databases with structures similar to credit data. Conceptually, we can understand an algorithm as a set of systematic operations focused on a specific task.

In this case, there is a set of individuals who meet certain characteristics pertaining to one or more classes. These algorithms do a very efficient job of classifying and predicting the class of individuals, especially in the presence of databases with millions of observations, as is the case of LendingClub.

The family of Gradient Boosting algorithms includes:

  • AdaBoost
  • XGBoost
  • LightGBM
  • CatBoost
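The libraries above expose broadly similar fit/predict interfaces. As a hedged sketch, the snippet below uses scikit-learn's gradient boosting implementation on synthetic, imbalanced data (defaults as the minority class); it stands in for, rather than reproduces, the thesis experiments on LendingClub:

```python
# Sketch: classifying defaulters with gradient boosting.
# Data is synthetic; class 1 (default) is the minority class.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10,
                           n_informative=5, weights=[0.8, 0.2],
                           random_state=0)  # 0 = repaid, 1 = default
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

clf = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=0)
clf.fit(X_tr, y_tr)

# Held-out accuracy on unseen borrowers
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"test accuracy: {acc:.2f}")
```

Swapping in XGBoost, LightGBM, or CatBoost mostly means changing the estimator class and its hyperparameter names; the overall workflow is the same.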

Another advantage of these algorithms is their interpretability, which makes it possible to identify the characteristics most relevant to the relationships underlying the data, as well as to assess the algorithm’s predictive quality on new information. Calibrating their hyperparameters is a rather challenging task, but there are alternatives for moving towards an efficient model. Calibration using AutoML is a tool worth trying in data science exercises, since it allows you to select the best hyperparameters depending on whether the goal of the algorithm is classification or regression.
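As a simpler stand-in for a full AutoML pipeline, a randomized search over a small hyperparameter space illustrates the calibration idea; the search ranges below are illustrative assumptions, not tuned values from the research:

```python
# Sketch: hyperparameter calibration via randomized search,
# a lightweight stand-in for AutoML. Search ranges are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=8, random_state=1)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=1),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [2, 3, 4],
        "learning_rate": [0.05, 0.1, 0.2],
    },
    n_iter=5,   # sample 5 configurations
    cv=3,       # 3-fold cross-validation per configuration
    random_state=1,
)
search.fit(X, y)
print(search.best_params_)
```

AutoML tools automate the same loop at a larger scale, also searching over model families and preprocessing steps.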

My research finds accuracy above 90% for Gradient Boosting algorithms, an extremely desirable result for predicting default events.

These results can be an important tool for the start-up phase of new Latin American ventures, since defining credit models (and analysis in general) is even more challenging given the evident scarcity of historical information.

The author is a Ph.D. in Financial Science student at EGADE Business School.
