Scoring for credit risk: polytomous response variable, variable selection, dimension reduction, applications.

Publication date
2016
Publication type
Thesis
Summary
The aim of this thesis was to explore scoring as it is used in banking, and more specifically for controlling credit risk. The diversification and globalization of banking activities in the second half of the 20th century led to a number of regulations designed to ensure that banks hold enough capital to cover the risks they take. These regulations require banks to model certain risk indicators, including the probability of default: for a given loan, the probability that the client will be unable to repay the amount owed. Modeling this indicator requires defining a variable of interest, called the risk criterion, that distinguishes "good payers" from "bad payers". In formal statistical terms, the task is to model a variable taking values in {0,1} from a set of explanatory variables. In practice, this problem is treated as a scoring problem.
Scoring consists of defining functions, called score functions, that summarize the information contained in the explanatory variables as a single real-valued score. Such a function should rank individuals in the same order as the model's posterior probability of being "good", so that individuals with a high probability of being "good" receive a high score and, conversely, individuals with a high probability of being "bad" (and thus representing a high risk for the bank) receive a low score. Performance criteria such as the ROC curve and the area under it (AUC) quantify how relevant the ranking produced by a score function is. The reference method for obtaining score functions is logistic regression, which is presented here. A major problem in credit risk scoring is variable selection.
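As a minimal sketch of the binary setting described above (not code from the thesis), the following Python/NumPy example fits a logistic regression by Newton-Raphson, uses the linear predictor as the score function, and computes the AUC as the probability that a randomly chosen "good" individual receives a higher score than a randomly chosen "bad" one. It assumes that Y = 1 codes a "good payer"; the function names `fit_logistic`, `score` and `auc` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Binary logistic regression fitted by Newton-Raphson (equivalently IRLS).

    X: (n, p) matrix of explanatory variables (no intercept column).
    y: (n,) array of 0/1 labels, assumed here to code 1 = "good payer".
    Returns (intercept, coefficient vector).
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])        # prepend an intercept column
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))  # fitted P(Y = 1 | x)
        grad = Z.T @ (y - mu)                   # gradient of the log-likelihood
        hess = Z.T @ (Z * (mu * (1.0 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)      # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta[0], beta[1:]


def score(X, intercept, coef):
    """Score function: the linear predictor, an increasing function of P(Y = 1 | x)."""
    return intercept + X @ coef


def auc(scores, y):
    """AUC = probability that a random 'good' (y = 1) outscores a random 'bad' (y = 0).

    Ties count 1/2 (Mann-Whitney form); uses O(n_good * n_bad) memory.
    """
    s_good = scores[y == 1]
    s_bad = scores[y == 0]
    diff = s_good[:, None] - s_bad[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(s_good) * len(s_bad))


# Usage on synthetic data (not from the thesis): the third variable is pure noise.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
true_eta = 0.5 + X @ np.array([1.5, -1.0, 0.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_eta))).astype(int)

b0, b = fit_logistic(X, y)
print("coefficients:", np.round(b, 2))
print("AUC:", round(auc(score(X, b0, b), y), 3))
```

Because the logit is strictly increasing in the fitted probability, using the linear predictor or the probability itself as the score gives the same ranking and therefore the same AUC.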
Banks hold large databases containing all the information they have on their customers, both socio-demographic and behavioral, and not all of these variables help explain the risk criterion. To address this issue, we chose to consider the Lasso technique, which places a constraint on the coefficients so that the least significant ones are set exactly to zero. We considered this method in the context of linear and logistic regression, as well as an extension called the Group Lasso, which treats explanatory variables in groups. We then considered the case where the response variable is no longer binary but polytomous, i.e. takes several possible response levels. The first step was to give a definition of scoring equivalent to the one presented for the binary case. We then presented several regression methods adapted to this setting: a generalization of binary logistic regression, semi-parametric methods, and an application of the Lasso principle to polytomous logistic regression. Finally, the last chapter applies some of the methods discussed in the manuscript to real data sets, in order to confront them with the company's actual needs.
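The Lasso and Group Lasso penalties for logistic regression can be illustrated with a short sketch, again in Python/NumPy and on synthetic data. The summary does not say which optimization algorithm the thesis uses; this example assumes proximal gradient descent (ISTA), the common group weight √|g|, and an unpenalized intercept. The function `group_lasso_logistic` is hypothetical. With singleton groups the penalty reduces to the ordinary Lasso, so one function covers both methods.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def group_lasso_logistic(X, y, groups, lam, n_iter=5000, tol=1e-10):
    """Penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes  (1/n) * negative log-likelihood + lam * sum_g sqrt(|g|) * ||beta_g||_2,
    where `groups` is a list of index arrays partitioning the columns of X.
    Singleton groups reduce the penalty to the ordinary Lasso (lam * ||beta||_1).
    The intercept is left unpenalized.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    # 1/t bounds the Lipschitz constant of the gradient of the mean logistic loss
    t = 4.0 * n / np.linalg.norm(Z, 2) ** 2
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        grad = -Z.T @ (y - sigmoid(Z @ beta)) / n
        z = beta - t * grad                     # gradient step on the smooth part
        new = z.copy()                          # intercept keeps its gradient step
        for g in groups:
            idx = np.asarray(g) + 1             # +1 skips the intercept column
            norm = np.linalg.norm(z[idx])
            thresh = t * lam * np.sqrt(len(idx))
            # Block soft-thresholding: a whole group drops to exactly zero
            # when its norm falls below the threshold.
            new[idx] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[idx]
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta[0], beta[1:]


# Synthetic data (not from the thesis): only columns 0-2 carry signal.
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 6))
y = (rng.uniform(size=n) < sigmoid(X[:, 0] - X[:, 1] + 0.8 * X[:, 2])).astype(float)

singletons = [np.array([j]) for j in range(6)]    # ordinary Lasso
b0_l, b_l = group_lasso_logistic(X, y, singletons, lam=0.05)
print("Lasso:      ", np.round(b_l, 3))

blocks = [np.arange(0, 3), np.arange(3, 6)]       # two groups of three variables
b0_g, b_g = group_lasso_logistic(X, y, blocks, lam=0.05)
print("Group Lasso:", np.round(b_g, 3))
```

The soft-thresholding step is what produces exact zeros: a coefficient (or a whole group) is set to zero whenever the gradient of the loss with respect to it is smaller than the penalty level, which is how the least informative variables are removed.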