ROUVIERE Laurent

Affiliations
  • 2014 - 2019
    Institut de recherche mathématique de Rennes
  • 2014 - 2015
    Centre de recherche en économie et statistique de l'Ensae et l'Ensai
  • 2014 - 2015
    Centre de recherche en économie et statistique
  • 2004 - 2005
    Université Montpellier 2
  • 2004 - 2005
    Institut Montpelliérain Alexander Grothendieck
  • Statistical analysis of a hierarchical clustering algorithm with outliers.

    Audrey POTERIE, Nicolas KLUTCHNIKOFF, Laurent ROUVIERE
    2021
    No summary available.
  • “Should I stay or should I go now?” Recovery time effect on walking capacity in symptomatic peripheral artery disease.

    Pierre-Yves DE MULLENHEIM, Laurent ROUVIERE, Mathieu EMILY, Ségolène CHAUDRU, Adrien KALADJI, Guillaume MAHÉ, Alexis LE FAUCHEUR
    Journal of Applied Physiology | 2021
    OBJECTIVE: To investigate the effect of recovery time on walking capacity (WC) throughout repeated maximal walking bouts in symptomatic lower extremity peripheral artery disease (PAD). METHODS: The effect of recovery time on WC (maximal walking time) was determined in 21 PAD participants in three experimental conditions (recovery times from 0.5 to 9.5 min plus a self-selected recovery time [SSRT]): (i) 11 repeated sequences of two treadmill walking bouts (TW-ISO); (ii) a single sequence of seven treadmill walking bouts (TW-CONS); (iii) a single sequence of seven outdoor walking bouts (OW-CONS). Exercise transcutaneous oxygen pressure changes were continuously recorded as an indirect measure of ischemia. An individual recovery time (IRT) beyond which WC did not substantially increase was determined for each participant using a logarithmic fit. RESULTS: At the group level, mixed models showed a significant effect (P < 0.001) of recovery time on WC restoration. At the participant level, strong logarithmic relationships were found (median significant R² ≥ 0.78). The median SSRT corresponded to a median work-to-rest ratio > 1:1 (i.e., a recovery time shorter than the corresponding previous walking time) and was related to unrecovered ischemia and a WC restoration level of < 80%. A median work-to-rest ratio of ≤ 1:2 allowed full recovery of ischemia and full restoration of WC. The IRT ratio was between 1:1 and 1:2 and corresponded to the start of recovery from ischemia. CONCLUSION: Recovery time affects the restoration level of WC during repeated maximal walking bouts in symptomatic PAD. Meaningful variations in WC restoration were related to specific levels of work-to-rest ratios.
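    The per-participant analysis above fits a logarithmic model of walking capacity against recovery time and reports its R². A minimal sketch of such a fit in Python follows; the function name and data are illustrative, not the study's actual code.

    ```python
    import numpy as np

    def log_fit(recovery_time, walking_capacity):
        """Least-squares fit of WC = a + b * ln(recovery time), returning
        the intercept a, slope b, and the coefficient of determination R^2.
        A sketch of the kind of per-participant logarithmic relationship
        described in the abstract (illustrative, not the published code)."""
        x = np.log(np.asarray(recovery_time, dtype=float))
        y = np.asarray(walking_capacity, dtype=float)
        b, a = np.polyfit(x, y, 1)              # slope first, then intercept
        resid = y - (a + b * x)
        r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
        return a, b, r2
    ```

    A strong fit (R² close to 1) would indicate, as reported above, that most of the variation in restored walking capacity is explained by the logarithm of the recovery time.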
  • Accelerated gradient boosting.

    G. BIAU, B. CADRE, L. ROUVIERE
    Machine Learning | 2019
    Gradient tree boosting is a prediction algorithm that sequentially produces a model in the form of linear combinations of decision trees, by solving an infinite-dimensional optimization problem. We combine gradient boosting and Nesterov's accelerated descent to design a new algorithm, which we call AGB (for Accelerated Gradient Boosting). Substantial numerical evidence is provided on both synthetic and real-life data sets to assess the excellent performance of the method in a large variety of prediction problems. It is empirically shown that AGB is much less sensitive to the shrinkage parameter and outputs predictors that are considerably sparser in the number of trees, while retaining the exceptional performance of gradient boosting.
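    The combination described above, gradient boosting mixed with Nesterov-style momentum, can be sketched for squared loss with regression stumps. Everything below (the stump learner, the FISTA-style weight recursion, all parameter values) is an illustrative reconstruction from the abstract, not the authors' implementation.

    ```python
    import numpy as np

    def fit_stump(X, r):
        """Least-squares regression stump: best single-feature threshold split
        for residuals r (thresholds subsampled for speed)."""
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[:-1][::8]:
                left = X[:, j] <= t
                lm, rm = r[left].mean(), r[~left].mean()
                sse = ((r[left] - lm) ** 2).sum() + ((r[~left] - rm) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, lm, rm)
        _, j, t, lm, rm = best
        return lambda Z: np.where(Z[:, j] <= t, lm, rm)

    def agb_predictions(X, y, n_rounds=50, shrinkage=0.1):
        """Sketch of accelerated gradient boosting for squared loss: boost on
        a momentum sequence g, mixing it with the main sequence f using
        Nesterov/FISTA weights lambda_t (illustrative simplification)."""
        f = np.zeros(len(y))        # predictions of the main sequence
        g = np.zeros(len(y))        # predictions of the momentum sequence
        lam = 1.0
        for _ in range(n_rounds):
            lam_next = (1.0 + np.sqrt(1.0 + 4.0 * lam ** 2)) / 2.0
            gamma = (1.0 - lam) / lam_next        # 0 at first round, then < 0
            stump = fit_stump(X, y - g)           # negative gradient at g
            f_new = g + shrinkage * stump(X)      # shrunken boosting step
            g = (1.0 - gamma) * f_new + gamma * f # Nesterov extrapolation
            f, lam = f_new, lam_next
        return f
    ```

    The extrapolation step is what distinguishes this sketch from plain gradient boosting: each weak learner is fitted to the residuals of the extrapolated sequence g rather than of the current model f.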
  • Classification tree algorithm for grouped variables.

    A. POTERIE, J.-F. DUPUY, V. MONBET, L. ROUVIERE
    Computational Statistics | 2019
    We consider the problem of predicting a categorical variable based on groups of inputs. Some methods have already been proposed to build classification rules based on groups of variables (e.g., the group lasso for logistic regression). However, to our knowledge, no tree-based approach has been proposed to tackle this issue. Here, we propose the Tree Penalized Linear Discriminant Analysis algorithm (TPLDA), a new tree-based approach which constructs a classification rule based on groups of variables. It splits a node by repeatedly selecting a group and then applying a regularized linear discriminant analysis based on this group. This process is repeated until some stopping criterion is satisfied. A pruning strategy is proposed to select an optimal tree. Compared to existing multivariate classification tree methods, the proposed method is computationally less demanding and the resulting trees are more easily interpretable. Furthermore, TPLDA automatically provides a measure of importance for each group of variables. This score allows groups of variables to be ranked with respect to their ability to predict the response and can also be used to perform group variable selection. The good performance of the proposed algorithm and its interest in terms of prediction accuracy, interpretation and group variable selection are illustrated and compared with alternative reference methods through simulations and applications to real datasets.
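    A single node split in the spirit of the algorithm above can be sketched as follows: for each predefined group of columns, fit a ridge-regularized two-class LDA on that group and keep the group whose induced split minimizes the weighted Gini impurity. This is a simplified illustration (binary response, midpoint threshold, Gini criterion), not the published TPLDA procedure.

    ```python
    import numpy as np

    def group_lda_split(X, y, groups, reg=1e-2):
        """One TPLDA-style node split (sketch). `groups` maps a group name to
        a list of column indices; y must be a 0/1 array. For each group, a
        ridge-regularized LDA direction is computed and observations are
        split at the midpoint of the projected class means; the group with
        the lowest weighted Gini impurity is returned."""
        def gini(labels):
            if len(labels) == 0:
                return 0.0
            p = labels.mean()
            return 2.0 * p * (1.0 - p)

        best = None
        for name, cols in groups.items():
            Z = X[:, cols]
            mu0, mu1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
            cov = np.atleast_2d(np.cov(Z, rowvar=False)) + reg * np.eye(len(cols))
            w = np.linalg.solve(cov, mu1 - mu0)      # regularized LDA direction
            proj = Z @ w
            thr = (mu0 @ w + mu1 @ w) / 2.0          # midpoint threshold
            left = proj <= thr
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / len(y)
            if best is None or score < best[0]:
                best = (score, name, w, thr)
        return best   # (weighted impurity, group name, direction, threshold)
    ```

    Repeating such splits on the resulting child nodes, then pruning, would yield a tree whose internal nodes are each labeled by one group of variables, which is what makes this kind of tree easy to interpret.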
  • Regression with R.

    Pierre-André CORNILLON, Éric MATZNER-LØBER, Nicolas HENGARTNER, Laurent ROUVIERE
    2019
    No summary available.
  • R for statistics and data science.

    Nicolas JEGOU, Nicolas KLUTCHNIKOFF, Laurent ROUVIERE, François HUSSON, Pierre-André CORNILLON, Arnaud GUYADER
    2018
    The back cover states: "R software is an essential tool for statistics, data visualization, and data science in both the academic and corporate worlds. This can be explained by its three main qualities: it is free, very complete and constantly growing. Recently, it has adapted to the era of big data and to the collection and processing of heterogeneous and very large data (from the Web, textual data, etc.). This book is divided into two main parts: the first focuses on how the R software works, while the second implements some thirty statistical methods through data sheets. Each sheet is based on a concrete example and covers a wide range of techniques for processing data. This book is intended for beginners as well as regular users of R. It will enable them to quickly produce graphs and simple or elaborate statistical treatments."
  • R for statistics and data science.

    François HUSSON, Éric MATZNER-LØBER, Arnaud GUYADER, Pierre-André CORNILLON, Julie JOSSE, Laurent ROUVIERE, Nicolas KLUTCHNIKOFF, Benoît THIEURMEL, Nicolas JEGOU, Erwann LE PENNEC
    2018
    No summary available.
  • Scoring for credit risk: polytomous response variable, variable selection, dimension reduction, applications.

    Clément VITAL, Valentin PATILEA, Laurent ROUVIERE
    2016
    The aim of this thesis was to explore the theme of scoring in the context of its use in the banking world, and more particularly to control credit risk. The diversification and globalization of banking activities in the second half of the 20th century led to the introduction of a number of regulations intended to ensure that banking institutions hold the capital necessary to cover the risks they take. This regulation requires the modeling of certain risk indicators, including the probability of default, which is, for a particular loan, the probability that the client will not be able to repay the amount owed. Modeling this indicator involves defining a variable of interest called the risk criterion, distinguishing "good payers" from "bad payers". Translated into a more formal statistical framework, this means that we seek to model a variable with values in {0,1} by a set of explanatory variables. In practice, this problem is treated as a scoring issue. Scoring consists in defining functions, called score functions, which summarize the information contained in the explanatory variables into a real-valued score. The objective of such a function is to induce the same ordering on individuals as the posterior probability of the model, so that individuals with a high probability of being "good" have a high score, and conversely individuals with a high probability of being "bad" (and thus a high risk for the bank) have a low score. Performance criteria such as the ROC curve and the AUC have been defined to quantify how relevant the ordering produced by a score function is. The reference method for obtaining score functions is logistic regression, which we present here. A major problem in credit risk scoring is variable selection.
    Indeed, banks have large databases containing all the information they hold on their customers, both socio-demographic and behavioral, and not all of it can explain the risk criterion. To address this issue, we consider the Lasso technique, based on applying a constraint on the coefficients so as to set the least significant coefficients to zero. We consider this method in the context of linear and logistic regression, as well as an extension called the Group Lasso, which handles explanatory variables by groups. We then consider the case where the response variable is no longer binary but polytomous, i.e. has several possible response levels. The first step is to present a definition of scoring equivalent to the one given previously in the binary case. We then present different regression methods adapted to this new setting: a generalization of binary logistic regression, semi-parametric methods, and an application of the Lasso principle to polytomous logistic regression. Finally, the last chapter is devoted to applying some of the methods discussed in the manuscript to real data sets, confronting them with the real needs of the company.
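    The AUC mentioned above has a simple probabilistic reading: it is the chance that a randomly drawn "good payer" receives a higher score than a randomly drawn "bad payer", with ties counting one half. A minimal sketch of that pairwise computation (function and variable names are illustrative):

    ```python
    import numpy as np

    def auc(scores, labels):
        """AUC as the probability that a random label-1 ("good") observation
        outranks a random label-0 ("bad") one; ties count 1/2. Pairwise
        comparison, so O(n0 * n1): fine for a sketch, not for large data."""
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels)
        pos = scores[labels == 1]
        neg = scores[labels == 0]
        greater = (pos[:, None] > neg[None, :]).sum()
        ties = (pos[:, None] == neg[None, :]).sum()
        return (greater + 0.5 * ties) / (len(pos) * len(neg))
    ```

    An AUC of 1 means the score function orders every "good" above every "bad"; 0.5 is the value for a completely uninformative score.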
  • Detection of anomalies and breaks in time series. Applications to the management of electricity production.

    Nedjmeddine ALLAB, Gérard BIAU, Jean-Patrick BAUDRY, Laurent ROUVIERE, Michel BRONIATOWSKI, Christian DERQUENNE, Éric MATZNER-LØBER, André MAS, Kengy BARTY
    2016
    Continental is the reference tool used by EDF for long-term electricity management. It is used to develop the operating strategy for the power plants distributed throughout Europe. The tool simulates several variables, such as electricity demand, quantity generated and associated costs, for each zone and each scenario. The objective of our thesis work is to provide methods for analyzing these production data in order to facilitate their study and synthesis. We collected a set of problems from Continental users, which we address using techniques for detecting anomalies and breaks in time series.
  • On clustering procedures and nonparametric mixture estimation.

    Stéphane AURAY, Nicolas KLUTCHNIKOFF, Laurent ROUVIERE
    Electronic journal of statistics | 2015
    This paper deals with nonparametric estimation of conditional densities in mixture models in the case when additional covariates are available. The proposed approach consists of performing a preliminary clustering algorithm on the additional covariates to guess the mixture component of each observation. Conditional densities of the mixture model are then estimated using kernel density estimates applied separately to each cluster. We investigate the expected L1-error of the resulting estimates and derive optimal rates of convergence over classical nonparametric density classes, provided the clustering method is accurate. Performances of clustering algorithms are measured by the maximal misclassification error. We obtain upper bounds on this quantity for a single linkage hierarchical clustering algorithm. Lastly, applications of the proposed method to mixture models involving electricity distribution data and simulated data are presented.
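    The two-step procedure above (cluster on the auxiliary covariates, then run a kernel density estimate within each cluster) can be sketched in a toy one-dimensional setting. The two-group single-linkage cut, which for a 1-D covariate amounts to cutting the largest gap in the sorted values, and the fixed bandwidth are illustrative simplifications of the paper's setup.

    ```python
    import numpy as np

    def cluster_then_kde(x, w, bandwidth=0.3):
        """Sketch of the two-step scheme: (1) guess each observation's mixture
        component by clustering the 1-D auxiliary covariate w into two groups
        with a single-linkage cut at the largest gap; (2) build a Gaussian
        kernel density estimate of x separately within each cluster."""
        order = np.argsort(w)
        cut = np.argmax(np.diff(w[order]))       # largest gap = single-linkage cut
        labels = np.zeros(len(w), dtype=int)
        labels[order[cut + 1:]] = 1              # cluster 1 = larger-w side

        def make_kde(sample):
            def density(t):
                u = (np.asarray(t, dtype=float)[:, None] - sample[None, :]) / bandwidth
                return np.exp(-0.5 * u ** 2).sum(axis=1) / (
                    len(sample) * bandwidth * np.sqrt(2.0 * np.pi))
            return density

        return labels, [make_kde(x[labels == k]) for k in (0, 1)]
    ```

    When the covariate clusters are well separated, the guessed labels recover the true mixture components and each per-cluster KDE estimates one conditional density, which is the regime in which the paper's rates of convergence apply.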
  • High dimensional density estimation and curve classification.

    Laurent ROUVIERE
    2005
    The objective of this thesis is to study and extend density estimation and classification techniques in high-dimensional spaces. The work is structured in three parts. The first part, entitled "complements on modified histograms", comprises two chapters devoted to a family of nonparametric density estimators, the modified histograms, which are known to have good convergence properties in the sense of information-theoretic criteria. In the first chapter, these estimators are viewed as dynamical systems with an infinite-dimensional state space. The second chapter studies these estimators in dimensions greater than one. The second part, entitled "combinatorial methods in density estimation", is divided into two chapters. We are interested in the finite-sample performance of density estimators selected from a family of candidate estimators whose cardinality is not necessarily finite. In the first chapter, we study the performance of these methods for selecting the various parameters of modified histograms. In the second chapter, we turn to the selection of kernel estimators whose smoothing parameter adapts locally to the estimation point and to the data. Finally, the third and last part, more applied and independent of the previous ones, presents a new method for classifying curves based on a decomposition of the observations in wavelet bases.
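    The curve classification step in the third part rests on representing each observed curve by its coefficients in a wavelet basis. A sketch of an orthonormal 1-D Haar decomposition, the simplest such basis, follows; it is an illustration of the representation, not the thesis code.

    ```python
    import numpy as np

    def haar_decompose(curve, levels=3):
        """Orthonormal 1-D Haar wavelet decomposition (sketch). Returns the
        final approximation coefficients and the detail coefficients at each
        level; curve length must be divisible by 2**levels."""
        c = np.asarray(curve, dtype=float)
        details = []
        for _ in range(levels):
            a = (c[0::2] + c[1::2]) / np.sqrt(2.0)   # approximation (averages)
            d = (c[0::2] - c[1::2]) / np.sqrt(2.0)   # detail (differences)
            details.append(d)
            c = a
        return c, details
    ```

    Because the transform is orthonormal, it preserves the energy of the curve exactly, so classifying curves through their coefficients loses no information while concentrating it in a few large coefficients.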
  • High dimensional density estimation and curve classification.

    Laurent ROUVIERE, Alain BERLINET, Gérard BIAU
    2005
    No summary available.
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr