MOUGEOT Mathilde

Affiliations
  • 2019 - 2021
    Ecole normale supérieure de Paris-Saclay
  • 2012 - 2019
    Laboratoire de probabilités et modèles aléatoires
  • 2015 - 2019
    Centre de mathématiques et de leurs applications
  • 2019 - 2020
    Centre Borelli
  • 2019 - 2020
    Centre national de la recherche scientifique
  • 2018 - 2019
    Laboratoire de Probabilités, Statistique et Modélisation
  • 2015 - 2017
    Université Paris Diderot
  • 2015 - 2016
    Laboratoire polymères et matériaux avancés
  • Unsupervised Multi-source Domain Adaptation for Regression.

    Guillaume RICHARD, Antoine de MATHELIN, Georges HEBRAIL, Mathilde MOUGEOT, Nicolas VAYATIS
    Lecture Notes in Computer Science | 2021
    No summary available.
  • Discrepancy-Based Active Learning for Domain Adaptation.

    Antoine DE MATHELIN, Mathilde MOUGEOT, Nicolas VAYATIS
    2021
    The goal of the paper is to design active learning strategies which lead to domain adaptation under an assumption of domain shift in the case of Lipschitz labeling function. Building on previous work by Mansour et al. (2009) we adapt the concept of discrepancy distance between source and target distributions to restrict the maximization over the hypothesis class to a localized class of functions which are performing accurate labeling on the source domain. We derive generalization error bounds for such active learning strategies in terms of Rademacher average and localized discrepancy for general loss functions which satisfy a regularity condition. Practical algorithms are inferred from the theoretical bounds, one is based on greedy optimization and the other is a K-medoids algorithm. We also provide improved versions of the algorithms to address the case of large data sets. These algorithms are competitive against other state-of-the-art active learning techniques in the context of domain adaptation as shown in our numerical experiments, in particular on large data sets of around one hundred thousand images.
  • Using Machine-Learning Methods to Improve Surface Wind Speed from the Outputs of a Numerical Weather Prediction Model.

    Naveen GOUTHAM, Bastien ALONZO, Aurore DUPRE, Riwal PLOUGONVEN, Rebeca DOCTORS, Lishan LIAO, Mathilde MOUGEOT, Aurelie FISCHER, Philippe DROBINSKI
    Boundary-Layer Meteorology | 2021
    No summary available.
  • KFC: A clusterwise supervised learning procedure based on the aggregation of distances.

    Sothea HAS, Aurelie FISCHER, Mathilde MOUGEOT
    Journal of Statistical Computation and Simulation | 2021
    No summary available.
  • Unsupervised Multi-Source Domain Adaptation for Regression.

    Guillaume RICHARD, Antoine DE MATHELIN, Georges HEBRAIL, Mathilde MOUGEOT, Nicolas VAYATIS
    2020
    We consider the problem of unsupervised domain adaptation from multiple sources in a regression setting. We propose in this work an original method to take benefit of different sources using a weighted combination of the sources. For this purpose, we define a new measure of similarity between probabilities for domain adaptation which we call hypothesis-discrepancy. We then prove a new bound for unsupervised domain adaptation combining multiple sources. We derive from this bound a novel adversarial domain adaptation algorithm adjusting weights given to each source, ensuring that sources related to the target receive higher weights. We finally evaluate our method on different public datasets and compare it to other domain adaptation baselines to demonstrate the improvement for regression tasks.
  • Adversarial Weighting for Domain Adaptation in Regression.

    Antoine DE MATHELIN, Guillaume RICHARD, Mathilde MOUGEOT, Nicolas VAYATIS
    2020
    We present a novel instance based approach to handle regression tasks in the context of supervised domain adaptation. The approach developed in this paper relies on the assumption that the task on the target domain can be efficiently learned by adequately reweighting the source instances during training phase. We introduce a novel formulation of the optimization objective for domain adaptation which relies on a discrepancy distance characterizing the difference between domains according to a specific task and a class of hypotheses. To solve this problem, we develop an adversarial network algorithm which learns both the source weighting scheme and the task in one feed-forward gradient descent. We provide numerical evidence of the relevance of the method on public datasets for domain adaptation through reproducible experiments accessible via an online demo interface.
  • Classification of events from ground sensors - Application to the monitoring of fragile people.

    Ludovic MINVIELLE, Nicolas VAYATIS, Mathilde MOUGEOT, Bernadette DORIZZI, Amaury HABRARD, Francois CHARPILLET, Miguel COLOM
    2020
    This thesis deals with the detection of events in signals from ground sensors for the monitoring of elderly people. In view of the practical issues, it seems indeed that pressure sensors located on the ground are good candidates for monitoring activities, especially fall detection. As the signals to be processed are complex, sophisticated models should be used. Thus, in order to design a fall detector, we propose an approach based on random forests, while addressing hardware constraints with a variable selection procedure. The performance is improved using a data augmentation method as well as temporal aggregation of the model responses. We then address the issue of confronting our model with the real world, with transfer learning methods that act on the basic model of random forests, i.e. decision trees. These methods are adaptations of previous work and are designed to address the problem of class imbalance, where falling is a rare event. We test them on several datasets, showing encouraging results for the future, and a Python implementation is made available. Finally, motivated by the issue of tracking elderly people while processing a one-dimensional signal for a large area, we propose to distinguish elderly people from younger individuals using a convolutional neural network model and dictionary learning. Since the signals to be processed are mostly steps, the first brick of the model is trained to focus on the steps in the signals, and the second part of the model is trained separately on the final task. This new approach to gait classification makes it possible to efficiently recognize signals from elderly people.
  • Hybrid Modelling for Lifetime Prediction.

    Fikri HAFID, Maxime GUEGUIN, Vincent LAURENT, Mathilde MOUGEOT, Nicolas VAYATIS, Christine YANG, Jean Michel GHIDAGLIA
    Lecture Notes in Mechanical Engineering | 2020
    No summary available.
  • Predicting Job Power Consumption Based on RJMS Submission Data in HPC Systems.

    Theo SAILLANT, Jean Christophe WEILL, Mathilde MOUGEOT
    High Performance Computing | 2020
    No summary available.
  • Budget learning based on equivalent trees and genetic algorithm : application to fall detection algorithm embedding.

    Sergio PEIGNIER, Mounir ATIQ, Mathilde MOUGEOT
    2020
    Budget learning is a research field of growing interest that aims at including real world resource constraints into the design of machine learning models, mainly to reduce real environment prediction time. One common way of doing it is by modifying a pre-trained machine learning model to fit the prediction time constraints while preserving the model's prediction quality as well as possible. However, in this case, the performance of these kinds of methods depends on the pre-trained model structure. To overcome this dependence, we propose to tackle the budgeted optimization problem by using equivalent models with different structures and therefore different computation costs. The contribution of this work is to propose a genetic algorithm to decrease the prediction time of random forest classifiers by using equivalent decision trees. The first step of our method consists in building, from a pre-trained random forest, an initial population of random forests that share the same decision function but have different structures. Then a genome reduction operation is iteratively applied on the individuals via pruning-based mutations. Our experiments show an important impact of using equivalent decision trees on reachable random forest solutions with a budgeted prediction time. Results obtained on synthetic data made of Gaussian-shaped clusters and on a real industrial fall detection dataset advocate for the use of equivalent random forest models in budget learning.
  • Leveraging Digital Disruptions for a Climate-Safe and Equitable World: The D^2S Agenda [Commentary].

    Amy LUERS, Jennifer GARARD, Asun Lera ST. CLAIR, Owen GAFFNEY, Tom HASSENBOEHLER, Lyse LANGLOIS, Mathilde MOUGEOT, Sasha LUCCIONI
    IEEE Technology and Society Magazine | 2020
    No summary available.
  • Quantized Variational Inference. Optimal Quantization for variational inference.

    Amir DIB, Mathilde MOUGEOT
    2020
    We present Quantized Variational Inference, a new algorithm for Evidence Lower Bound (ELBO) maximization. We show how Optimal Voronoi Tessellation produces variance-free gradients for ELBO optimization at the cost of introducing asymptotically decaying bias. Subsequently, we propose a Richardson extrapolation type method to improve the asymptotic bound. We show that using the Quantized Variational Inference framework leads to fast convergence for both the score function and the reparametrized gradient estimator at a comparable computational cost. Finally, we propose several experiments to assess the performance of our method and its limitations.
  • Sustainability in the Digital Age [Special Issue Introduction].

    Amy LUERS, Lyse LANGLOIS, Mathilde MOUGEOT, Sana KHARAGHANI, Alexandra LUCCIONI
    IEEE Technology and Society Magazine | 2020
    No summary available.
  • Analysis of big data in the field of transportation.

    Lena CAREL, Pierre ALQUIER, Mathilde MOUGEOT, Latifa OUKHELLOU, Yohann de CASTRO
    2019
    The objective of this thesis is to propose new methodologies to be applied to public transportation data. Indeed, we are surrounded more and more by sensors and computers generating huge amounts of data. In the public transport domain, contactless cards generate data every time we use them, whether for loading or for our trips. In this thesis, we use this data for two distinct purposes. First, we wanted to be able to detect groups of passengers with similar temporal patterns. To do this, we first used non-negative matrix factorization as a pre-processing tool for classification. Then we introduced the NMF-EM algorithm allowing dimension reduction and classification simultaneously for a mixture model of multinomial distributions. In a second step, we applied regression methods to these data in order to be able to provide a range of these probable validations. Similarly, we applied this methodology to the detection of anomalies on the network.
  • Aggregation using input–output trade-off.

    Aurelie FISCHER, Mathilde MOUGEOT
    Journal of Statistical Planning and Inference | 2019
    In this paper, we introduce a new learning strategy based on a seminal idea of Mojirsheibani (1999, 2000, 2002a, 2002b), who proposed a smart method for combining several classifiers, relying on a consensus notion. In many aggregation methods, the prediction for a new observation x is computed by building a linear or convex combination over a collection of basic estimators r_1(x), ..., r_m(x) previously calibrated using a training data set. Mojirsheibani proposes to compute the prediction associated to a new observation by combining selected outputs of the training examples. The output of a training example is selected if some kind of consensus is observed: the predictions computed for the training example with the different machines have to be "similar" to the prediction for the new observation. This approach has been recently extended to the context of regression in Biau et al. (2016). In the original scheme, the agreement condition is actually required to hold for all individual estimators, which appears inadequate if there is one bad initial estimator. In practice, a few disagreements are allowed; for establishing the theoretical results, the proportion of estimators satisfying the condition is required to tend to 1. In this paper, we propose an alternative procedure, mixing the previous consensus ideas on the predictions with the Euclidean distance computed between entries. This may be seen as an alternative approach that reduces the effect of a possibly bad estimator in the initial list, using a constraint on the inputs. We prove the consistency of our strategy in classification and in regression. We also provide some numerical experiments on simulated and real data to illustrate the benefits of this new aggregation method. On the whole, our practical study shows that our method may perform much better than the original combination technique, and, in particular, exhibit far less variance.
We also show on simulated examples that this procedure mixing inputs and outputs is still robust to high dimensional inputs.
  • Transfer Learning on Decision Tree with Class Imbalance.

    Ludovic MINVIELLE, Mounir ATIQ, Sergio PEIGNIER, Mathilde MOUGEOT
    31st IEEE International Conference on Tools with Artificial Intelligence | 2019
    No summary available.
  • Transfer Learning on Decision Tree with Class Imbalance.

    Ludovic MINVIELLE, Mounir ATIQ, Sergio PEIGNIER, Mathilde MOUGEOT
    2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI) | 2019
    No summary available.
  • A clusterwise supervised learning procedure based on aggregation of distances.

    Sothea HAS, Aurelie FISCHER, Mathilde MOUGEOT
    2019
    Nowadays, many machine learning procedures are available off the shelf and may be used easily to calibrate predictive models on supervised data. However, when the input data consists of more than one unknown cluster, and when different underlying predictive models exist, fitting a model is a more challenging task. We propose, in this paper, a procedure in three steps to automatically solve this problem. The KFC procedure aggregates different models adaptively on data. The first step of the procedure aims at catching the clustering structure of the input data, which may be characterized by several statistical distributions. It provides several partitions, given the assumptions on the distributions. For each partition, the second step fits a specific predictive model based on the data in each cluster. The overall model is computed by a consensual aggregation of the models corresponding to the different partitions. A comparison of the performances on different simulated and real data assesses the excellent performance of our method in a large variety of prediction problems.
  • Representations for anomaly detection: Application to aircraft engine vibration data.

    Mina ABDEL SAYED, Gilles FAY, Mathilde MOUGEOT, Nicolas VAYATIS, Mohamed EL BADAOUI, Jerome LACAILLE, Younes BENNANI, Nadine MARTIN
    2018
    Vibration measurements are among the most relevant data for detecting engine anomalies. Vibrations are acquired on a test bench during acceleration and deceleration to ensure engine reliability at the end of the production line. These temporal data are converted into spectrograms to allow the experts to perform a visual analysis of these data and to detect the various atypical signatures. The vibratory sources correspond to lines on the spectrograms. In this thesis, we have implemented an automatic decision support tool to analyze the spectrograms and detect any type of atypical signature; these signatures do not necessarily come from engine damage. First, we built a digital database of annotated spectrograms. It is important to note that unusual signatures are variable in shape, intensity and position and are found in a small amount of data. Therefore, to detect these signatures, we characterize the normal behaviors of the spectrograms, analogous to novelty detection methods, by representing the patches of the spectrograms on dictionaries such as curvelets and non-negative matrix factorization (NMF), as well as by estimating the distribution of each point of the spectrogram from normal data depending or not on their neighborhood. The detection of atypical points is performed by comparing the test data to the normality model estimated on normal training data. The detection of atypical points allows the detection of unusual signatures composed of these points.
  • Learning structures in extreme values in high dimension.

    Mael CHIAPINO, Francois ROUEFF, Anne SABOURIN, Florence d ALCHE BUC, Maud THOMAS, Jessica TRESSOU, Mathilde MOUGEOT, Patrice BERTAIL
    2018
    We present and study methods for unsupervised learning of multivariate extreme phenomena in high dimension. In the case where each of the marginal distributions of a random vector is heavy-tailed, the study of its behavior in extreme regions (i.e. far from the origin) can no longer be done via the usual methods which assume a finite mean and variance. Extreme value theory then offers a framework adapted to this study, by giving in particular a theoretical basis to the reduction of dimension through the angular measure. The thesis is structured around two main steps: (1) reducing the dimension of the problem by finding a summary of the dependence structure in the extreme regions, which in particular aims at finding the subgroups of components that are likely to exceed a high threshold simultaneously; (2) modeling the angular measure by a mixture density that follows a predetermined dependence structure. These two steps allow the development of unsupervised classification methods through the construction of a similarity matrix for the extreme points.
  • From Numerical Weather Prediction Outputs to Accurate Local Surface Wind Speed: Statistical Modeling and Forecasts.

    Bastien ALONZO, Riwal PLOUGONVEN, Mathilde MOUGEOT, Aurelie FISCHER, Aurore DUPRE, Philippe DROBINSKI
    Renewable Energy: Forecasting and Risk Management | 2018
    No summary available.
  • Aggregation using input-output trade-off.

    Aurelie FISCHER, Mathilde MOUGEOT
    2018
    In this paper, we introduce a new learning strategy based on a seminal idea of Mojirsheibani (1999, 2000, 2002a, 2002b), who proposed a smart method for combining several classifiers, relying on a consensus notion. In many aggregation methods, the prediction for a new observation x is computed by building a linear or convex combination over a collection of basic estimators r_1(x), ..., r_m(x) previously calibrated using a training data set. Mojirsheibani proposes to compute the prediction associated to a new observation by combining selected outputs of the training examples. The output of a training example is selected if some kind of consensus is observed: the predictions computed for the training example with the different machines have to be "similar" to the prediction for the new observation. This approach has been recently extended to the context of regression in Biau et al. (2016). In the original scheme, the agreement condition is actually required to hold for all individual estimators, which appears inadequate if there is one bad initial estimator. In practice, a few disagreements are allowed; for establishing the theoretical results, the proportion of estimators satisfying the condition is required to tend to 1. In this paper, we propose an alternative procedure, mixing the previous consensus ideas on the predictions with the Euclidean distance computed between entries. This may be seen as an alternative approach that reduces the effect of a possibly bad estimator in the initial list, using a constraint on the inputs. We prove the consistency of our strategy in classification and in regression. We also provide some numerical experiments on simulated and real data to illustrate the benefits of this new aggregation method. On the whole, our practical study shows that our method may perform much better than the original combination technique, and, in particular, exhibit far less variance.
We also show on simulated examples that this procedure mixing inputs and outputs is still robust to high dimensional inputs.
  • Homogeneous Climate Regions Using Learning Algorithms.

    Mathilde MOUGEOT, Dominique PICARD, Vincent LEFIEUX, Miranda MARCHAND
    Renewable Energy: Forecasting and Risk Management | 2018
    No summary available.
  • Fall detection using smart floor sensor and supervised learning.

    Ludovic MINVIELLE, Mounir ATIQ, Renan SERRA, Mathilde MOUGEOT, Nicolas VAYATIS
    2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) | 2017
    No summary available.
  • Statistical learning for wind power: A modeling and stability study towards forecasting.

    Aurelie FISCHER, Lucie MONTUELLE, Mathilde MOUGEOT, Dominique PICARD
    Wind Energy | 2017
    We focus on wind power modeling using machine learning techniques. We show on real data provided by the wind energy company Maïa Eolis that parametric models, even those closely following the physical equation relating wind production to wind speed, are outperformed by intelligent learning algorithms. In particular, the CART-Bagging algorithm gives very stable and promising results. Besides, as a step towards forecasting, we quantify the impact of using deteriorated wind measures on the performances. We also show on this application that the default methodology to select a subset of predictors provided in the standard random forest package can be refined, especially when there exists among the predictors one variable which has a major impact.
  • Assistance for home care of the elderly with sensor solutions.

    Mathilde MOUGEOT, Julie OGER, Stephane BESSEAU
    Journées d'Etude sur la TéléSANté, 6ème edition | 2017
    The adaptation of society to ageing, with significant growth of the population over 65 years and their desire to age at home, is a true revolution of society and should be anticipated. The recent development of new technologies has enabled the emergence of new connected objects. Our joint work with the company PREDICAL and the laboratory LPMA consists in making the collected IoT data talk and then addressing this health issue through a mathematical prism. For 18 months, the houses of 12 senior persons living alone have been equipped with motion, accelerometer, temperature and luminescence sensors. We developed machine learning algorithms to analyse the collected event data in order to provide daily indicators to follow the activity, the social link, the feeding and the sleep quality. Statistical methods have been applied to monitor these indicators over days and to trigger an alarm if strong deviations compared to former behaviours have been diagnosed. Functional data analysis has also been introduced to model daily living activity and to quantify a potential modification of autonomy. We observed that, for all studied indicators, strong regularities emerge from the event data. This first conclusion shows that it is possible to "learn" the habits of each senior and then to quantify any deviation of behaviour. It also appears that each senior has a unique profile of activity. In addition, during the study, our algorithms were also able to quantify the activity recovery of a senior after a return from hospitalization. This information appears to be extremely useful as a complement to medical diagnosis. In conclusion, our results obtained in those real environments confirm the strong potential of such an approach to create consistent indicators measuring and monitoring the degree of autonomy of a senior. These indicators provide, in real time, similar information to the AGIR grid used to quantify the degree of autonomy of seniors by the French Health Institution.
In future work, we plan to analyse other kinds of sensors and to extend the longitudinal studies to 50 seniors.
  • Dictionary Comparison for Anomaly Detection on Aircraft Engine Spectrograms.

    Mina ABDEL SAYED, Daniel DUCLOS, Gilles FAY, Jerome LACAILLE, Mathilde MOUGEOT
    Lecture Notes in Computer Science | 2016
    No summary available.
  • NMF-based decomposition for anomaly detection applied to vibration analysis.

    Mina ABDEL SAYED, Daniel DUCLOS, Gilles FAY, Jerome LACAILLE, Mathilde MOUGEOT
    International Journal of Condition Monitoring | 2016
    In this paper, vibration analysis of civil aircraft engines in a test-bench to perform anomaly detection is considered. High bandwidth vibration measurements contain essential mechanical information regarding the condition of the engine and the localisation of damage, if present. In this case, vibration data are represented by spectrograms in the frequency domain, which are high-dimensional data that include both instrumental noise and non-discriminating information. Automatic algorithms for detecting specific damage are employed in order to provide a health status; however, these are hard to train. Experts from Snecma consistently perform visual analysis to confirm the health status of the engine. To develop an automatic extraction of relevant information in this high-dimensional context, the authors propose a novel representation of spectrograms based on a dimension reduction under the constraints of positivity, known as non-negative matrix factorisation (NMF). This method is consistent with the physics. In turn, the detection is based on distances in the reduced space. The algorithm is trained and tested with real engine vibration data, among which one engine has a signature representative of a damaged bearing. The method gives some encouraging results.
  • Forecasting Intra Day Load Curves Using Sparse Functional Regression.

    Mathilde MOUGEOT, Dominique PICARD, Vincent LEFIEUX, Laurence MAILLARD TEYSSIER
    Lecture Notes in Statistics | 2015
    In this paper we provide a prediction method, the prediction box, based on a sparse learning process elaborated on very high dimensional information, which will be able to include new, potentially high dimensional, influential variables and adapt to different contexts of prediction. We elaborate and test this method in the setting of predicting the national French intra day load curve, over a period of 7 years, on a large database including daily French electrical consumptions as well as many meteorological inputs, calendar statements and functional dictionaries. The prediction box incorporates a large amount of contextual information coming from the past, organizes it in a manageable way through the construction of a smart encyclopedia of scenarios, provides experts elaborating strategies of prediction by comparing the day at hand to reference scenarios extracted from the encyclopedia, and then harmonizes the different experts. More precisely, the prediction box is built using successive learning procedures: elaboration of a database of historical scenarios organized on a high dimensional and functional learning of the intra day load curves, construction of expert forecasters using an information retrieval task among the scenarios, final aggregation of the experts. The results on the national French intra day load curves strongly show the benefits of using a sparse functional model to forecast the electricity consumption. They also appear to agree quite well with the business knowledge of consumption forecasters and even shed new light on the domain.
  • Sloshing in the LNG shipping industry: risk modelling through multivariate heavy-tail analysis.

    Antoine DEMATTEO, Stephan CLEMENCON, Nicolas VAYATIS, Mathilde MOUGEOT
    2013
    In the liquefied natural gas (LNG) shipping industry, the phenomenon of sloshing can lead to the occurrence of very high pressures in the tanks of the vessel. The issue of modelling or estimating the probability of the simultaneous occurrence of such extremal pressures is now crucial from the risk assessment point of view. In this paper, heavy-tail modelling, widely used as a conservative approach to risk assessment and corresponding to a worst-case risk analysis, is applied to the study of sloshing. Multivariate heavy-tailed distributions are considered, with sloshing pressures investigated by means of small-scale replica tanks instrumented with d > 1 sensors. When attempting to fit such nonparametric statistical models, one naturally faces computational issues inherent in the phenomenon of dimensionality. The primary purpose of this article is to overcome this barrier by introducing a novel methodology. For d-dimensional heavy-tailed distributions, the structure of extremal dependence is entirely characterised by the angular measure, a positive measure on the intersection of a sphere with the positive orthant in R^d. As d increases, the mutual extremal dependence between variables becomes difficult to assess. Based on a spectral clustering approach, we show here how a low dimensional approximation to the angular measure may be found. The nonparametric method proposed for modelling sloshing has been successfully applied to pressure data. The parsimonious representation thus obtained proves to be very convenient for the simulation of multivariate heavy-tailed distributions, allowing for the implementation of Monte-Carlo simulation schemes in estimating the probability of failure. Besides confirming its performance on artificial data, the methodology has been implemented on a real data set specifically collected for risk assessment of sloshing in the LNG shipping industry.
  • Grouping strategies and thresholding for high dimensional linear models rejoinder.

    Mathilde MOUGEOT, Dominique PICARD, Karine TRIBOULEY
    Journal of Statistical Planning and Inference | 2013
    No summary available.
  • Sparse approximation and fit of intraday load curves in a high dimensional framework.

    Mathilde MOUGEOT, Dominique PICARD, Karine TRIBOULEY, Vincent LEFIEUX, Laurence MAILLARD TEYSSIER
    Advances in Adaptive Data Analysis | 2013
    No summary available.
  • Grouping strategies and thresholding for high dimensional linear models.

    Mathilde MOUGEOT, Dominique PICARD, Karine TRIBOULEY
    Journal of Statistical Planning and Inference | 2013
    The estimation problem in a high-dimensional regression model with structured sparsity is investigated. An algorithm using a two-step block thresholding procedure called GR-LOL is provided. Convergence rates are produced: they depend on simple coherence-type indices of the Gram matrix (easily checkable on the data) as well as sparsity assumptions of the model parameters measured by a combination of l1 within-block and lq (q < 1) between-block norms. The simplicity of the coherence indicator suggests ways to optimize the rates of convergence when the group structure is not naturally given by the problem and is unknown. In such a case, an auto-driven procedure is provided to determine the regressor groups (number and contents). An intensive practical study compares our grouping methods with the standard LOL algorithm. We prove that the grouping rarely deteriorates the results but can improve them very significantly. GR-LOL is also compared with group-Lasso procedures and exhibits a very encouraging behavior. The results are quite impressive, especially when the GR-LOL algorithm is combined with a grouping pre-processing.
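
As a rough illustration of the block-thresholding idea mentioned in the last abstract above, the sketch below keeps or zeroes whole groups of coefficients according to their Euclidean norm. It is not the GR-LOL algorithm itself, whose two steps and thresholds are defined in the paper. The function name, the grouping and the threshold value are hypothetical and chosen only for the example.

```python
import math


def block_threshold(coefs, groups, threshold):
    """Zero every group of coefficients whose Euclidean norm is below `threshold`.

    coefs     -- list of coefficient values
    groups    -- list of index lists, one per block (hypothetical grouping)
    threshold -- norm below which a whole block is discarded (illustrative value)
    """
    out = list(coefs)
    for group in groups:
        # Euclidean norm of the coefficients belonging to this block
        norm = math.sqrt(sum(coefs[j] ** 2 for j in group))
        if norm < threshold:
            for j in group:
                out[j] = 0.0
    return out


# Toy example: three blocks, only the middle one carries a strong signal.
coefs = [0.1, -0.2, 3.0, 4.0, 0.05]
groups = [[0, 1], [2, 3], [4]]
kept = block_threshold(coefs, groups, threshold=1.0)
```

Thresholding by block rather than coefficient by coefficient means that a weak coefficient survives when it sits in a strong group, which is the intuition behind exploiting a group structure.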
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr