Some variable selection issues around the Lasso estimator.

Authors
Publication date: 2009
Publication type: Thesis
Summary: The general problem studied in this thesis is linear regression in high dimension. We are particularly interested in estimation methods that exploit the sparsity of the target parameter, even when the dimension exceeds the number of observations. A popular method for estimating the unknown regression parameter in this context is the least squares estimator penalized by the ℓ1 norm of the coefficients, known as the lasso. The contributions of this thesis focus on variants of the lasso that take into account either additional information on the input variables or semi-supervised modes of data acquisition. More precisely, the issues addressed in this work are: i) estimation of the unknown parameter when the space of explanatory variables has a well-determined structure (presence of correlations, an order structure on the variables, or groupings between variables); ii) the construction of estimators adapted to the transductive setting, in which new unlabeled observations are taken into account. These adaptations are obtained in part by modifying the penalty in the definition of the lasso estimator. The procedures introduced are analyzed mainly from a non-asymptotic point of view. In particular, we prove that the estimators satisfy sparsity oracle inequalities. Consistency results for variable selection are also established. The practical performance of the studied methods is illustrated through simulation results.
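To make the setting concrete: the plain lasso minimizes the penalized least-squares criterion (1/2n)‖y − Xβ‖² + λ‖β‖₁, and in the p > n regime it can still recover a sparse parameter. The sketch below is not from the thesis; it is a minimal numpy illustration of the standard lasso solved by cyclic coordinate descent, with all names (`lasso_cd`, the chosen λ, the simulated design) purely illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-coordinate curvature x_j^T x_j / n
    r = y - X @ b                       # running residual
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]                  # remove coordinate j from the fit
            rho = X[:, j] @ r / n                # partial correlation with residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]                  # put the updated coordinate back
    return b

# High-dimensional example: p = 200 variables, only n = 50 observations,
# with a sparse truth supported on the first 5 coordinates.
rng = np.random.default_rng(0)
n, p = 50, 200
beta = np.zeros(p)
beta[:5] = 3.0
X = rng.standard_normal((n, p))
y = X @ beta + 0.1 * rng.standard_normal(n)

b_hat = lasso_cd(X, y, lam=0.1)
```

Despite p ≫ n, the ℓ1 penalty sets most coordinates exactly to zero and the large estimated coefficients land on the true support; the structured and transductive variants studied in the thesis modify the penalty term in this same criterion.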