On efficient methods for high-dimensional statistical estimation.

Authors
Publication date
2019
Publication type
Thesis
Summary
In this thesis, we examine several aspects of parameter estimation for statistics and machine learning, together with optimization methods applicable to these problems. The goal of parameter estimation is to recover the unknown hidden parameters that govern the data, for example the parameters of an unknown probability density. Constructing an estimator through an optimization problem is only one side of the coin: finding the optimal value of the parameter is itself an optimization problem that must be solved with appropriate techniques. For a large class of problems these optimization problems are convex, and we can exploit their structure to obtain fast convergence rates.

The first main contribution of the thesis is to develop moment-matching techniques for multi-index nonlinear regression problems. We consider the classical nonlinear regression problem, which is infeasible in high dimensions due to the curse of dimensionality, and combine two existing techniques, average derivative estimation (ADE) and sliced inverse regression (SIR), into a hybrid method that avoids some of the weaknesses of its parents (a sketch of SIR is given below).

In the second main contribution, we use a particular type of averaging for stochastic gradient descent. We consider conditional exponential families (such as logistic regression), where the objective is to find the unknown value of the parameter. We propose averaging the moment parameters, which we call prediction functions (sketched below). For finite-dimensional models, this type of averaging can lead to a negative error term, i.e., the approach provides an estimator that is better than any linear estimator can ever achieve.

The third main contribution of this thesis concerns Fenchel-Young losses. We consider multi-class linear classifiers with losses of a certain type, such that their double conjugate has a direct product of simplices as its support. The corresponding convex-concave saddle-point formulation has a special form with a bilinear matrix term, and classical approaches suffer from time-consuming matrix multiplications. We show that for multi-class SVM losses, combined with efficient sampling techniques, our approach has sublinear iteration complexity: per iteration we only pay O(n + d + k), i.e., a price for each of the number of samples n, the number of features d, and the number of classes k, whereas all existing techniques are more expensive per iteration (the sampling idea is illustrated below).
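As background for the first contribution, the following is a minimal NumPy sketch of sliced inverse regression (SIR), one of the two parent methods combined in the thesis. The whitening step, the quantile slicing scheme, and the name sir_directions are illustrative textbook choices, not the thesis's hybrid ADE/SIR estimator.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_components=1):
    """Sliced Inverse Regression: estimate the index space of E[y|x] = g(B^T x).

    A standard textbook sketch, not the thesis's hybrid method.
    """
    n, d = X.shape
    # Whiten the design so that Cov(z) = I.
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    W = np.linalg.cholesky(np.linalg.inv(cov))  # W W^T = cov^{-1}
    Z = (X - mu) @ W
    # Slice the response into quantile bins and average z within each slice.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((d, d))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += len(idx) / n * np.outer(m, m)
    # Top eigenvectors of the slice-mean covariance span the index space.
    eigvals, eigvecs = np.linalg.eigh(M)
    B = eigvecs[:, -n_components:]
    return W @ B  # map the whitened directions back to original coordinates
```

SIR recovers directions through slice means of the inverse regression E[x|y]; ADE instead uses average derivatives, and the thesis's contribution is a hybrid of the two.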
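For the second contribution, here is a hedged sketch of averaging prediction functions (moment parameters) along a constant-step-size SGD run for logistic regression. The step size, the evaluation on a fixed set of test points, and the name sgd_prediction_averaging are assumptions for illustration; the thesis's exact procedure and analysis may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_prediction_averaging(X, y, X_test, step=0.5, n_epochs=1, seed=0):
    """Constant-step SGD for logistic regression, averaging the
    prediction functions sigma(x^T theta_t) on test points instead of
    averaging the iterates theta_t themselves (an illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    avg_pred = np.zeros(X_test.shape[0])
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            # Stochastic gradient of the logistic loss at one sample.
            grad = (sigmoid(X[i] @ theta) - y[i]) * X[i]
            theta -= step * grad
            t += 1
            # Running average in the moment parameterization.
            avg_pred += (sigmoid(X_test @ theta) - avg_pred) / t
    return avg_pred
```

The key design choice is that the running average is taken over sigmoid(X_test @ theta_t), i.e., over the moment parameters, rather than over the natural parameters theta_t, which is what makes the averaged predictor nonlinear in the data.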
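The sublinear-iteration idea of the third contribution can be illustrated on the simplest bilinear saddle point, min over the simplex in x, max over the simplex in y, of y^T A x, using a classical doubly-stochastic mirror-descent sketch in the spirit of Grigoriadis and Khachiyan. This is not the thesis's algorithm for Fenchel-Young losses; it only shows how sampling one row and one column per iteration replaces full matrix-vector products.

```python
import numpy as np

def sublinear_matrix_game(A, n_iters=5000, eta=0.01, seed=0):
    """Stochastic entropic mirror descent for the bilinear saddle point
    min_{x in simplex} max_{y in simplex} y^T A x.

    Each iteration touches one row and one column of A, so the
    per-iteration cost is O(n + d) rather than the O(n d) of a full
    matrix-vector product. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    log_x = np.zeros(d)  # multiplicative-weights state, min player
    log_y = np.zeros(n)  # multiplicative-weights state, max player
    x_avg = np.zeros(d)
    y_avg = np.zeros(n)
    for _ in range(n_iters):
        x = np.exp(log_x - log_x.max()); x /= x.sum()
        y = np.exp(log_y - log_y.max()); y /= y.sum()
        # Sampling i ~ y gives A[i, :] as an unbiased estimate of A^T y;
        # sampling j ~ x gives A[:, j] as an unbiased estimate of A x.
        i = rng.choice(n, p=y)
        j = rng.choice(d, p=x)
        log_x -= eta * A[i, :]  # min player descends on the sampled row
        log_y += eta * A[:, j]  # max player ascends on the sampled column
        x_avg += x
        y_avg += y
    return x_avg / n_iters, y_avg / n_iters
```

The same principle, with sampling distributions adapted to the product-of-simplices geometry of the biconjugate losses, is what yields the O(n + d + k) per-iteration cost quoted in the abstract.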