Generic acceleration algorithms for optimization methods in statistical learning.

Authors
  • LIN Hongzhou
  • HARCHAOUI Zaid
  • MAIRAL Julien
  • MALICK Jerome
  • D'ASPREMONT Alexandre
  • JEGELKA Stefanie
  • GLINEUR Francois
  • PEYRE Gabriel
Publication date
2017
Publication type
Thesis
Summary Optimization problems arise naturally during the training of supervised learning models. A typical example is the empirical risk minimization (ERM) problem, which aims at finding an estimator by minimizing the risk over a set of data. The main challenge is to design efficient optimization algorithms able to handle large amounts of data in high-dimensional spaces. In this context, classical optimization methods, such as gradient descent and its accelerated variant, are computationally expensive because they require a full pass over the data at each gradient evaluation. This shortcoming motivates the class of incremental algorithms, which perform updates using incremental gradients. These algorithms reduce the computational cost per iteration, yielding a significant improvement in computation time over classical methods. A natural question then arises: is it possible to accelerate these incremental methods further?

In Chapter 2, we develop a proximal variant of the Finito/MISO algorithm, an incremental method originally designed for smooth and strongly convex problems. We introduce a proximal step into the update of the algorithm to account for the regularization penalty, which may be non-smooth.

In Chapter 3, we introduce a generic acceleration scheme, called Catalyst, which applies to a large class of optimization methods in the context of convex optimization. Its generic nature lets users select the method best suited to their problem. We show that applying Catalyst yields an accelerated convergence rate. More importantly, this rate matches the optimal rate of incremental methods up to a near-logarithmic factor in the worst-case analysis. Our approach is therefore not only generic but also nearly optimal from the theoretical point of view. We then show that the acceleration also holds in practice, especially for ill-conditioned problems.

In Chapter 4, we present a second generic approach, called QNing, which applies Quasi-Newton principles to accelerate first-order methods. The scheme applies to the same class of methods as Catalyst. Moreover, it admits a simple interpretation as a combination of the L-BFGS algorithm and the Moreau-Yosida regularization. To our knowledge, QNing is the first Quasi-Newton algorithm compatible with composite objectives and finite-sum structure.

We conclude the thesis by proposing an extension of the Catalyst algorithm to the non-convex case. This is joint work with Dr. Courtney Paquette and Prof. Dmitriy Drusvyatskiy from the University of Washington and my thesis supervisors. The strength of this approach lies in its ability to adapt automatically to convexity: no information about the convexity of the function is needed before launching the algorithm. When the objective is convex, the proposed approach achieves the same convergence rates as the convex Catalyst algorithm, resulting in a speed-up. When the objective is non-convex, the algorithm converges to stationary points at the best known rate for first-order methods. Promising experimental results are observed when applying our method to sparse matrix factorization problems and to the training of neural network models.
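
To make the proximal step of Chapter 2 concrete, here is a minimal Python sketch (an illustration, not the thesis's Finito/MISO implementation): for an l1 regularization penalty, the proximal operator reduces to componentwise soft-thresholding, which is how a non-smooth penalty can be handled exactly inside an otherwise gradient-based update. The `smooth_grad` argument is a hypothetical oracle for the smooth part of the objective.

```python
import numpy as np

def prox_l1(v, step, lam):
    """Proximal operator of lam * ||x||_1 with step size `step`:
    argmin_x  lam * ||x||_1 + (1 / (2 * step)) * ||x - v||^2,
    i.e. componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

def proximal_gradient_step(x, smooth_grad, step, lam):
    """One illustrative proximal-gradient update on a composite objective
    f(x) = smooth_loss(x) + lam * ||x||_1 (smooth_grad is hypothetical)."""
    return prox_l1(x - step * smooth_grad(x), step, lam)
```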
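
The inner-outer structure behind Catalyst can likewise be sketched in a few lines. The sketch below is a simplified, hypothetical illustration: the momentum schedule and the fixed inner iteration budget are placeholders, not the rules analyzed in Chapter 3, and `gd_inner` is merely a stand-in for the user's preferred incremental method.

```python
import numpy as np

def catalyst_sketch(f_grad, inner_solver, x0, kappa=1.0, n_outer=30):
    """Illustrative Catalyst-style acceleration (simplified).

    Each outer iteration approximately minimizes the auxiliary objective
        h(x) = f(x) + (kappa / 2) * ||x - y_k||^2
    with a user-chosen inner method, then applies Nesterov-type extrapolation.
    """
    x_prev = np.array(x0, dtype=float)
    y = x_prev.copy()
    for k in range(n_outer):
        # Gradient of the auxiliary objective centered at the current point y.
        aux_grad = lambda x, y=y: f_grad(x) + kappa * (x - y)
        x = inner_solver(aux_grad, y)      # warm-start the inner method at y
        beta = k / (k + 3.0)               # simplified momentum schedule
        y = x + beta * (x - x_prev)        # extrapolation step
        x_prev = x
    return x_prev

def gd_inner(grad, x0, step=0.1, n_iter=50):
    """Example inner solver: a few plain gradient-descent steps."""
    x = x0.copy()
    for _ in range(n_iter):
        x = x - step * grad(x)
    return x
```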
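
Finally, QNing's interpretation as L-BFGS applied to the Moreau-Yosida regularization can be illustrated with off-the-shelf tools. In the sketch below, SciPy's L-BFGS-B routine stands in for the quasi-Newton outer loop, and `prox_solver` is an assumed user-supplied routine returning the (possibly approximate) proximal point; none of this is the QNing algorithm itself.

```python
import numpy as np
from scipy.optimize import minimize

def envelope_value_and_grad(y, f, prox_solver, kappa=1.0):
    """Value and gradient of the Moreau-Yosida envelope
        F(y) = min_x  f(x) + (kappa / 2) * ||x - y||^2.
    F is smooth even when f is not, and grad F(y) = kappa * (y - p(y)),
    where p(y) is the proximal point returned by prox_solver."""
    p = prox_solver(y, kappa)
    value = f(p) + 0.5 * kappa * np.sum((p - y) ** 2)
    grad = kappa * (y - p)
    return value, grad

def quasi_newton_on_envelope(f, prox_solver, y0, kappa=1.0):
    # Off-the-shelf L-BFGS applied to the smoothed envelope F,
    # standing in for the quasi-Newton outer loop of QNing.
    result = minimize(envelope_value_and_grad, y0,
                      args=(f, prox_solver, kappa),
                      jac=True, method="L-BFGS-B")
    return result.x
```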