GOWER Robert

Affiliations
  • 2020 - 2021
    Télécom ParisTech
  • 2016 - 2019
    Apprentissage statistique et parcimonie (Statistical Learning and Sparsity)
  • 2018 - 2019
    Laboratoire traitement et communication de l'information (Information Processing and Communication Laboratory)
Publications
  • Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball.

    Othmane SEBBOUH, Robert GOWER, Aaron DEFAZIO
    2021
    We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB, otherwise known as the momentum method) for the general stochastic approximation problem. For SGD, in the convex and smooth setting, we provide the first almost sure asymptotic convergence rates for a weighted average of the iterates. More precisely, we show that the convergence rate of the function values is arbitrarily close to o(1/√k), and is exactly o(1/k) in the so-called overparametrized case. We show that these results still hold when using stochastic line search and stochastic Polyak stepsizes, thereby giving the first proof of convergence of these methods in the non-overparametrized regime. Using a substantially different analysis, we show that these rates hold for SHB as well, but at the last iterate. This distinction is important because it is the last iterate of SGD and SHB which is used in practice. We also show that the last iterate of SHB converges to a minimizer almost surely. Additionally, we prove that the function values of the deterministic HB converge at an o(1/k) rate, which is faster than the previously known O(1/k). Finally, in the nonconvex setting, we prove similar rates on the lowest gradient norm along the trajectory of SGD.
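
    As a rough illustration of the two update rules compared in this paper, the sketch below runs plain SGD with a running average of the iterates alongside a stochastic heavy ball (momentum) update on a synthetic least-squares problem. The problem, the stepsize gamma and the momentum parameter beta are illustrative assumptions, not the values analysed in the paper, and the uniform running average stands in for the paper's weighted average.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d = 200, 20                                   # synthetic least-squares problem (assumption)
      A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

      def grad_i(x, i):                                # stochastic gradient from a single data point
          return A[i] * (A[i] @ x - b[i])

      gamma, beta = 1e-3, 0.9                          # illustrative stepsize and momentum values
      x_sgd, x_avg = np.zeros(d), np.zeros(d)
      x_shb, x_prev = np.zeros(d), np.zeros(d)

      for k in range(1, 5001):
          i = rng.integers(n)
          # SGD step, plus a running average of the iterates (a uniform stand-in
          # for the weighted average whose rate the paper analyses)
          x_sgd -= gamma * grad_i(x_sgd, i)
          x_avg += (x_sgd - x_avg) / k
          # stochastic heavy ball: gradient step plus a momentum term
          x_new = x_shb - gamma * grad_i(x_shb, i) + beta * (x_shb - x_prev)
          x_prev, x_shb = x_shb, x_new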
  • Towards closing the gap between the theory and practice of SVRG.

    Othmane SEBBOUH, Nidham GAZAGNADOU, Samy JELASSI, Francis BACH, Robert GOWER
    Conference on Neural Information Processing Systems | 2019
    Amongst the very first variance reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method [11]. SVRG is an inner-outer loop based method, where in the outer loop a reference full gradient is evaluated, after which m ∈ N steps of an inner loop are executed where the reference gradient is used to build a variance reduced estimate of the current gradient. The simplicity of the SVRG method and its analysis has led to multiple extensions and variants for even non-convex optimization. Yet there is a significant gap between the parameter settings that the analysis suggests and what is known to work well in practice. Our first contribution is that we take several steps here towards closing this gap. In particular, the current analysis shows that m should be of the order of the condition number so that the resulting method has a favourable complexity. Yet in practice m = n works well regardless of the condition number, where n is the number of data points. Furthermore, the current analysis shows that the inner iterates have to be reset using averaging after every outer loop. Yet in practice SVRG works best when the inner iterates are updated continuously and not reset. We provide an analysis of these aforementioned practical settings and show that they achieve the same favourable complexity as the original analysis (with slightly better constants). Our second contribution is to provide a more general analysis than had been previously done by using arbitrary sampling, which allows us to analyse virtually all forms of mini-batching through a single theorem. Since our setup and analysis reflect what is done in practice, we are able to set the parameters such as the mini-batch size and step size using our theory in such a way that produces a more efficient algorithm in practice, as we show in extensive numerical experiments.
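
    The inner-outer loop structure described here, in the practical configuration the paper analyses (m = n inner steps, no averaging or reset between outer loops), might look as follows on a synthetic least-squares problem. The problem and the stepsize are illustrative assumptions rather than the theory-prescribed values.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d = 200, 20                                   # synthetic least-squares problem (assumption)
      A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

      def full_grad(x):
          return A.T @ (A @ x - b) / n

      def grad_i(x, i):
          return A[i] * (A[i] @ x - b[i])

      gamma = 5e-3                                     # illustrative stepsize
      x = np.zeros(d)
      for outer in range(30):
          x_ref, g_ref = x.copy(), full_grad(x)        # outer loop: reference point and full gradient
          for _ in range(n):                           # inner loop length m = n, as works well in practice
              i = rng.integers(n)
              v = grad_i(x, i) - grad_i(x_ref, i) + g_ref   # variance-reduced gradient estimate
              x -= gamma * v
          # no reset or averaging: the next outer loop continues from the last inner iterate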
  • SGD: General Analysis and Improved Rates.

    Robert GOWER, Nicolas LOIZOU, Xun QIAN, Alibek SAILANBAYEV, Egor SHULGIN, Peter RICHTARIK
    International Conference on Machine Learning | 2019
    We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form minibatches. This is the first time such an analysis is performed, and most of our variants of SGD were never explicitly considered in the literature before. Our analysis relies on the recently introduced notion of expected smoothness and does not rely on a uniform bound on the variance of the stochastic gradients. By specializing our theorem to different mini-batching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the mini-batch size. With this we can also determine the mini-batch size that optimizes the total complexity, and show explicitly that as the variance of the stochastic gradient evaluated at the minimum grows, so does the optimal mini-batch size. For zero variance, the optimal mini-batch size is one. Moreover, we prove insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime.
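
    A minimal sketch of one variant covered by this framework: mini-batch SGD with sampling with replacement and a constant-then-decreasing stepsize switching rule. The batch size, the initial stepsize and the switching point are placeholders; in the paper these quantities are expressed through the expected smoothness constant and the gradient variance at the minimum.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d = 500, 20                                   # synthetic least-squares problem (assumption)
      A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

      def minibatch_grad(x, idx):
          Ab = A[idx]
          return Ab.T @ (Ab @ x - b[idx]) / len(idx)

      tau = 16                                         # mini-batch size (placeholder)
      gamma0 = 5e-2                                    # placeholder for the theory-driven constant stepsize
      K_switch = 2000                                  # placeholder switching point
      x = np.zeros(d)
      for k in range(1, 10001):
          idx = rng.integers(n, size=tau)              # sampling with replacement
          # constant stepsize first, then an O(1/k) decay after the switch
          gamma = gamma0 if k <= K_switch else gamma0 * K_switch / k
          x -= gamma * minibatch_grad(x, idx)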
  • Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization.

    Robert GOWER, Peter RICHTARIK, Filip HANZELY, Sebastian STICH
    International Conference on Machine Learning | 2019
    We present the first accelerated randomized algorithm for solving linear systems in Euclidean spaces. One essential problem of this type is the matrix inversion problem. In particular, our algorithm can be specialized to invert positive definite matrices in such a way that all iterates (approximate solutions) generated by the algorithm are positive definite matrices themselves. This opens the way for many applications in the field of optimization and machine learning. As an application of our general theory, we develop the first accelerated (deterministic and stochastic) quasi-Newton updates. Our updates lead to provably more aggressive approximations of the inverse Hessian, and lead to speed-ups over classical non-accelerated rules in numerical experiments. Experiments with empirical risk minimization show that our rules can accelerate training of machine learning models.
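
    For context, the sketch below shows the classical (deterministic, non-accelerated) BFGS update of an inverse-Hessian estimate, i.e. the kind of quasi-Newton rule this paper accelerates; it is not the paper's accelerated randomized update.

      import numpy as np

      def bfgs_inverse_update(H, s, y):
          # Classical BFGS update of an inverse-Hessian estimate H, given the step
          # s = x_new - x_old and the gradient difference y = g_new - g_old.
          rho = 1.0 / (y @ s)
          I = np.eye(len(s))
          V = I - rho * np.outer(s, y)
          return V @ H @ V.T + rho * np.outer(s, s)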
  • RSN: Randomized Subspace Newton.

    Robert GOWER, Dmitry KOVALEV, Felix LIEDER, Peter RICHTARIK
    Conference on Neural Information Processing Systems | 2019
    We develop a randomized Newton method capable of solving learning problems with huge dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives practitioners the freedom to design custom sketching approaches suitable for particular applications. We perform numerical experiments which demonstrate the efficiency of our method as compared to accelerated gradient descent and the full Newton method. Our method can be seen as a refinement and randomized extension of the results of Karimireddy, Stich, and Jaggi [18].
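
    The core construction described here, a Newton direction constrained to the range of a random sketch, can be sketched as follows. The Gaussian sketch, the least-squares solve and the absence of a stepsize are illustrative simplifications: the paper covers general sketching distributions and uses a damped step.

      import numpy as np

      def sketched_newton_direction(grad, hess, sketch_dim, rng):
          # Direction restricted to the column space of a random sketch S (Gaussian here,
          # for illustration); only a small sketch_dim x sketch_dim system is solved.
          d = grad.shape[0]
          S = rng.standard_normal((d, sketch_dim))
          lam = np.linalg.lstsq(S.T @ hess @ S, S.T @ grad, rcond=None)[0]
          return -S @ lam

      # illustrative usage: x = x + step * sketched_newton_direction(g, H, 5, np.random.default_rng(0))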
  • Linearly Convergent Randomized Iterative Methods for Computing the Pseudoinverse.

    Robert GOWER, Peter RICHTARIK
    2017
    We develop the first stochastic incremental method for calculating the Moore-Penrose pseudoinverse of a real matrix. By leveraging three alternative characterizations of pseudoinverse matrices, we design three methods for calculating the pseudoinverse: two general purpose methods and one specialized to symmetric matrices. The two general purpose methods are proven to converge linearly to the pseudoinverse of any given matrix. For calculating the pseudoinverse of full rank matrices we present two additional specialized methods which enjoy a faster convergence rate than the general purpose methods. We also indicate how to develop randomized methods for calculating approximate range space projections, a much needed tool in inexact Newton type methods or quadratic solvers when linear constraints are present. Finally, we present numerical experiments of our general purpose methods for calculating pseudoinverses and show that our methods greatly outperform the Newton-Schulz method on large dimensional matrices.
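
    For reference, the Newton-Schulz iteration the abstract compares against (not the paper's stochastic incremental methods) can be written as below; the number of iterations and the scaling of the initial iterate are illustrative choices.

      import numpy as np

      def newton_schulz_pinv(A, iters=50):
          # Newton-Schulz iteration for the Moore-Penrose pseudoinverse; converges
          # when the initial iterate is alpha * A.T with alpha < 2 / sigma_max(A)^2.
          alpha = 1.0 / np.linalg.norm(A, 2) ** 2
          X = alpha * A.T
          for _ in range(iters):
              X = 2 * X - X @ A @ X
          return X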
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr