COLIN Igor

Affiliations
  • 2014 - 2020
    Laboratoire traitement et communication de l'information
  • 2015 - 2016
    Laboratoire traitement du signal et de l'image
  • 2015 - 2016
    Informatique, télécommunications et électronique de Paris
  • 2015 - 2016
    Télécom ParisTech
Publications
  • Best Arm Identification in Graphical Bilinear Bandits.

    Geovani RIZK, Thomas ALBERT, Igor COLIN, Rida LARAKI, Yann CHEVALEYRE
    Proceedings of the 38th International Conference on Machine Learning | 2021
    No summary available.
  • An Approximate Shapley-Folkman Theorem.

    Thomas KERDREUX, Igor COLIN, Alexandre d'ASPREMONT
    2020
    The Shapley-Folkman theorem shows that Minkowski averages of uniformly bounded sets tend to be convex when the number of terms in the sum becomes much larger than the ambient dimension. In optimization, Aubin and Ekeland [1976] show that this produces an a priori bound on the duality gap of separable nonconvex optimization problems involving finite sums. This bound is highly conservative and depends on unstable quantities, and we relax it in several directions to show that nonconvexity can have a much milder impact on finite sum minimization problems such as empirical risk minimization and multi-task classification. As a byproduct, we show a new version of Maurey's classical approximate Carathéodory lemma where we sample a significant fraction of the coefficients, without replacement, as well as a result on sampling constraints using an approximate Helly theorem, both of independent interest.
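    For context, the classical theorem in question can be stated as follows (a standard formulation, paraphrased here rather than quoted from the paper):

```latex
% Shapley-Folkman theorem: for subsets V_1, ..., V_n of R^d, any point of the
% convex hull of the Minkowski sum decomposes so that at most d summands need
% to be taken from convex hulls rather than from the original sets.
\[
x \in \operatorname{conv}\Bigl(\sum_{i=1}^{n} V_i\Bigr)
\implies
x = \sum_{i=1}^{n} x_i,
\qquad x_i \in \operatorname{conv}(V_i) \ \text{for all } i,
\qquad x_i \in V_i \ \text{for all but at most } d \text{ indices.}
\]
```

    When n is much larger than d, only a vanishing fraction of the summands is affected by nonconvexity, which is what drives the a priori duality-gap bound discussed in the abstract.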
  • Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions.

    Igor COLIN, Aurelien BELLET, Joseph SALMON, Stephan CLEMENCON
    International Conference on Machine Learning (ICML 2016) | 2016
    In decentralized networks (of sensors, connected objects, etc.), there is an important need for efficient algorithms to optimize a global cost function, for instance to learn a global model from the local data collected by each computing unit. In this paper, we address the problem of decentralized minimization of pairwise functions of the data points, where these points are distributed over the nodes of a graph defining the communication topology of the network. This general problem finds applications in ranking, distance metric learning and graph inference, among others. We propose new gossip algorithms based on dual averaging which aim at solving such problems both in synchronous and asynchronous settings. The proposed framework is flexible enough to deal with constrained and regularized variants of the optimization problem. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. We present numerical simulations on Area Under the ROC Curve (AUC) maximization and metric learning problems which illustrate the practical interest of our approach.
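    To give a concrete feel for this family of methods, here is a deliberately toy scalar sketch (our own simplification for illustration; the function name, the quadratic pairwise loss, and the single-edge update schedule are assumptions, not the paper's algorithm). Nodes cooperate to minimize a pairwise risk by gossip-averaging dual variables, propagating observations across edges, and taking dual-averaging steps:

```python
import math
import random

def gossip_dual_averaging(h, x, edges, n_iter, lr=1.0, seed=0):
    """Toy decentralized minimization of F(theta) = avg over pairs of
    (theta - h(x_i, x_j))^2, whose minimizer is the network-wide average
    pairwise value of h. Illustrative sketch, not the paper's exact method."""
    rng = random.Random(seed)
    n = len(x)
    y = list(x)        # travelling observations (data propagation)
    z = [0.0] * n      # dual variables (accumulated subgradients)
    theta = [0.0] * n  # primal iterates
    for t in range(1, n_iter + 1):
        i, j = rng.choice(edges)
        z[i] = z[j] = (z[i] + z[j]) / 2.0  # gossip-average dual variables
        y[i], y[j] = y[j], y[i]            # swap observations along the edge
        for k in range(n):
            z[k] += 2.0 * (theta[k] - h(x[k], y[k]))  # local subgradient
            theta[k] = -lr * z[k] / math.sqrt(t)      # dual averaging step
    return theta
```

    On a complete graph over the integers 0..9 with h(a, b) = |a - b|, all local iterates drift toward the average pairwise distance; the gossip averaging of the dual variables is what keeps the nodes in agreement.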
  • Adapting machine learning methods to U-statistics.

    Igor COLIN
    2016
    With the increasing availability of large amounts of data, computational complexity has become a keystone of many machine learning algorithms. Stochastic optimization algorithms and distributed/decentralized methods have been widely studied over the last decade and provide increased scalability for optimizing an empirical risk that is separable in the data sample. Yet, in a wide range of statistical learning problems, the risk is accurately estimated by U-statistics, i.e., functionals of the training data with low variance that take the form of averages over d-tuples. We first tackle the problem of sampling for the empirical risk minimization problem. We show that empirical risks can be replaced by drastically computationally simpler Monte-Carlo estimates based on O(n) terms only, usually referred to as incomplete U-statistics, without damaging the learning rate. We establish uniform deviation results, and numerical examples show that such an approach surpasses more naive subsampling techniques. We then focus on the decentralized estimation topic, where the data sample is distributed over a connected network. We introduce new synchronous and asynchronous randomized gossip algorithms which simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest. We establish convergence rate bounds with explicit data and network dependent terms. Finally, we deal with the decentralized optimization of functions that depend on pairs of observations. Similarly to the estimation case, we introduce a method based on concurrent local updates and data propagation. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. Our simulations illustrate the practical interest of our approach.
  • Adaptation of learning methods to U-statistics.

    Igor COLIN, Stéphan CLÉMENÇON, Joseph SALMON, Pascal BIANCHI, Peter RICHTARIK, Mikael JOHANSSON, Alexandre d'ASPREMONT
    2016
    The recent explosion of available data volumes has made algorithmic complexity a central element of machine learning methods. Stochastic optimization algorithms as well as distributed and decentralized methods have been widely developed during the last ten years. These methods have made it easier to scale up the optimization of empirical risks that are separable in the observations. However, in many statistical learning problems, the accurate estimation of the risk is done using U-statistics, functions of the data in the form of averages over d-tuples. We first address the problem of sampling for the minimization of the empirical risk. We show that the risk can be replaced by an incomplete U-statistic, a Monte Carlo estimator based on only O(n) terms, while keeping a learning rate of the same order. We establish bounds on the approximation error of the U-process, and numerical simulations demonstrate the advantage of such a sampling technique. We then turn our attention to decentralized estimation, where the observations are distributed over a connected network. We develop so-called gossip algorithms, in both synchronous and asynchronous settings, that disseminate the observations while maintaining local estimators of the U-statistic to be estimated. We show the convergence of these algorithms with explicit dependencies on the data and the network topology. Finally, we deal with the decentralized optimization of functions of pairs of observations. As for estimation, our methods are based on the concomitant propagation of observations and local optimization of the risk. Our theoretical analysis underlines that these methods keep a convergence rate of the same order as in the centralized case. Numerical experiments confirm the practical interest of our approach.
  • A Gossip algorithm for decentralized optimization of functions on pairs.

    Igor COLIN, Aurelien BELLET, Joseph SALMON, Stephan CLEMENCON
    Cap 2016 - Conférence sur L'apprentissage automatique | 2016
    No summary available.
  • Decentralized Topic Modelling with Latent Dirichlet Allocation.

    Igor COLIN, Christophe DUPUY
    NIPS 2016 - 30th Conference on Neural Information Processing Systems | 2016
    Privacy preserving networks can be modelled as decentralized networks (e.g., sensors, connected objects, smartphones), where communication between nodes of the network is not controlled by a master or central node. For this type of network, the main issue is to gather/learn global information on the network (e.g., by optimizing a global cost function) while keeping the (sensitive) information at each node. In this work, we focus on text information that agents do not want to share (e.g., text messages, emails, confidential reports). We use recent advances in decentralized optimization and topic models to infer topics from a graph with limited communication. We propose a method to adapt the latent Dirichlet allocation (LDA) model to decentralized optimization and show on synthetic data that we still recover similar parameters and similar performance at each node as with stochastic methods accessing the whole information in the graph.
  • Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics.

    Stephan CLEMENCON, Igor COLIN, Aurelien BELLET
    Journal of Machine Learning Research | 2016
    In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by U-statistics of degree d ≥ 1, i.e. functionals of the training data with low variance that take the form of averages over d-tuples. From a computational perspective, the calculation of such statistics is highly expensive even for a moderate sample size n, as it requires averaging O(n^d) terms. This makes learning procedures relying on the optimization of such data functionals hardly feasible in practice. It is the major goal of this paper to show that, strikingly, such empirical risks can be replaced by drastically computationally simpler Monte-Carlo estimates based on O(n) terms only, usually referred to as incomplete U-statistics, without damaging the O(1/√n) learning rate of Empirical Risk Minimization (ERM) procedures. For this purpose, we establish uniform deviation results describing the error made when approximating a U-process by its incomplete version under appropriate complexity assumptions. Extensions to model selection, fast rate situations and various sampling techniques are also considered, as well as an application to stochastic gradient descent for ERM. Finally, numerical examples are displayed in order to provide strong empirical evidence that the approach we promote largely surpasses more naive subsampling techniques.
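    The incomplete U-statistic idea is easy to sketch: instead of averaging a pairwise kernel over all O(n^2) pairs, draw a fixed budget of random pairs. A minimal illustration for degree d = 2 (function names and the Gini-mean-difference kernel used below are our choices for the example, not the paper's code):

```python
import random

def complete_u_statistic(h, xs):
    """Degree-2 U-statistic: average of h over all ordered pairs (i, j), i != j."""
    n = len(xs)
    return sum(h(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))

def incomplete_u_statistic(h, xs, budget, seed=0):
    """Monte Carlo estimate averaging h over `budget` randomly drawn pairs only."""
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(budget):
        i, j = rng.sample(range(n), 2)  # one random pair, drawn without replacement
        total += h(xs[i], xs[j])
    return total / budget
```

    With the Gini mean difference kernel h(a, b) = |a - b| on n = 50 points, the complete statistic averages 2,450 terms, while the incomplete estimate gets close with a few thousand random draws, and its cost is decoupled from n.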
  • Extending Gossip Algorithms to Distributed Estimation of U-statistics.

    Igor COLIN, Aurelien BELLET, Joseph SALMON, Stephan CLEMENCON
    Annual Conference on Neural Information Processing Systems (NIPS) | 2015
    Efficient and robust algorithms for decentralized estimation in networks are essential to many distributed systems. Whereas distributed estimation of sample mean statistics has been the subject of a good deal of attention, computation of U-statistics, relying on more expensive averaging over pairs of observations, is a less investigated area. Yet, such data functionals are essential to describe global properties of a statistical population, with important examples including Area Under the Curve, empirical variance, Gini mean difference and within-cluster point scatter. This paper proposes new synchronous and asynchronous randomized gossip algorithms which simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest. We establish convergence rate bounds of O(1/t) and O(log t / t) for the synchronous and asynchronous cases respectively, where t is the number of iterations, with explicit data and network dependent terms. Beyond favorable comparisons in terms of rate analysis, numerical experiments provide empirical evidence that the proposed algorithms surpass the previously introduced approach.
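    The flavor of these gossip algorithms can be sketched as follows (a simplified synchronous variant written for illustration; names, step weights and update order are our assumptions, not the paper's exact algorithm). Each node holds its own observation, a travelling auxiliary observation, and a local estimate; a randomly selected edge averages the two local estimates and swaps the auxiliary observations, then every node folds a fresh pairwise term into its running average:

```python
import random

def gossip_u_statistic(h, x, edges, n_iter, seed=0):
    """Sketch of gossip estimation of a degree-2 U-statistic.

    x[k] : the observation held by node k
    y[k] : auxiliary observation currently visiting node k (initially its own)
    z[k] : node k's running estimate of the global pairwise average
    """
    rng = random.Random(seed)
    n = len(x)
    y = list(x)
    z = [0.0] * n
    for t in range(1, n_iter + 1):
        i, j = rng.choice(edges)
        z[i] = z[j] = (z[i] + z[j]) / 2.0  # gossip-average the two local estimates
        y[i], y[j] = y[j], y[i]            # propagate observations across the edge
        for k in range(n):
            z[k] += (h(x[k], y[k]) - z[k]) / t  # fold in a fresh pairwise term
    return z
```

    On a complete graph over the integers 0..9 with h(a, b) = |a - b|, every local estimate approaches the network-wide all-pairs average (identical pairs included; roughly 3.3 here), even though no node ever sees all the data. Note this simplification averages over all pairs, whereas the paper's algorithms target the exact U-statistic over distinct pairs.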
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr