CUTURI Marco

Affiliations
  • 2018 - 2019
    Communauté d'universités et établissements Université Paris-Saclay
  • 2015 - 2019
    Centre de recherche en économie et statistique de l'Ensae et l'Ensai
  • 2015 - 2019
    Centre de recherche en économie et statistique
  • 2012 - 2019
    Kyoto University
  • 2017 - 2018
    Laboratoire d'Intégration des Systèmes et des Technologies
  • 2016 - 2017
    Ecole nationale de statistique et d'administration économique ParisTech
  • 2004 - 2005
    Ecole nationale supérieure des mines de Paris
  • Optimal transport in high dimension: obtaining regularity and robustness using convexity and projections.

    Francois pierre PATY, Marco CUTURI, Guillaume LECUE, Jerome MALICK, Francois xavier VIALARD, Giovanni CONFORTI, Laetitia CHAPEL, Umut SIMSEKLI
    2021
    In recent years, optimal transport has gained popularity in machine learning as a way to compare probability measures. Unlike more traditional dissimilarities between probability distributions, such as the Kullback-Leibler divergence, optimal transport distances (or Wasserstein distances) allow distributions with disjoint supports to be compared by taking into account the geometry of the underlying space. This advantage is however hampered by the fact that these distances are usually computed by solving a linear program, which, when the underlying space is high dimensional, raises well-documented statistical challenges commonly referred to as the curse of dimensionality. Beyond this purely metric aspect, another interest of optimal transport theory is that it provides mathematical tools to study maps that can transform, or transport, one measure into another. Such maps play an increasingly important role in various fields of science (biology, brain imaging) and subfields of machine learning (generative models, domain adaptation), among others. In this thesis, we propose a new estimation framework for computing variants of Wasserstein distances. The goal is to reduce the effects of high dimensionality by taking advantage of the low-dimensional structures hidden in the distributions. This can be done by projecting the measures onto a subspace chosen to maximize the Wasserstein distance between their projections. Beyond this new methodology, we show that this framework is more broadly consistent with a link between regularization of Wasserstein distances and robustness. In a further contribution, we start from the same problem of estimating optimal transport in high dimension but adopt a different perspective: rather than modifying the cost function, we return to the more fundamental viewpoint of Monge and propose to use Brenier's theorem and Caffarelli's regularity theory to define a new procedure for estimating Lipschitz transport maps that are the gradient of a strongly convex function.
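    A minimal numerical sketch of the projection idea (a hedged illustration assuming NumPy and SciPy; the random-subspace search below is only a crude stand-in for the optimization over subspaces carried out in the thesis):

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def exact_w2(x, y):
            # Squared 2-Wasserstein distance between two uniform point clouds
            # of equal size, via the assignment problem.
            cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
            r, c = linear_sum_assignment(cost)
            return cost[r, c].mean()

        rng = np.random.default_rng(0)
        n, d, k = 200, 30, 2
        x = rng.normal(size=(n, d))
        y = rng.normal(size=(n, d)) + 0.5   # shifted cloud

        # Evaluate the projected distance on a few random k-dimensional
        # subspaces and keep the largest value; the thesis optimizes the
        # subspace instead of sampling it.
        best = 0.0
        for _ in range(50):
            q, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal basis
            best = max(best, exact_w2(x @ q, y @ q))
        print("largest projected squared Wasserstein distance found:", best)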
  • Equitable and Optimal Transport with Multiple Agents.

    Meyer SCETBON, Laurent MEUNIER, Jamal ATIF, Marco CUTURI
    2021
    We introduce an extension of the Optimal Transport problem when multiple costs are involved. Considering each cost as an agent, we aim to share equally between agents the work of transporting one distribution to another. To do so, we minimize the transportation cost of the agent who works the most. Another point of view arises when the goal is to partition goods equitably between agents according to their heterogeneous preferences; there, we aim to maximize the utility of the least advantaged agent. This is a fair division problem. Like Optimal Transport, the problem can be cast as a linear optimization problem. When there is only one agent, we recover the Optimal Transport problem. When two agents are considered, we are able to recover Integral Probability Metrics defined by $\alpha$-H\"older functions, which include the widely-known Dudley metric. To the best of our knowledge, this is the first time a link is given between the Dudley metric and Optimal Transport. We provide an entropic regularization of that problem which leads to an alternative algorithm faster than the standard linear program.
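    A toy sketch of the min-max formulation as a linear program for two agents on small histograms (assuming NumPy and SciPy; the sizes, random costs and encoding below are illustrative assumptions, not the paper's implementation):

        import numpy as np
        from scipy.optimize import linprog

        rng = np.random.default_rng(0)
        n, m = 6, 7
        a = np.full(n, 1.0 / n)           # source weights
        b = np.full(m, 1.0 / m)           # target weights
        C1 = rng.random((n, m))           # cost faced by agent 1
        C2 = rng.random((n, m))           # cost faced by agent 2
        nm = n * m

        # Variables: [t, vec(P1), vec(P2)], all nonnegative (linprog default).
        c = np.zeros(1 + 2 * nm)
        c[0] = 1.0                        # minimize the largest agent cost t

        # <C_k, P_k> <= t for each agent k.
        A_ub = np.zeros((2, 1 + 2 * nm))
        A_ub[0, 0] = A_ub[1, 0] = -1.0
        A_ub[0, 1:1 + nm] = C1.ravel()
        A_ub[1, 1 + nm:] = C2.ravel()

        # The two plans together must have marginals a and b.
        row = np.kron(np.eye(n), np.ones((1, m)))
        col = np.kron(np.ones((1, n)), np.eye(m))
        M = np.vstack([row, col])
        A_eq = np.hstack([np.zeros((n + m, 1)), M, M])
        b_eq = np.concatenate([a, b])

        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2),
                      A_eq=A_eq, b_eq=b_eq, method="highs")
        print("smallest achievable cost for the busiest agent:", res.fun)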
  • Computational optimal transport: with applications to data sciences.

    Gabriel PEYRE, Marco CUTURI
    2020
    No summary available.
  • Leveraging regularization, projections and elliptical distributions in optimal transport.

    Boris MUZELLEC, Marco CUTURI, Quentin MERIGOT, Daniel KUHN, Justin SOLOMON, Julien MAIRAL
    2020
    The ability to manipulate and compare probability measures is essential for many applications in machine learning. Optimal transport (OT) defines divergences between distributions based on the geometry of the underlying spaces: starting from a cost function defined on the space in which they are supported, OT consists in finding a coupling between the two measures that is optimal with respect to this cost. Because of its geometric anchoring, OT is particularly well adapted to machine learning and is the subject of a rich mathematical theory. Despite these advantages, the use of OT in data science has long been limited by the mathematical and computational difficulties associated with the underlying optimization problem. To circumvent this problem, one approach is to focus on special cases that admit closed-form solutions or can be solved efficiently. In particular, OT between elliptical measures is one of the few cases for which it admits a closed form, defining the Bures-Wasserstein (BW) geometry. This thesis focuses on the BW geometry, with the aim of using it as a basic tool for applications in data science. To do so, we consider situations in which the BW geometry is used as a tool for learning representations, extended through projections onto subspaces, or regularized by an entropic term. In a first contribution, the BW geometry is used to define embeddings in the form of elliptical distributions, extending the classical representation as vectors in R^d. In a second contribution, we prove the existence of transport maps that extrapolate maps restricted to low-dimensional projections, and show that these "optimal subspace" plans admit closed forms in the case of Gaussian measures. The third contribution of this thesis consists in obtaining closed forms for entropic transport between unnormalized Gaussian measures, which constitute the first non-trivial closed-form expressions for entropic transport. Finally, in a last contribution, we use entropic transport to impute missing data in a non-parametric way, while preserving the underlying distributions.
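    A minimal sketch of the Bures-Wasserstein closed form between two Gaussian measures, the basic building block used throughout the thesis (assuming NumPy and SciPy):

        import numpy as np
        from scipy.linalg import sqrtm

        def bures_wasserstein2(m1, S1, m2, S2):
            # Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2):
            # ||m1 - m2||^2 + Tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}).
            r1 = sqrtm(S1)
            cross = sqrtm(r1 @ S2 @ r1)
            return float(np.sum((m1 - m2) ** 2)
                         + np.trace(S1 + S2) - 2.0 * np.real(np.trace(cross)))

        rng = np.random.default_rng(0)
        d = 3
        A = rng.normal(size=(d, d)); S1 = A @ A.T + np.eye(d)
        B = rng.normal(size=(d, d)); S2 = B @ B.T + np.eye(d)
        print(bures_wasserstein2(np.zeros(d), S1, np.ones(d), S2))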
  • Ground Metric Learning on Graphs.

    Matthieu HEITZ, Nicolas BONNEEL, David COEURJOLLY, Marco CUTURI, Gabriel PEYRE
    Journal of Mathematical Imaging and Vision | 2020
    Optimal transport (OT) distances between probability distributions are parameterized by the ground metric they use between observations. Their relevance for real-life applications strongly hinges on whether that ground metric parameter is suitably chosen. The challenge of selecting it adaptively and algorithmically from prior knowledge, the so-called ground metric learning (GML) problem, has therefore appeared in various settings. In this paper, we consider the GML problem when the learned metric is constrained to be a geodesic distance on a graph that supports the measures of interest. This imposes a rich structure for candidate metrics, but also enables far more efficient learning procedures when compared to a direct optimization over the space of all metric matrices. We use this setting to tackle an inverse problem stemming from the observation of a density evolving with time: we seek a graph ground metric such that the OT interpolation between the starting and ending densities that result from that ground metric agrees with the observed evolution. This OT dynamic framework is relevant to model natural phenomena exhibiting displacements of mass, such as the evolution of the color palette induced by the modification of lighting and materials.
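    A small illustrative sketch of using graph geodesics as the ground metric of an OT problem (assuming NumPy and SciPy; the toy graph and histograms are made up, and no metric learning is performed, only the forward computation the learning procedure builds on):

        import numpy as np
        from scipy.sparse.csgraph import shortest_path
        from scipy.optimize import linprog

        # Toy graph on 5 nodes: symmetric edge-weight matrix (0 = no edge).
        W = np.array([[0, 1, 0, 0, 4],
                      [1, 0, 1, 0, 0],
                      [0, 1, 0, 1, 0],
                      [0, 0, 1, 0, 1],
                      [4, 0, 0, 1, 0]], dtype=float)
        D = shortest_path(W, directed=False)   # geodesic ground metric

        def ot_cost(a, b, D):
            # Exact OT between histograms supported on the nodes, cost D.
            n = len(a)
            row = np.kron(np.eye(n), np.ones((1, n)))
            col = np.kron(np.ones((1, n)), np.eye(n))
            res = linprog(D.ravel(), A_eq=np.vstack([row, col]),
                          b_eq=np.concatenate([a, b]), method="highs")
            return res.fun

        a = np.array([0.7, 0.3, 0.0, 0.0, 0.0])
        b = np.array([0.0, 0.0, 0.0, 0.4, 0.6])
        print("OT cost under the graph geodesic metric:", ot_cost(a, b, D))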
  • Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form.

    Hicham JANATI, Boris MUZELLEC, Gabriel PEYRE, Marco CUTURI
    Thirty-fourth Conference on Neural Information Processing Systems | 2020
    Although optimal transport (OT) problems admit closed form solutions in a very few notable cases, e.g. in 1D or between Gaussians, these closed forms have proved extremely fecund for practitioners to define tools inspired from the OT geometry. On the other hand, the numerical resolution of OT problems using entropic regularization has given rise to many applications, but because there are no known closed-form solutions for entropic regularized OT problems, these approaches are mostly algorithmic, not informed by elegant closed forms. In this paper, we propose to fill the void at the intersection between these two schools of thought in OT by proving that the entropy-regularized optimal transport problem between two Gaussian measures admits a closed form. Contrary to the unregularized case, for which the explicit form is given by the Wasserstein-Bures distance, the closed form we obtain is differentiable everywhere, even for Gaussians with degenerate covariance matrices. We obtain this closed form solution by solving the fixed-point equation behind Sinkhorn's algorithm, the default method for computing entropic regularized OT. Remarkably, this approach extends to the generalized unbalanced case-where Gaussian measures are scaled by positive constants. This extension leads to a closed form expression for unbalanced Gaussians as well, and highlights the mass transportation / destruction trade-off seen in unbalanced optimal transport. Moreover, in both settings, we show that the optimal transportation plans are (scaled) Gaussians and provide analytical formulas of their parameters. These formulas constitute the first non-trivial closed forms for entropy-regularized optimal transport, thus providing a ground truth for the analysis of entropic OT and Sinkhorn's algorithm.
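    A minimal sketch of the Sinkhorn fixed-point iteration on discretized 1D Gaussians (assuming NumPy; the grid, variances and regularization strength are arbitrary choices), i.e. a numerical version of the fixed point that the paper solves in closed form in the Gaussian case:

        import numpy as np

        x = np.linspace(-5, 5, 200)
        a = np.exp(-0.5 * (x - 1.0) ** 2);            a /= a.sum()
        b = np.exp(-0.5 * (x + 1.0) ** 2 / 1.5 ** 2); b /= b.sum()

        eps = 0.1
        C = (x[:, None] - x[None, :]) ** 2
        K = np.exp(-C / eps)              # Gibbs kernel
        u = np.ones_like(a)
        for _ in range(500):              # Sinkhorn fixed-point iterations
            v = b / (K.T @ u)
            u = a / (K @ v)
        P = u[:, None] * K * v[None, :]   # entropic transport plan
        print("entropic OT cost:", np.sum(P * C))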
  • Debiased Sinkhorn barycenters.

    Hicham JANATI, Marco CUTURI, Alexandre GRAMFORT
    Thirty-seventh International Conference on Machine Learning | 2020
    Entropy regularization in optimal transport (OT) has driven much of the recent interest in Wasserstein metrics and barycenters in machine learning. It makes it possible to keep the appealing geometrical properties of the unregularized Wasserstein distance while enjoying a significantly lower complexity thanks to Sinkhorn's algorithm. However, entropy brings an inherent smoothing bias, resulting for example in blurred barycenters. This side effect has prompted an increasing temptation in the community to settle for a slower algorithm such as log-domain stabilized Sinkhorn, which breaks the parallel structure that can be leveraged on GPUs, or even to go back to unregularized OT. Here we show how this bias is tightly linked to the reference measure that defines the entropy regularizer, and propose debiased Wasserstein barycenters that preserve the best of both worlds: fast Sinkhorn-like iterations without entropy smoothing. Theoretically, we prove that the entropic OT barycenter of univariate Gaussians is a Gaussian and quantify its variance bias. This result is obtained by extending the differentiability and convexity of entropic OT to sub-Gaussian measures with unbounded supports. Empirically, we illustrate the reduced blurring and the computational advantage on various applications.
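    A sketch of the closely related debiasing of the Sinkhorn divergence between two histograms (assuming NumPy; this only illustrates how subtracting the self-transport terms removes the entropic bias, and is not the paper's debiased barycenter algorithm):

        import numpy as np

        def sinkhorn_cost(a, b, C, eps, n_iter=500):
            # Entropy-regularized OT cost via Sinkhorn iterations (dense, small scale).
            K = np.exp(-C / eps)
            u = np.ones_like(a)
            for _ in range(n_iter):
                v = b / (K.T @ u)
                u = a / (K @ v)
            return np.sum(u[:, None] * K * v[None, :] * C)

        x = np.linspace(-3, 3, 150)
        C = (x[:, None] - x[None, :]) ** 2
        a = np.exp(-0.5 * (x - 0.5) ** 2); a /= a.sum()
        b = np.exp(-0.5 * (x + 0.5) ** 2); b /= b.sum()
        eps = 0.1

        biased = sinkhorn_cost(a, b, C, eps)
        debiased = biased - 0.5 * (sinkhorn_cost(a, a, C, eps)
                                   + sinkhorn_cost(b, b, C, eps))
        print("entropic OT cost:", biased, " debiased divergence:", debiased)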
  • Multi-subject MEG/EEG source imaging with sparse multi-task regression.

    Hicham JANATI, Thomas BAZEILLE, Bertrand THIRION, Marco CUTURI, Alexandre GRAMFORT
    NeuroImage | 2020
    No summary available.
  • Statistics on topological descriptors based on optimal transport.

    Theo LACOMBE, Steve OUDOT, Marco CUTURI, Gabriel PEYRE, Francois xavier VIALARD, Peter BUBENIK, Anthea MONOD, Sayan MUKHERJEE
    2020
    Topological data analysis (TDA) extracts rich information from structured data (such as graphs or time series) arising in modern learning problems. It represents this information in the form of descriptors, among which persistence diagrams, which can be described as point measures supported on a half-plane. Although they are not simple vectors, persistence diagrams can nevertheless be compared with each other using partial matching metrics. The similarity between these metrics and the usual metrics of optimal transport - another field of mathematics - has been known for a long time, but a formal link between these two fields remained to be established. The purpose of this thesis is to clarify this connection so that the many achievements of optimal transport can be used to develop new statistical tools (both theoretical and practical) for manipulating persistence diagrams. First, we show how partial optimal transport with boundary, a variant of classical optimal transport, provides us with a formalism that contains the usual TDA metrics. We then illustrate the benefits of this reformulation in different situations: a theoretical study of, and an algorithm for, efficiently estimating the barycenters of persistence diagrams using regularized transport; a characterization of continuous linear representations of the diagrams and their learning via a versatile neural network; as well as a stability result for linear averages of randomly drawn diagrams.
  • Editorial IMA IAI - Information and Inference special issue on optimal transport in data sciences.

    Gabriel PEYRE, Marco CUTURI
    Information and Inference: A Journal of the IMA | 2019
    No summary available.
  • Sample Complexity of Sinkhorn divergences.

    Aude GENEVAY, Lenaic CHIZAT, Francis BACH, Marco CUTURI, Gabriel PEYRE
    AISTATS'19 - 22nd International Conference on Artificial Intelligence and Statistics | 2019
    Optimal transport (OT) and maximum mean discrepancies (MMD) are now routinely used in machine learning to compare probability measures. We focus in this paper on \emph{Sinkhorn divergences} (SDs), a regularized variant of OT distances which can interpolate, depending on the regularization strength $\varepsilon$, between OT ($\varepsilon=0$) and MMD ($\varepsilon=\infty$). Although the tradeoff induced by that regularization is now well understood computationally (OT, SDs and MMD require respectively $O(n^3\log n)$, $O(n^2)$ and $n^2$ operations given a sample size $n$), much less is known in terms of their \emph{sample complexity}, namely the gap between these quantities, when evaluated using finite samples \emph{vs.} their respective densities. Indeed, while the sample complexities of OT and MMD stand at two extremes, $1/n^{1/d}$ for OT in dimension $d$ and $1/\sqrt{n}$ for MMD, that of SDs has only been studied empirically. In this paper, we \emph{(i)} derive a bound on the approximation error made with SDs when approximating OT as a function of the regularizer $\varepsilon$, \emph{(ii)} prove that the optimizers of regularized OT are bounded in a Sobolev (RKHS) ball independent of the two measures and \emph{(iii)} provide the first sample complexity bound for SDs, obtained by reformulating SDs as a maximization problem in a RKHS. We thus obtain a scaling in $1/\sqrt{n}$ (as in MMD), with a constant that depends however on $\varepsilon$, making the bridge between OT and MMD complete.
  • Ground Metric Learning on Graphs.

    Matthieu HEITZ, Nicolas BONNEEL, David COEURJOLLY, Marco CUTURI, Gabriel PEYRE
    2019
    Optimal transport (OT) distances between probability distributions are parameterized by the ground metric they use between observations. Their relevance for real-life applications strongly hinges on whether that ground metric parameter is suitably chosen. Selecting it adaptively and algorithmically from prior knowledge, the so-called ground metric learning (GML) problem, has therefore appeared in various settings. We consider it in this paper when the learned metric is constrained to be a geodesic distance on a graph that supports the measures of interest. This imposes a rich structure for candidate metrics, but also enables far more efficient learning procedures when compared to a direct optimization over the space of all metric matrices. We use this setting to tackle an inverse problem stemming from the observation of a density evolving with time: we seek a graph ground metric such that the OT interpolation between the starting and ending densities that result from that ground metric agrees with the observed evolution. This OT dynamic framework is relevant to model natural phenomena exhibiting displacements of mass, such as the evolution of the color palette induced by the modification of lighting and materials.
  • On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests.

    Aaditya RAMDAS, Nicolas TRILLOS, Marco CUTURI
    Entropy | 2017
    Nonparametric two-sample or homogeneity testing is a decision-theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel-based maximum mean discrepancy, and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing's classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
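    A simple illustration of a univariate Wasserstein permutation test (assuming NumPy and SciPy; a generic construction, not one of the specific statistics analyzed in the survey):

        import numpy as np
        from scipy.stats import wasserstein_distance

        def wasserstein_permutation_test(x, y, n_perm=1000, seed=0):
            # Compare the observed 1D Wasserstein distance to its permutation
            # null distribution and return (statistic, p-value).
            rng = np.random.default_rng(seed)
            observed = wasserstein_distance(x, y)
            pooled = np.concatenate([x, y])
            count = 0
            for _ in range(n_perm):
                rng.shuffle(pooled)
                if wasserstein_distance(pooled[:len(x)], pooled[len(x):]) >= observed:
                    count += 1
            return observed, (count + 1) / (n_perm + 1)

        rng = np.random.default_rng(1)
        x = rng.normal(0.0, 1.0, size=200)
        y = rng.normal(0.3, 1.0, size=200)
        print(wasserstein_permutation_test(x, y))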
  • Mean-Reverting Portfolios.

    Marco CUTURI, Alexandre D ASPREMONT
    Financial Signal Processing and Machine Learning | 2016
    No summary available.
  • Stochastic Optimization for Large-scale Optimal Transport.

    Aude GENEVAY, Marco CUTURI, Gabriel PEYRE, Francis BACH
    NIPS 2016 - Thirtieth Annual Conference on Neural Information Processing System | 2016
    Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way. However, the practical impact of OT is still limited because of its computational burden. We propose a new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications. These methods are able to manipulate arbitrary distributions (either discrete or continuous) by simply requiring the ability to draw samples from them, which is the typical setup in high-dimensional learning problems. This alleviates the need to discretize these densities, while giving access to provably convergent methods that output the correct distance without discretization error. These algorithms rely on two main ideas: (a) the dual OT problem can be re-cast as the maximization of an expectation; (b) entropic regularization of the primal OT problem results in a smooth dual optimization problem which can be addressed with algorithms that have a provably faster convergence. We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite-dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS). This is currently the only known method to solve this problem, apart from computing OT on finite samples. We back up these claims on a set of discrete, semi-discrete and continuous benchmark problems.
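    A rough stochastic-gradient sketch of the entropic dual between two discrete measures (assuming NumPy; the step sizes and iteration count are arbitrary, and the paper's SAG/averaged-SGD variants and semi-discrete case are not reproduced):

        import numpy as np

        rng = np.random.default_rng(0)
        n, m, eps = 50, 60, 0.1
        xs = rng.normal(size=n); ys = rng.normal(size=m) + 0.5
        a = np.full(n, 1.0 / n); b = np.full(m, 1.0 / m)
        C = (xs[:, None] - ys[None, :]) ** 2

        f = np.zeros(n); g = np.zeros(m)        # dual potentials
        for it in range(100000):
            i = rng.choice(n, p=a); j = rng.choice(m, p=b)   # sample a pair
            e = np.exp((f[i] + g[j] - C[i, j]) / eps)
            step = 0.5 / np.sqrt(1 + it)
            f[i] += step * (1.0 - e)            # stochastic gradient ascent
            g[j] += step * (1.0 - e)

        dual = (f @ a + g @ b
                - eps * np.sum(a[:, None] * b[None, :]
                               * (np.exp((f[:, None] + g[None, :] - C) / eps) - 1.0)))
        print("estimated regularized dual objective:", dual)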
  • Wasserstein barycentric coordinates.

    Nicolas BONNEEL, Gabriel PEYRE, Marco CUTURI
    ACM Transactions on Graphics | 2016
    This article defines a new way to perform intuitive and geometrically faithful regressions on histogram-valued data. It leverages the theory of optimal transport, and in particular the definition of Wasserstein barycenters, to introduce for the first time the notion of barycentric coordinates for histograms. These coordinates take into account the underlying geometry of the ground space on which the histograms are defined, and are thus particularly meaningful for applications in graphics to shape, color or material modification. Besides this abstract construction, we propose a fast numerical optimization scheme to solve this backward problem (finding the barycentric coordinates of a given histogram) with a low computational overhead with respect to the forward problem (computing the barycenter). This scheme relies on a backward algorithmic differentiation of the Sinkhorn algorithm which is used to optimize the entropic regularization of Wasserstein barycenters. We showcase an illustrative set of applications of these Wasserstein coordinates to various problems in computer graphics: shape approximation, BRDF acquisition and color editing.
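    A sketch of the forward problem only, computing an entropic Wasserstein barycenter on a fixed grid with Sinkhorn-like (iterative Bregman projection) updates; the backward differentiation used in the article to recover barycentric coordinates is not shown (assuming NumPy; grid, bandwidths and regularization are arbitrary):

        import numpy as np

        x = np.linspace(0, 1, 200)
        C = (x[:, None] - x[None, :]) ** 2
        eps = 0.01
        K = np.exp(-C / eps)

        def hist(mu, sigma):
            h = np.exp(-0.5 * (x - mu) ** 2 / sigma ** 2)
            return h / h.sum()

        p = [hist(0.25, 0.1), hist(0.75, 0.1)]   # input histograms
        w = np.array([0.5, 0.5])                  # barycentric weights

        u = [np.ones_like(x) for _ in p]
        for _ in range(300):
            v = [pk / (K.T @ uk) for pk, uk in zip(p, u)]
            # Geometric mean of the current first marginals gives the barycenter.
            q = np.exp(sum(wk * np.log(uk * (K @ vk))
                           for wk, uk, vk in zip(w, u, v)))
            u = [q / (K @ vk) for vk in v]
        print("barycenter mass:", q.sum(), " mean location:", float(q @ x))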
  • Fast Optimal Transport Averaging of Neuroimaging Data.

    Alexandre GRAMFORT, Gabriel PEYRE, Marco CUTURI
    Information Processing in Medical Imaging (IPMI) | 2015
    Knowing how the human brain is anatomically and functionally organized at the level of a group of healthy individuals or patients is the primary goal of neuroimaging research. Yet computing an average of brain imaging data defined over a voxel grid or a triangulation remains a challenge. Data are large, the geometry of the brain is complex and the between-subject variability leads to spatially or temporally non-overlapping effects of interest. To address the problem of variability, data are commonly smoothed before group linear averaging. In this work we build on ideas originally introduced by Kantorovich to propose a new algorithm that can efficiently average non-normalized data defined over arbitrary discrete domains using transportation metrics. We show how Kantorovich means can be linked to Wasserstein barycenters in order to take advantage of an entropic smoothing approach. This leads to a smooth convex optimization problem and an algorithm with strong convergence guarantees. We illustrate the versatility of this tool and its empirical behavior on functional neuroimaging data, functional MRI and magnetoencephalography (MEG) source estimates, defined on voxel grids and triangulations of the folded cortical surface.
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr