DUPUY Christophe

Affiliations
  • 2016 - 2017
    Communauté d'universités et établissements Université de Recherche Paris Sciences et Lettres
  • 2014 - 2017
    Technicolor R&D France SNC
  • 2016 - 2017
    Ecole normale supérieure Paris
  • 2016 - 2017
    Sciences Mathématiques de Paris Centre
  • 2016 - 2017
    Département d'Informatique de l'Ecole Normale Supérieure
  • 2014 - 2017
    Apprentissage statistique et parcimonie
  • Inference and applications for topic models.

    Christophe DUPUY, Francis BACH, Olivier CAPPE, Francois CARON, Michalis TITSIAS, Patrick PEREZ, Christophe DIOT, Alexandre d'ASPREMONT
    2017
    Most current recommendation systems rely on ratings (i.e., numbers between 0 and 5) to recommend content (a movie, a restaurant, etc.) to a user. Users can often also comment on the content in text form in addition to rating it. Extracting information from raw text is difficult, whereas a simple rating carries little information about the content and the user. In this thesis, we attempt to suggest personalized, readable text to users to help them quickly form an opinion about a piece of content. More specifically, we first build a topic model that predicts a personalized movie description from text reviews. Our model separates qualitative (i.e., opinionated) topics from descriptive topics by combining text reviews and numerical ratings in a joint probabilistic model. We evaluate our model on an IMDB dataset and illustrate its performance through topic comparisons. We then study parameter inference in large-scale latent variable models, which include most topic models. We propose a unified treatment of online inference for latent variable models from non-canonical exponential families and make explicit the links between several previously proposed frequentist and Bayesian methods. We also propose a new inference method for frequentist parameter estimation that adapts MCMC methods to online inference in latent variable models through the proper use of local Gibbs sampling. For the latent Dirichlet allocation topic model, we provide an extensive set of experiments and comparisons with existing work, in which our new approach outperforms previously proposed methods. Finally, we propose a new class of determinantal point processes (DPPs) that can be manipulated for inference and parameter learning in potentially sublinear time in the number of objects. This class, based on a specific low-rank factorization of the marginal kernel, is particularly suited to a subclass of continuous DPPs and to DPPs defined over an exponential number of objects. We apply this class to the modeling of text documents as samples of a DPP over sentences and propose a conditional maximum likelihood formulation for modeling topic proportions, which our class of DPPs makes possible without any approximation. We present an application to document summarization with a DPP over 2^500 objects, where the summaries are composed of readable sentences.
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr