DUPUY Christophe

Publication topics

Determinantal point processes
Latent variable models
Topic models
Topic Modelling
Latent Dirichlet allocation
Online Learning
Unsupervised learning
Gibbs Sampling
...

Affiliations

2016 - 2017: Communauté d'universités et établissements Université de Recherche Paris Sciences et Lettres
2014 - 2017: Technicolor R&D France SNC
2016 - 2017: Ecole normale supérieure Paris
2016 - 2017: Sciences Mathématiques de Paris Centre
2016 - 2017: Département d'Informatique de l'Ecole Normale Supérieure
2014 - 2017: Apprentissage statistique et parcimonie

2017
Inference and applications for topic models.
Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling.
2016
Learning Determinantal Point Processes in Sublinear Time.
2015
Exploiting crowd sourced reviews to explain movie recommendation.

Inference and applications for topic models.

Christophe DUPUY, Francis BACH, Olivier CAPPE, Francois CARON, Michalis TITSIAS, Patrick PEREZ, Christophe DIOT, Alexandre d'ASPREMONT

2017

Most current recommender systems rely on ratings (i.e., numbers between 0 and 5) to recommend content (a movie, a restaurant, etc.) to a user. Users can often also comment on this content in free text in addition to rating it. Extracting information from raw text is difficult, while a rating alone carries little information about the content and the user. In this thesis, we aim to suggest personalized, readable text to users to help them quickly form an opinion about a piece of content. More specifically, we first build a topic model that predicts a personalized movie description from text reviews. Our model separates qualitative (i.e., opinionated) topics from descriptive topics by combining text reviews and numerical ratings in a joint probabilistic model. We evaluate our model on an IMDB dataset and illustrate its performance through topic comparison. We then study parameter inference in large-scale latent variable models, which include most topic models. We propose a unified treatment of online inference for latent variable models from non-canonical exponential families and make explicit the links between several previously proposed frequentist and Bayesian methods. We also propose a new inference method for frequentist parameter estimation that adapts MCMC methods to online inference of latent variable models through the proper use of local Gibbs sampling. For the latent Dirichlet allocation topic model, we provide an extensive set of experiments and comparisons with existing work in which our new approach outperforms previously proposed methods. Finally, we propose a new class of determinantal point processes (DPPs) that can be manipulated for inference and parameter learning in potentially sublinear time in the number of items. This class, based on a specific low-rank factorization of the marginal kernel, is particularly suited to a subclass of continuous DPPs and to DPPs defined over exponentially many items. We apply this class to modeling text documents as samples of a DPP over sentences and propose a conditional maximum likelihood formulation for modeling topic proportions, which is made possible without any approximation by our class of DPPs. We present an application to document summarization with a DPP on $2^{500}$ items, where the summaries are composed of readable sentences.

Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling.

Christophe DUPUY, Francis BACH

Journal of Machine Learning Research | 2017

We study parameter inference in large-scale latent variable models. We first propose a unified treatment of online inference for latent variable models from a non-canonical exponential family, and draw explicit links between several previously proposed frequentist or Bayesian methods. We then propose a novel inference method for the frequentist estimation of parameters that adapts MCMC methods to online inference of latent variable models with the proper use of local Gibbs sampling. Then, for latent Dirichlet allocation, we provide an extensive set of experiments and comparisons with existing work, where our new approach outperforms all previously proposed methods. In particular, using Gibbs sampling for latent variable inference is superior to variational inference in terms of test log-likelihoods. Moreover, Bayesian inference through variational methods performs poorly, sometimes leading to worse fits with latent variables of higher dimensionality.
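
The general idea of combining an online parameter update with a local Gibbs step can be sketched for latent Dirichlet allocation as follows. This is only an illustrative sketch, not the algorithm from the paper: the function names (`local_gibbs_counts`, `online_step`), the step-size schedule, the synthetic documents, and all hyperparameter values are assumptions made for the example. For each incoming document, topic assignments are resampled by collapsed Gibbs sampling with the topic-word distributions held fixed, and the averaged counts are then blended into running sufficient statistics.

```python
import numpy as np

def local_gibbs_counts(words, beta, alpha, n_sweeps, burn_in, rng):
    """Collapsed Gibbs sampling over the topic assignments of ONE document.

    words: 1-D int array of token ids; beta: (K, V) topic-word probabilities.
    Returns (K, V) topic-word counts averaged over the post-burn-in sweeps.
    """
    K, V = beta.shape
    z = rng.integers(K, size=len(words))              # random initial assignments
    doc_topic = np.bincount(z, minlength=K).astype(float)
    stats = np.zeros((K, V))
    for sweep in range(n_sweeps):
        for i, w in enumerate(words):
            doc_topic[z[i]] -= 1                      # remove token i from the counts
            p = (doc_topic + alpha) * beta[:, w]      # conditional, topic proportions integrated out
            z[i] = rng.choice(K, p=p / p.sum())
            doc_topic[z[i]] += 1
        if sweep >= burn_in:
            np.add.at(stats, (z, words), 1.0)         # accumulate topic-word counts
    return stats / (n_sweeps - burn_in)

def online_step(s, doc_stats, rho):
    """Stochastic-approximation update of the running sufficient statistics."""
    return (1.0 - rho) * s + rho * doc_stats

# Hypothetical toy setup: 3 topics, 20-word vocabulary, 10 synthetic documents.
rng = np.random.default_rng(0)
K, V = 3, 20
s = rng.random((K, V)) + 1.0                          # strictly positive initial statistics
docs = [rng.integers(V, size=30) for _ in range(10)]
for t, words in enumerate(docs, start=1):
    beta = s / s.sum(axis=1, keepdims=True)           # current topic-word estimate
    doc_stats = local_gibbs_counts(words, beta, alpha=0.1,
                                   n_sweeps=10, burn_in=5, rng=rng)
    s = online_step(s, doc_stats, rho=(t + 1) ** -0.6)
```

Each document is visited once and then discarded, which is what makes the scheme online; the sampling only touches the latent variables of the current document, hence "local".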

Learning Determinantal Point Processes in Sublinear Time.

Christophe DUPUY, Francis BACH

2016

We propose a new class of determinantal point processes (DPPs) which can be manipulated for inference and parameter learning in potentially sublinear time in the number of items. This class, based on a specific low-rank factorization of the marginal kernel, is particularly suited to a subclass of continuous DPPs and DPPs defined on exponentially many items. We apply this new class to modelling text documents as sampling a DPP of sentences, and propose a conditional maximum likelihood formulation to model topic proportions, which is made possible with no approximation for our class of DPPs. We present an application to document summarization with a DPP on $2^{500}$ items.
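
To give a sense of why low-rank structure makes DPP computations cheaper, here is a generic illustration. It is not the paper's factorization (which is placed on the marginal kernel): it assumes an L-ensemble kernel $L = VV^\top$ with a hypothetical $n \times r$ factor `V`, and uses Sylvester's determinant identity $\det(I_n + VV^\top) = \det(I_r + V^\top V)$ so that the normalizing constant only requires an $r \times r$ determinant. The function names are made up for the example.

```python
import numpy as np

def log_normalizer_lowrank(V):
    """log det(I_n + V V^T), computed from the small r x r matrix I_r + V^T V."""
    r = V.shape[1]
    _, logdet = np.linalg.slogdet(np.eye(r) + V.T @ V)
    return logdet

def log_normalizer_dense(V):
    """Same quantity computed naively from the full n x n kernel (for checking)."""
    n = V.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + V @ V.T)
    return logdet

def log_prob_subset(V, subset):
    """log P(Y = subset) = log det(L_subset) - log det(I + L), with L = V V^T."""
    V_a = V[subset]                                   # rows of V for the selected items
    _, logdet_a = np.linalg.slogdet(V_a @ V_a.T)
    return logdet_a - log_normalizer_lowrank(V)

# Hypothetical sizes: n = 200 items, rank r = 5.
rng = np.random.default_rng(0)
V = rng.normal(size=(200, 5))
lp = log_prob_subset(V, [3, 17, 42])
```

In this sketch the normalizer costs $O(nr^2)$ rather than $O(n^3)$; reaching time sublinear in $n$, as in the abstract, needs additional structure that this example does not attempt to reproduce.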

Exploiting crowd sourced reviews to explain movie recommendation.

Sara EL AOUAD, Christophe DUPUY, Renata TEIXEIRA, Christophe DIOT, Francis BACH

2nd Workshop on Recommendation Systems for Television and Online Video | 2015

Streaming services such as Netflix, M-Go, and Hulu use advanced recommender systems to help their customers identify relevant content quickly and easily. These recommenders display the list of recommended movies organized in sublists labeled with the genre or some more specific labels. Unfortunately, existing methods to extract these labeled sublists require human annotators to manually label movies, which is time-consuming and biased by the views of annotators. In this paper, we design a method that relies on crowd-sourced reviews to automatically identify groups of similar movies and label these groups. Our method takes the content of movie reviews available online as input for an algorithm based on Latent Dirichlet Allocation (LDA) that identifies groups of similar movies. We separate the set of similar movies that share the same combination of genres into sublists and personalize the movies to show in each sublist using matrix factorization. The results of a side-by-side comparison of our method against Technicolor's M-Go VoD service are encouraging.

Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr