MARIN Jean Michel

Affiliations
  • 2012 - 2020
    Institut Montpelliérain Alexander Grothendieck
  • 2017 - 2018
    Université de Montpellier
  • 2017 - 2018
    Biologie computationnelle et quantitative
  • 2015 - 2019
    Centre de biologie pour la gestion des populations
  • 2017 - 2018
    Sélection de modèles en apprentissage statistique
  • 2013 - 2014
    Centre de recherche en économie et statistique de l'Ensae et l'Ensai
  • 2013 - 2014
    Centre de recherche en économie et statistique
  • 2000 - 2001
    Université Toulouse 3 Paul Sabatier
  • Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest.

    Francois david COLLIN, Ghislain DURIF, Louis RAYNAL, Eric LOMBAERT, Mathieu GAUTIER, Renaud VITALIS, Jean michel MARIN, Arnaud ESTOUP
    Molecular Ecology Resources | 2021
    Simulation-based methods such as Approximate Bayesian Computation (ABC) are well-adapted to the analysis of complex scenarios of populations and species genetic history. In this context, supervised machine learning (SML) methods provide attractive statistical solutions to conduct efficient inferences about scenario choice and parameter estimation. The Random Forest methodology (RF) is a powerful ensemble of SML algorithms used for classification or regression problems. RF allows conducting inferences at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics, and bypassing the derivation of ABC tolerance levels. We have implemented a set of RF algorithms to process inferences using simulated datasets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user-friendly interface: the simulation under custom evolutionary scenarios of different types of molecular data (microsatellites, DNA sequences or SNPs) and RF treatments including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo-observed and real datasets corresponding to pool-sequencing and individual-sequencing SNP datasets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP datasets to make inferences about complex population genetic histories.
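The general ABC Random Forest idea described in this abstract (simulate a reference table of summary statistics under each scenario, train a Random Forest on it, then push the observed statistics through the forest) can be sketched as follows. This is a hedged toy illustration using scikit-learn and synthetic Gaussian "summary statistics", not DIYABC Random Forest's actual simulator or code; the `simulate` function and all parameter values are stand-ins.

```python
# Toy sketch of ABC-RF scenario choice: a reference table of simulated
# summary statistics, a Random Forest classifier, and forest votes for
# the observed data. Synthetic data only, not DIYABC's simulator.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def simulate(scenario, n):
    # Hypothetical stand-in for a population-genetic simulator: each
    # scenario shifts the mean of a 10-dimensional summary-statistic vector.
    shift = {0: 0.0, 1: 1.0}[scenario]
    return rng.normal(shift, 1.0, size=(n, 10))

# Reference table: 2000 simulations per scenario.
X = np.vstack([simulate(0, 2000), simulate(1, 2000)])
labels = np.repeat([0, 1], 2000)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, labels)

# "Observed" summary statistics, here generated under scenario 1.
x_obs = simulate(1, 1)
votes = rf.predict_proba(x_obs)[0]   # proportion of trees voting per scenario
chosen = int(np.argmax(votes))
```

In the actual methodology the classifier's vote proportions are further converted into an estimate of the posterior probability of the selected scenario via a secondary regression; this sketch stops at the raw votes.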
  • Statistical modeling of medical data and theoretical analysis of estimation algorithms.

Vianney DEBAVELAERE, Stephanie ALLASSONNIERE, Stanley DURRLEMAN, Emmanuel GOBET, Christophe ANDRIEU, Jean michel MARIN, Maria VAKALOPOULOU
    2021
In the medical field, features extracted from images are increasingly used. These measures can be real numbers (a volume, a cognitive score), organ meshes, or the image itself. In the latter two cases, the space of measures cannot be described by a Euclidean space, and one must instead work on a Riemannian manifold. Using this Riemannian framework together with mixed-effects models, it is possible to estimate an object representative of the population as well as the inter-individual variability. In the longitudinal case (subjects observed repeatedly over time), these models yield an average trajectory representative of the global evolution of the population. In this thesis, we generalize these models to the case of a mixed population. Each sub-population can follow different dynamics over time, and their representative trajectories can coincide or differ from one time interval to another. This new model makes it possible, for example, to describe the onset of a disease as a deviation from normal aging. We are also interested in the detection of anomalies (e.g. tumors) in a population. Given an object representing a control population, we define an anomaly as whatever cannot be reconstructed by a diffeomorphic deformation of this representative object. Our method has the advantage of requiring neither a large dataset nor annotation by physicians, and can easily be applied to any organ. Finally, we focus on several theoretical properties of the estimation algorithms used. In the context of nonlinear mixed-effects models, the MCMC-SAEM algorithm is used. We address two of its theoretical limitations. First, we lift the geometric-ergodicity assumption by replacing it with a sub-geometric ergodicity assumption. Second, we focus on a method for applying the SAEM algorithm when the joint distribution does not belong to the curved exponential family. We show that this method introduces a bias in the estimate, which we quantify, and we propose a new algorithm to reduce it.
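The SAEM algorithm at the heart of this thesis alternates simulating the latent variables with a stochastic-approximation update of the sufficient statistics. A hedged toy run of that generic update (not the thesis's nonlinear mixed-effects model, and with an exact simulation step where the real algorithm would use MCMC) might look like this, for the simple latent model y_i = z_i + eps_i with z_i ~ N(theta, 1) and eps_i ~ N(0, 1):

```python
# Toy SAEM iteration: simulate z ~ p(z | y, theta), then apply the
# stochastic-approximation update s_k = s_{k-1} + gamma_k * (S(z) - s_{k-1})
# and maximise. All model choices here are illustrative.
import numpy as np

rng = np.random.default_rng(1)
theta_true, n = 3.0, 500
z = rng.normal(theta_true, 1.0, n)   # latent variables
y = z + rng.normal(0.0, 1.0, n)      # observations, marginally N(theta, 2)

theta, s = 0.0, 0.0
burn_in = 50
for k in range(1, 301):
    # Simulation step: exact draw from p(z | y, theta) = N((theta + y)/2, 1/2).
    z_sim = rng.normal((theta + y) / 2.0, np.sqrt(0.5))
    # Stochastic approximation of the sufficient statistic S(z) = mean(z),
    # with constant steps during burn-in then decreasing steps.
    gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
    s = s + gamma * (z_sim.mean() - s)
    # Maximisation step: complete-data MLE of theta given s.
    theta = s
```

The fixed point of the update is the observed-data mean, so `theta` settles near the true value 3 up to sampling error; in the curved-exponential-family setting this convergence is what the classical SAEM theory guarantees, and the thesis studies what happens when those assumptions are weakened.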
  • Bringing ABC inference to the machine learning realm : AbcRanger, an optimized random forests library for ABC.

    Francois david COLLIN, Arnaud ESTOUP, Jean michel MARIN, Louis RAYNAL
    JOBIM 2020 | 2020
The AbcRanger library provides methodologies for model choice and parameter estimation based on fast and scalable Random Forests, tuned to handle large and/or high-dimensional datasets. The library, initially intended for the population genetics ABC framework DIYABC, has been generalized to any ABC reference-table generator. Computational issues were initially encountered with the reference ABC-Random Forest implementation. We diagnosed these issues as friction between the "strict" machine learning setup and the ABC context, which led us to modify the C++ implementation of the state-of-the-art random forest library, ranger, to tailor it to ABC needs: potentially "deep" decision trees are no longer stored in memory but are processed in parallel batches. We focused on memory and thread scalability and on ease of use (a minimal hyperparameter set). R and Python interfaces are provided.
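The memory-saving idea in this abstract (grow trees in batches, accumulate their votes, discard each batch instead of keeping the whole forest resident) can be sketched in a hedged way with scikit-learn; this is an illustration of the batching principle, not the actual modified ranger code used by AbcRanger.

```python
# Batched forest sketch: train small batches of trees, accumulate their
# class votes on the query points, and free each batch. Synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
x_new = rng.normal(size=(5, 20))       # points to classify

n_batches, trees_per_batch = 10, 50
votes = np.zeros((len(x_new), 2))
for b in range(n_batches):
    batch = RandomForestClassifier(
        n_estimators=trees_per_batch, random_state=b
    ).fit(X, y)
    votes += batch.predict_proba(x_new) * trees_per_batch
    del batch                          # this batch of trees is not retained

proba = votes / (n_batches * trees_per_batch)  # pooled vote proportions
```

Only one batch of trees is ever alive at a time, so peak memory scales with the batch size rather than with the full forest, which is the point of the design described above.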
  • Dynamic Monitoring Measures.

    Sophie MIALLARET, Arnaud GUILLIN, Anne francoise YAO, Vincent SAPIN, Denys POMMERET, Laurence REBOUL, Hacene DJELLOUT, Jean michel MARIN, Sophie DABO NIANG
    2019
Measurement is an everyday act: measurements give us information and allow us to make decisions. Analysing measurements can teach us about our environment, but measurement error can have serious consequences in certain fields. In a first part, based on a study of blood-analysis measurements carried out at the University Hospital of Clermont-Ferrand, we propose a procedure for detecting drifts of the analysers used in medical biology laboratories, relying on the measurements of patients' analyses. After a descriptive analysis of the data, the method, which builds on break-detection techniques for time series, is tested on simulated breaks representing shifts, inaccuracies, or drifts of the analysers for different measured biological parameters. The method is adapted to two scenarios: whether or not the patients' hospital department is known. The study is completed by an analysis of the impact of measurement uncertainty on the patients' analyses. In a second part, we study measurements of volcanic ash shapes made at the Magmas and Volcanoes Laboratory of the University of Clermont Auvergne, in order to relate collection locations to particle shapes. After showing the dependence between these parameters, we use a classification method to group the particles into populations that depend on the distance between the collection sites and the volcano's crater.
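The break-detection setting described in the first part of this thesis can be illustrated with a standard two-sided CUSUM detector on a simulated analyser series; this is a hedged sketch of the general technique, not the thesis's actual procedure, data, or tuning.

```python
# Two-sided CUSUM on standardised daily means of a simulated biological
# parameter, with an upward analyser shift injected at day 120.
import numpy as np

rng = np.random.default_rng(3)
series = rng.normal(100.0, 2.0, 200)   # 200 days of daily mean measurements
series[120:] += 3.0                    # analyser drift: upward shift

mu, sigma = series[:60].mean(), series[:60].std()  # in-control reference period
z = (series - mu) / sigma
k, h = 0.5, 5.0                        # CUSUM slack and decision threshold
g_pos = g_neg = 0.0
alarm = None
for t, zt in enumerate(z):
    g_pos = max(0.0, g_pos + zt - k)   # accumulates evidence of upward shift
    g_neg = max(0.0, g_neg - zt - k)   # accumulates evidence of downward shift
    if alarm is None and (g_pos > h or g_neg > h):
        alarm = t                      # first day the chart signals a break
```

A shift of 1.5 in-control standard deviations per observation is flagged within a handful of days of the true break; real laboratory data would require calibrating `k` and `h` against the tolerated false-alarm rate.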
  • Consistency of the Adaptive Multiple Importance Sampling.

    Jean michel MARIN, Pierre PUDLO, Mohammed SEDKI
    Bernoulli | 2019
    No summary available.
  • Application of ABC to infer the genetic history of Pygmy hunter-gatherer populations from Western Central Africa.

    Arnaud ESTOUP, Alexandre DEHNE GARCIA, Paul VERDU, Jean michel MARIN, Christian ROBERT, Jean marie CORNUET, Pierre PUDLO
    Handbook of Approximate Bayesian Computation | 2018
    No summary available.
  • Computational Solutions for Bayesian Inference in Mixture Models.

    Christian ROBERT, Gilles CELEUX, Kaniav KAMARY, Gertraud MALSINER WALLI, Jean michel MARIN
    Handbook of Mixture Analysis | 2018
    This chapter surveys the most standard Monte Carlo methods available for simulating from a posterior distribution associated with a mixture and conducts some experiments about the robustness of the Gibbs sampler in high dimensional Gaussian settings. This is a chapter prepared for the forthcoming 'Handbook of Mixture Analysis'.
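A minimal example of the kind of Gibbs sampler surveyed in this chapter, for a two-component Gaussian mixture with known unit variances: alternate sampling the component allocations, the weights, and the component means from their full conditionals. The priors and simulated data below are illustrative choices, not the chapter's experimental setup.

```python
# Gibbs sampler for a 2-component Gaussian mixture, known unit variances.
import numpy as np

rng = np.random.default_rng(4)
# Simulated data: weights (0.4, 0.6), means (-2, 2), unit variance.
n = 400
comp = rng.random(n) < 0.6
x = np.where(comp, rng.normal(2.0, 1.0, n), rng.normal(-2.0, 1.0, n))

mu = np.array([-1.0, 1.0])   # initial means
w = np.array([0.5, 0.5])     # initial weights
tau0 = 0.01                  # N(0, 1/tau0) prior precision on each mean
mus = []
for it in range(500):
    # 1. Allocations z_i | (w, mu): multinomial with posterior weights.
    logp = np.log(w) - 0.5 * (x[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = (rng.random(n) < p[:, 1]).astype(int)
    # 2. Weights | z: Beta (two-component Dirichlet) full conditional.
    n1 = z.sum()
    w1 = rng.beta(1 + n1, 1 + n - n1)
    w = np.array([1 - w1, w1])
    # 3. Means | z: conjugate Gaussian full conditionals.
    for j in (0, 1):
        nj = (z == j).sum()
        prec = tau0 + nj
        mu[j] = rng.normal(x[z == j].sum() / prec, 1.0 / np.sqrt(prec))
    mus.append(mu.copy())

post_mean = np.mean(mus[100:], axis=0)   # posterior means after burn-in
```

With well-separated components the chain stays in one labelling; the label-switching issue, and the behaviour of this sampler in high-dimensional settings, are exactly what the chapter's robustness experiments probe.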
  • Gene expression modeling from DNA sequence data.

    May TAHA, Chloe BESSIERE, Florent PETITPREZ, Jimmy VANDEL, Jean michel MARIN, Laurent BREHELIN, Sophie LEBRE, Charles henri LECELLIER
    JdS 2017, 49èmes Journées de Statistique de la SFdS | 2017
Gene expression is tightly controlled to ensure a wide variety of functions and cell types. The development of diseases, especially cancers, is invariably linked to the deregulation of these controls. Our goal is to model the link between gene expression and the nucleotide composition of different regulatory regions of the genome. We propose to address this problem in a regression framework, with a Lasso approach coupled to a regression tree. We use sequence data exclusively and learn a different model for each cell type. We show (i) that the different regulatory regions provide different and complementary information, and (ii) that their nucleotide composition alone allows gene expression to be predicted with an error comparable to that obtained using experimental data. Moreover, the learned linear model does not perform equally well for all genes: it better models certain classes of genes with particular nucleotide compositions.
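The regression setting just described (predict an expression value from compositional sequence features, with a sparse linear model) can be sketched with a Lasso on synthetic data; the feature construction and all values below are illustrative assumptions, not the real regulatory-region data of the paper, and the regression-tree coupling is omitted.

```python
# Lasso sketch: compositional features (rows sum to 1, like nucleotide or
# k-mer frequencies of a region) predicting a synthetic expression value.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n_genes, n_features = 300, 40        # e.g. dinucleotide frequencies per region
X = rng.dirichlet(np.ones(n_features), size=n_genes)

true_beta = np.zeros(n_features)
true_beta[:5] = [8.0, -6.0, 5.0, -4.0, 3.0]   # only a few informative features
y = X @ true_beta + rng.normal(0.0, 0.1, n_genes)

model = Lasso(alpha=0.001).fit(X, y)
n_selected = int(np.sum(model.coef_ != 0))    # features kept by the L1 penalty
```

The L1 penalty drives most coefficients to exactly zero, which is what makes the approach interpretable: the surviving features point at the compositional signals that carry predictive information.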
  • Some discussions on the Read Paper "Beyond subjective and objective in statistics" by A. Gelman and C. Hennig.

    Christian p. ROBERT, Gilles CELEUX, Jack JEWSON, Julie JOSSE, Jean michel MARIN
    2017
    This note is a collection of several discussions of the paper "Beyond subjective and objective in statistics", read by A. Gelman and C. Hennig to the Royal Statistical Society on April 12, 2017, and to appear in the Journal of the Royal Statistical Society, Series A.
  • Detecting past contraction in population size using haplotype homozygosity.

    C MERLE, Jean michel MARIN, F. ROUSSET, Raphael LEBLOIS
Mathematical and Computational Evolutionary Biology 2016 | 2016
    No summary available.
  • Detecting past contraction in population size using runs of homozygosity.

    Coralie MERLE, Raphael LEBLOIS, Jean michel MARIN, Francois ROUSSET
    48èmes Journées de Statistique de la SFdS | 2016
    No summary available.
  • A fully objective Bayesian approach for the Behrens-Fisher problem using historical studies.

    Antoine BARBIERI, Jean michel MARIN, Karine FLORIN
    2016
For in vivo research experiments with small sample sizes and available historical data, we propose a sequential Bayesian method for the Behrens-Fisher problem. We treat it as a model-choice question with two competing models: one in which the two expectations are equal and one in which they differ. The choice between the two models is made through a Bayesian analysis, based on a robust combination of objective and subjective priors set on the parameter space and on the model space. Three steps are needed to evaluate the posterior probability of each model using two historical datasets similar to the one of interest. Starting from the Jeffreys prior, a posterior based on the first historical dataset is derived and used to calibrate informative Normal-Gamma priors for the analysis of the second historical dataset, together with a uniform prior on the model space. From this second step, a new posterior on the parameter and model spaces can be used as the objective informative prior for the final Bayesian analysis. Bayesian and frequentist methods were compared on simulated and real data. In accordance with FDA recommendations, control of type I and type II error rates was evaluated. The proposed method controls them even when the historical experiments are not completely similar to the one of interest.
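The model-choice view of the Behrens-Fisher problem described above can be illustrated in a hedged way: compare a model M0 with a common mean against a model M1 with two means (and unequal variances in both), via crude Monte Carlo estimates of each marginal likelihood. The priors and data below are illustrative only; the paper's sequential calibration from historical datasets is not reproduced.

```python
# Crude Monte Carlo marginal likelihoods for M0 (equal means) vs
# M1 (different means), both with unequal variances. Illustrative priors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(0.0, 1.0, 15)    # group 1
y = rng.normal(1.5, 2.0, 15)    # group 2: different mean and variance

M = 20000
# Prior draws: means ~ N(0, 3^2); scales ~ |N(0, 3)| (half-normal).
mu1, mu2 = rng.normal(0, 3, M), rng.normal(0, 3, M)
s1, s2 = np.abs(rng.normal(0, 3, M)), np.abs(rng.normal(0, 3, M))

# Log-likelihood of each group under each prior draw, shape (M,).
ll_x = stats.norm.logpdf(x[:, None], mu1, s1).sum(axis=0)
ll_y_M1 = stats.norm.logpdf(y[:, None], mu2, s2).sum(axis=0)
ll_y_M0 = stats.norm.logpdf(y[:, None], mu1, s2).sum(axis=0)  # shared mean

def log_mean_exp(v):
    # Numerically stable log of the Monte Carlo average of exp(v).
    m = v.max()
    return m + np.log(np.mean(np.exp(v - m)))

log_B10 = log_mean_exp(ll_x + ll_y_M1) - log_mean_exp(ll_x + ll_y_M0)
post_M1 = 1.0 / (1.0 + np.exp(-log_B10))  # uniform prior on the two models
```

Averaging the likelihood over prior draws is the simplest (and highest-variance) marginal-likelihood estimator; the point of the paper's sequential use of historical data is precisely to replace such vague priors with calibrated informative ones.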
  • Bayesian essentials with R.

    Jean michel MARIN, Christian ROBERT
    2014
    This Bayesian modeling book provides a self-contained entry to computational Bayesian statistics. Focusing on the most standard statistical models and backed up by real datasets and an all-inclusive R (CRAN) package called bayess, the book provides an operational methodology for conducting Bayesian inference, rather than focusing on its theoretical and philosophical justifications. Readers are empowered to participate in the real-life data analysis situations depicted here from the beginning. The stakes are high and the reader determines the outcome. Special attention is paid to the derivation of prior distributions in each case and specific reference solutions are given for each of the models. Similarly, computational details are worked out to lead the reader towards an effective programming of the methods given in the book. In particular, all R codes are discussed with enough detail to make them readily understandable and expandable. This works in conjunction with the bayess package. Bayesian Essentials with R can be used as a textbook at both undergraduate and graduate levels, as exemplified by courses given at Université Paris Dauphine (France), University of Canterbury (New Zealand), and University of British Columbia (Canada). It is particularly useful with students in professional degree programs and scientists to analyze data the Bayesian way. The text will also enhance introductory courses on Bayesian statistics. Prerequisites for the book are an undergraduate background in probability and statistics, if not in Bayesian statistics. A strength of the text is the noteworthy emphasis on the role of models in statistical analysis. This is the new, fully-revised edition to the book Bayesian Core: A Practical Approach to Computational Bayesian Statistics.
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr