VAYATIS Nicolas

Affiliations
  • 2019 - 2021
    Centre Borelli
  • 2019 - 2021
    Ecole normale supérieure de Paris-Saclay
  • 2005 - 2020
    Laboratoire de probabilités et modèles aléatoires
  • 2012 - 2019
    Centre de mathématiques et de leurs applications
  • Unsupervised Multi-source Domain Adaptation for Regression.

    Guillaume RICHARD, Antoine de MATHELIN, Georges HEBRAIL, Mathilde MOUGEOT, Nicolas VAYATIS
    Lecture Notes in Computer Science | 2021
    No summary available.
  • Sleep apnea syndrome and subthalamic stimulation in Parkinson's disease.

    Panagiotis BARGIOTAS, Ioannis BARGIOTAS, Ines DEBOVE, M lenard LACHENMAYER, Nicolas VAYATIS, W m michael SCHUEPBACH, Claudio l a BASSETTI
    Sleep Medicine | 2021
    No summary available.
  • Localized Interpolation for Graph Signals.

    A. MAZARGUIL, L. OUDRE, N. VAYATIS
    2020 28th European Signal Processing Conference (EUSIPCO) | 2021
    No summary available.
  • Discrepancy-Based Active Learning for Domain Adaptation.

    Antoine DE MATHELIN, Mathilde MOUGEOT, Nicolas VAYATIS
    2021
    The goal of the paper is to design active learning strategies which lead to domain adaptation under an assumption of domain shift in the case of a Lipschitz labeling function. Building on previous work by Mansour et al. (2009), we adapt the concept of discrepancy distance between source and target distributions to restrict the maximization over the hypothesis class to a localized class of functions which perform accurate labeling on the source domain. We derive generalization error bounds for such active learning strategies in terms of Rademacher averages and localized discrepancy for general loss functions satisfying a regularity condition. Practical algorithms are inferred from the theoretical bounds; one is based on greedy optimization and the other is a K-medoids algorithm. We also provide improved versions of the algorithms to address the case of large data sets. These algorithms are competitive against other state-of-the-art active learning techniques in the context of domain adaptation, as shown in our numerical experiments, in particular on large data sets of around one hundred thousand images.
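    The entry above mentions a K-medoids-style algorithm for choosing which points to label. As an illustration only, here is a minimal greedy medoid-selection sketch in plain NumPy: it picks a batch of representative points by greedily minimizing the total distance to the nearest selected point. It is not the authors' algorithm, merely the generic coverage-style selection principle such strategies build on; all names are hypothetical.

      import numpy as np

      def greedy_medoid_selection(X, k):
          """Greedily pick k medoids minimising the total distance of every
          point in X to its closest selected medoid (a simple surrogate for
          coverage-style batch selection in active learning)."""
          n = X.shape[0]
          d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
          selected = []
          current = np.full(n, np.inf)           # distance to nearest selected medoid
          for _ in range(k):
              # total cost if each remaining candidate were added next
              costs = np.minimum(current[None, :], d).sum(axis=1)
              costs[selected] = np.inf
              best = int(np.argmin(costs))
              selected.append(best)
              current = np.minimum(current, d[best])
          return selected

      # toy usage: pick 5 representative target points to annotate
      rng = np.random.default_rng(0)
      X_target = rng.normal(size=(200, 3))
      print(greedy_medoid_selection(X_target, k=5))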
  • Event detection and structure inference for graph vectors.

    Batiste LE BARS, Nicolas VAYATIS, Charles BOUVEYRON, George MICHAILIDIS, Fabrice ROSSI, Gilles BLANCHARD, Argyris KALOGERATOS, Tabea REBAFKA, George MICHAILIDIS, Fabrice ROSSI
    2021
    This thesis addresses different problems around the analysis and modeling of signals on graphs, in other words vector data observed on graphs. We are particularly interested in two specific tasks. The first one is the problem of event detection, i.e. the detection of anomalies or change-points, in a set of vectors on graphs. The second task consists in the inference of the graph structure underlying the vectors contained in a data set. In a first part, our work is application-oriented. We propose a method to detect antenna failures in a telecommunication network. The proposed methodology is designed to be efficient for communication networks in a broad sense and implicitly takes into account the underlying structure of the data. In a second part, a new graph inference method in the framework of Graph Signal Processing is studied. In this problem, notions of local and global regularity, with respect to the underlying graph, are imposed on the vectors. Finally, we propose to combine the graph learning task with the change-point detection problem. This time, a probabilistic framework is considered to model the vectors, assumed to be distributed according to a certain Markov random field. In our modeling, the graph underlying the data can change over time and a change-point is detected whenever it changes significantly.
  • Revealing posturographic profile of patients with Parkinsonian syndromes through a novel hypothesis testing framework based on machine learning.

    Ioannis BARGIOTAS, Argyris KALOGERATOS, Myrto LIMNIOS, Pierre paul VIDAL, Damien RICARD, Nicolas VAYATIS
    PLOS ONE | 2021
    No summary available.
  • Concentration Inequalities for Two-Sample Rank Processes with Application to Bipartite Ranking.

    Stephan CLEMENCON, Myrto LIMNIOS, Nicolas VAYATIS
    2021
    The ROC curve is the gold standard for measuring the performance of a test/scoring statistic regarding its capacity to discriminate between two statistical populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring/ranking applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be viewed as summaries of the ROC curve. In this paper, the fact that most of these empirical criteria can be expressed as two-sample linear rank statistics is highlighted and concentration inequalities for collections of such random variables, referred to as two-sample rank processes here, are proved, when indexed by VC classes of scoring functions. Based on these nonasymptotic bounds, the generalization capacity of empirical maximizers of a wide class of ranking performance criteria is next investigated from a theoretical perspective. It is also supported by empirical evidence through convincing numerical experiments.
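    The abstract rests on the fact that empirical ranking criteria such as the AUC are two-sample linear rank statistics. The short, purely illustrative check below verifies numerically that the empirical AUC of a scoring statistic coincides with the normalized Mann-Whitney rank-sum statistic computed on the pooled sample (a standard identity, not code from the paper).

      import numpy as np
      from scipy.stats import rankdata

      rng = np.random.default_rng(0)
      s_neg = rng.normal(0.0, 1.0, size=300)   # scores of the "negative" sample
      s_pos = rng.normal(1.0, 1.0, size=200)   # scores of the "positive" sample

      # AUC as the probability that a positive score exceeds a negative one
      auc = np.mean(s_pos[:, None] > s_neg[None, :])

      # Same quantity written as a two-sample linear rank statistic:
      # sum of the ranks of the positive scores in the pooled sample.
      ranks = rankdata(np.concatenate([s_pos, s_neg]))
      n_pos, n_neg = len(s_pos), len(s_neg)
      rank_sum = ranks[:n_pos].sum()
      auc_rank = (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

      print(auc, auc_rank)   # identical up to tie handling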
  • Sequential Resource Allocation for network diffusion control.

    Mathilde FEKOM, Nicolas VAYATIS, Argyris KALOGERATOS, Pierre yves BOELLE, Jean pierre NADAL, Nicole IMMORLICA, Elisabeta VERGU, Theodoros EVGENIOU, Jean pierre NADAL, Nicole IMMORLICA
    2021
    Dynamic containment of an undesirable network diffusion process, such as an epidemic, requires a decision maker (DM) to be able to respond to its evolution by taking the right control measures at the right time. This task can be viewed as managing the allocation of a limited amount of resources to network nodes, with the objective of reducing the effects of the process. In this thesis, we extend the dynamic resource allocation (DRA) problem and propose a dynamic control framework with multiple rounds, which we realize through two derived models: restricted DRA and sequential DRA (RDRA, SDRA). Unlike standard settings in which information and access are complete, these new models take into account possible access restrictions regarding the information available on the network and/or the ability to act on its nodes. At each intervention round, the DM has limited access to information about a fraction of the nodes, and gains access to act on them sequentially. This sequential aspect of the decision process offers a completely new perspective on the control of dynamic diffusion processes, making this work the first to cast the dynamic control problem as a series of sequential selection processes. In the sequential selection problem (SSP), immediate and irrevocable decisions must be made by the decision maker while candidates arrive in a random order and are considered for one of the available selection slots. For the purposes of network diffusion control, we propose to select the right nodes to which control resources are allocated in a sequential, multi-round process. However, standard SSP variants, such as the well-known secretary problem, start with an empty selection set (cold start) and perform the selection process once over a single set of candidates (single round). Both of these limitations are addressed in this thesis. First, we introduce a new hot-start setting that considers having a reference set at hand, i.e., a set of previously selected elements of a given quality. The DM then attempts to optimally update this set while examining the sequence of arriving candidates, constrained by the possibility of updating the assignment to each selection slot (resource) at most once. The sequential selection process with multiple rounds is then introduced as a natural extension of hot-start selection. Objective functions based on the rank and on the score of the final selection are considered. An approach based on splitting the sequence into two phases is proposed for the former, while the optimal strategy, based on the computation of a dynamic acceptance threshold, is derived for the latter assuming that the distribution of scores is known. These strategies are then compared for their efficiency in the context of traditional selection as well as for solving the network control problems that motivated this thesis. The generality of the models introduced allows their application to a wide variety of domains and problems, for example recurrent recruitment processes, resource management (e.g., beds, staff) in health care units, as well as the solution of difficult constrained combinatorial problems, such as the b-diversification problem found in data stream processing applications (e.g., in robotics).
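    As background only, the following sketch simulates the classical secretary problem, i.e. the single-slot, cold-start special case mentioned in the abstract, with the familiar observe-then-commit rule and a cutoff of about n/e. It is not the hot-start or multi-round strategy developed in the thesis.

      import numpy as np

      def secretary_trial(n, cutoff, rng):
          """One run of the classical secretary rule: observe the first `cutoff`
          candidates, then accept the first one better than all seen so far."""
          scores = rng.permutation(n)          # relative ranks, n-1 is the best
          best_seen = scores[:cutoff].max() if cutoff > 0 else -1
          for s in scores[cutoff:]:
              if s > best_seen:
                  return s == n - 1            # did we pick the overall best?
          return scores[-1] == n - 1           # forced to take the last candidate

      rng = np.random.default_rng(0)
      n = 100
      cutoff = int(n / np.e)                   # the well-known ~n/e observation phase
      wins = np.mean([secretary_trial(n, cutoff, rng) for _ in range(20_000)])
      print(wins)                              # close to 1/e ≈ 0.368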
  • Epidemic Models for Personalised COVID-19 Isolation and Exit Policies Using Clinical Risk Predictions.

    Theodoros EVGENIOU, Mathilde FEKOM, Anton OVCHINNIKOV, Raphael PORCHER, Camille POUCHOL, Nicolas VAYATIS
    SSRN Electronic Journal | 2020
    In mid April 2020, with more than 2.5 billion people in the world following social distancing measures due to COVID-19, governments are considering relaxing lock-down. We combined individual clinical risk predictions with epidemic modelling to examine simulations of isolation and exit policies. Methods: We developed a method to include personalised risk predictions in epidemic models based on data science principles. We extended a standard susceptible-exposed-infected-removed (SEIR) model to account for predictions of severity, defined by the risk of an individual needing intensive care in case of infection. We studied example isolation policies using simulations with the risk-extended epidemic model, using COVID-19 data and estimates in France as of mid April 2020 (4 000 patients in ICU, around 7 250 total ICU beds occupied at the peak of the outbreak, 0.5 percent of patients requiring ICU upon infection). We considered scenarios varying in the discrimination performance of a risk prediction model, in the degree of social distancing, and in the severity rate upon infection. Confidence intervals were obtained using an Approximate Bayesian Computation approach. The framework may be used with other epidemic models, with other risk predictions, and for other epidemic outbreaks.
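    To make the "risk-extended SEIR" idea concrete, here is a hedged toy sketch (not the authors' model or code; every parameter value is a placeholder) of an SEIR system split into a high-risk and a low-risk group, each with its own isolation level entering the force of infection.

      import numpy as np

      def seir_two_groups(days=180, dt=0.1, beta=0.3, sigma=0.2, gamma=0.1,
                          p_high=0.05, contact=(0.3, 1.0)):
          """Euler integration of an SEIR model with a high-risk group (stricter
          isolation, reduced contact rate) and a low-risk group (looser isolation)."""
          frac = np.array([p_high, 1.0 - p_high])        # group sizes (fractions)
          c = np.array(contact)                          # group-specific contact reduction
          S, E = frac.copy(), np.zeros(2)
          I, R = np.array([0.0, 1e-4]), np.zeros(2)
          history = []
          for _ in range(int(days / dt)):
              force = beta * np.sum(c * I)               # shared force of infection
              new_E = c * S * force * dt
              new_I = sigma * E * dt
              new_R = gamma * I * dt
              S, E, I, R = S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R
              history.append(I.copy())
          return np.array(history)

      I_t = seir_two_groups()
      print("peak infected fraction (high-risk, low-risk):", I_t.max(axis=0))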
  • Quantitative assessment of consciousness during anesthesia without EEG data.

    Clement DUBOST, Pierre HUMBERT, Laurent OUDRE, Christophe LABOURDETTE, Nicolas VAYATIS, Pierre paul VIDAL
    Journal of Clinical Monitoring and Computing | 2020
    No summary available.
  • An opinion paper on the maintenance of robustness: Towards a multimodal and intergenerational approach using digital twins.

    Pierre paul VIDAL, Alienor VIENNE JUMEAU, Albane MOREAU, Catherine VIDAL, Danping WANG, Julien AUDIFFREN, Ioannis BARGIOTAS, Remi BARROIS, Stephane BUFFAT, Clement DUBOST, Jean michel GHIDAGLIA, Christophe LABOURDETTE, Juan MANTILLA, Laurent OUDRE, Flavien QUIJOUX, Matthieu ROBERT, Alain p YELNIK, Damien RICARD, Nicolas VAYATIS
    AGING MEDICINE | 2020
    No summary available.
  • Low Rank Activations for Tensor-Based Convolutional Sparse Coding.

    Pierre HUMBERT, Julien AUDIFFREN, Laurent OUDRE, Nicolas VAYATIS
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2020
    No summary available.
  • The Use of Inertial Measurement Units for the Study of Free Living Environment Activity Assessment: A Literature Review.

    Sylvain JUNG, Mona MICHAUD, Laurent OUDRE, Eric DORVEAUX, Louis GORINTIN, Nicolas VAYATIS, Damien RICARD
    Sensors | 2020
    No summary available.
  • Multivariate analysis with tensors and graphs – application to neuroscience.

    Pierre HUMBERT, Nicolas VAYATIS, Laurent OUDRE, Julien AUDIFFREN, Remi GRIBONVAL, Cedric RICHARD, Dimitri VAN DE VILLE, Alexandre GRAMFORT, Stephanie ALLASSONNIERE, Cedric RICHARD, Dimitri VAN DE VILLE
    2020
    How to extract information from multivariate data has become a fundamental question in recent years. Indeed, their increasing availability has highlighted the limitations of standard models and the need to evolve towards more versatile methods. The main objective of this thesis is to provide methods and algorithms taking into account the structure of multivariate signals. Well-known examples of such signals are images, stereo audio signals, and multi-channel electroencephalography signals. Among existing approaches, we specifically focus on those based on graph or tensor induced structure which have already attracted increasing attention due to their ability to better exploit the multivariate aspect of the data and their underlying structure. Although this thesis takes the study of general anesthesia as its preferred application context, the methods developed are suitable for a wide spectrum of multivariate structured data.
  • Selective review of offline change point detection methods.

    Charles TRUONG, Laurent OUDRE, Nicolas VAYATIS
    Signal Processing | 2020
    No summary available.
  • Multivariate two-sample hypothesis testing through AUC maximization for biomedical applications.

    Ioannis BARGIOTAS, Argyris KALOGERATOS, Myrto LIMNIOS, Pierre paul VIDAL, Damien RICARD, Nicolas VAYATIS
    11th Hellenic Conference on Artificial Intelligence | 2020
    No summary available.
  • Epidemic Models for Personalised COVID-19 Isolation and Exit Policies Using Clinical Risk Predictions.

    Theodoros EVGENIOU, Mathilde FEKOM, Anton OVCHINNIKOV, Raphael PORCHER, Camille POUCHOL, Nicolas VAYATIS
    2020
    In mid April 2020, with more than 2.5 billion people in the world following social distancing measures due to COVID-19, governments are considering relaxing lock-down. We combined individual clinical risk predictions with epidemic modelling to examine simulations of isolation and exit policies. Methods: We developed a method to include personalised risk predictions in epidemic models based on data science principles. We extended a standard susceptible-exposed-infected-removed (SEIR) model to account for predictions of severity, defined by the risk of an individual needing intensive care in case of infection. We studied example isolation policies using simulations with the risk-extended epidemic model, using COVID-19 data and estimates in France as of mid April 2020 (4 000 patients in ICU, around 7 250 total ICU beds occupied at the peak of the outbreak, 0.5 percent of patients requiring ICU upon infection). We considered scenarios varying in the discrimination performance of a risk prediction model, in the degree of social distancing, and in the severity rate upon infection. Confidence intervals were obtained using an Approximate Bayesian Computation approach. The framework may be used with other epidemic models, with other risk predictions, and for other epidemic outbreaks.
  • Unsupervised Multi-Source Domain Adaptation for Regression.

    Guillaume RICHARD, Antoine DE MATHELIN, Georges HEBRAIL, Mathilde MOUGEOT, Nicolas VAYATIS
    2020
    We consider the problem of unsupervised domain adaptation from multiple sources in a regression setting. We propose in this work an original method that takes advantage of the different sources through a weighted combination. For this purpose, we define a new measure of similarity between probability distributions for domain adaptation, which we call hypothesis-discrepancy. We then prove a new bound for unsupervised domain adaptation combining multiple sources. From this bound we derive a novel adversarial domain adaptation algorithm that adjusts the weights given to each source, ensuring that sources related to the target receive higher weights. We finally evaluate our method on different public datasets and compare it to other domain adaptation baselines to demonstrate the improvement for regression tasks.
  • Adversarial Weighting for Domain Adaptation in Regression.

    Antoine DE MATHELIN, Guillaume RICHARD, Mathilde MOUGEOT, Nicolas VAYATIS
    2020
    We present a novel instance-based approach to handle regression tasks in the context of supervised domain adaptation. The approach developed in this paper relies on the assumption that the task on the target domain can be efficiently learned by adequately reweighting the source instances during the training phase. We introduce a novel formulation of the optimization objective for domain adaptation which relies on a discrepancy distance characterizing the difference between domains according to a specific task and a class of hypotheses. To solve this problem, we develop an adversarial network algorithm which learns both the source weighting scheme and the task in one feed-forward gradient descent. We provide numerical evidence of the relevance of the method on public datasets for domain adaptation through reproducible experiments accessible via an online demo interface.
  • Robust Kernel Density Estimation with Median-of-Means principle.

    Pierre HUMBERT, Batiste LE BARS, Ludovic MINVIELLE, Nicolas VAYATIS
    2020
    In this paper, we introduce a robust nonparametric density estimator combining the popular Kernel Density Estimation method and the Median-of-Means principle (MoM-KDE). This estimator is shown to achieve robustness to any kind of anomalous data, even in the case of adversarial contamination. In particular, while previous works only prove consistency results under a known contamination model, this work provides finite-sample high-probability error bounds without a priori knowledge on the outliers. Finally, when compared with other robust kernel estimators, we show that MoM-KDE achieves competitive results while having significantly lower computational complexity.
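    A minimal sketch of the median-of-means idea applied to kernel density estimation, assuming scikit-learn's KernelDensity: split the sample into disjoint blocks, fit one KDE per block, and take the pointwise median of the block estimates. This illustrates only the aggregation principle, not the paper's exact estimator or its tuning.

      import numpy as np
      from sklearn.neighbors import KernelDensity

      def mom_kde(X, grid, n_blocks=10, bandwidth=0.3):
          """Pointwise median of KDEs fitted on disjoint blocks of the sample
          (median-of-means style aggregation for robustness to outliers)."""
          rng = np.random.default_rng(0)
          idx = rng.permutation(len(X))
          blocks = np.array_split(idx, n_blocks)
          dens = []
          for b in blocks:
              kde = KernelDensity(bandwidth=bandwidth).fit(X[b])
              dens.append(np.exp(kde.score_samples(grid)))
          return np.median(dens, axis=0)

      # clean data plus a few gross outliers
      rng = np.random.default_rng(1)
      X = np.concatenate([rng.normal(0, 1, size=(500, 1)),
                          rng.normal(8, 0.1, size=(20, 1))])
      grid = np.linspace(-4, 4, 200)[:, None]
      robust_density = mom_kde(X, grid)
      print(robust_density.max())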
  • Classification of events from ground sensors - Application to the monitoring of fragile people.

    Ludovic MINVIELLE, Nicolas VAYATIS, Mathilde MOUGEOT, Bernadette DORIZZI, Amaury HABRARD, Francois CHARPILLET, Miguel COLOM, Amaury HABRARD, Francois CHARPILLET
    2020
    This thesis deals with the detection of events in signals from ground sensors for the monitoring of elderly people. In view of the practical issues, pressure sensors located on the ground appear to be good candidates for monitoring activities, especially fall detection. As the signals to be processed are complex, sophisticated models are required. Thus, in order to design a fall detector, we propose an approach based on random forests, while addressing hardware constraints with a variable selection procedure. The performance is improved using a data augmentation method as well as temporal aggregation of the model responses. We then address the issue of confronting our model with the real world, using transfer learning methods that act on the base model of random forests, i.e. decision trees. These methods are adaptations of previous work and are designed to address the problem of class imbalance, falling being a rare event. We test them on several datasets, showing encouraging results, and a Python implementation is made available. Finally, motivated by the issue of tracking elderly people over a large area while processing a one-dimensional signal, we propose to distinguish elderly people from younger individuals using a convolutional neural network model and dictionary learning. Since the signals to be processed mostly consist of steps, the first block of the model is trained to focus on the steps in the signals, and the second part of the model is trained separately on the final task. This new approach to gait classification makes it possible to efficiently recognize signals from elderly people.
  • Hybrid Modelling for Lifetime Prediction.

    Fikri HAFID, Maxime GUEGUIN, Vincent LAURENT, Mathilde MOUGEOT, Nicolas VAYATIS, Christine YANG, Jean michel GHIDAGLIA
    Lecture Notes in Mechanical Engineering | 2020
    No summary available.
  • Greedy Kernel Change-Point Detection.

    Laurent OUDRE, Nicolas VAYATIS, Charles TRUONG
    IEEE Transactions on Signal Processing | 2019
    No summary available.
  • Supervised Kernel Change Point Detection with Partial Annotations.

    Charles TRUONG, Laurent OUDRE, Nicolas VAYATIS
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2019
    No summary available.
  • Optimal Multiple Stopping Rule for Warm-Starting Sequential Selection.

    Mathilde FEKOM, Nicolas VAYATIS, Argyris KALOGERATOS
    2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI) | 2019
    No summary available.
  • A Data Set for the Study of Human Locomotion with Inertial Measurements Units.

    Thomas MOREAU, Clement PROVOST, Pierre paul VIDAL, Nicolas VAYATIS, Stephane BUFFAT, Alain YELNIK, Damien RICARD, Laurent OUDRE
    Image Processing On Line | 2019
    No summary available.
  • Effects of bilateral stimulation of the subthalamic nucleus in Parkinson's disease with and without REM sleep behaviour disorder.

    Panagiotis BARGIOTAS, Ines DEBOVE, Ioannis BARGIOTAS, Martin lenard LACHENMAYER, Maria NTAFOULI, Nicolas VAYATIS, Michael wm SCHUPBACH, Paul KRACK, Claudio l BASSETTI
    Journal of Neurology, Neurosurgery & Psychiatry | 2019
    No summary available.
  • Balance Impairment in Radiation Induced Leukoencephalopathy Patients Is Coupled With Altered Visual Attention in Natural Tasks.

    Ioannis BARGIOTAS, Albane MOREAU, Alienor VIENNE, Flavie BOMPAIRE, Marie BARUTEAU, Marie DE LAAGE, Mateo CAMPOS, Dimitri PSIMARAS, Nicolas VAYATIS, Christophe LABOURDETTE, Pierre paul VIDAL, Damien RICARD, Stephane BUFFAT
    Frontiers in Neurology | 2019
    No summary available.
  • Selection of the Best Electroencephalogram Channel to Predict the Depth of Anesthesia.

    Clement DUBOST, Pierre HUMBERT, Arno BENIZRI, Jean pierre TOURTIER, Nicolas VAYATIS, Pierre paul VIDAL
    Frontiers in Computational Neuroscience | 2019
    No summary available.
  • The Complementary Role of Activity Context in the Mental Workload Evaluation of Helicopter Pilots: A Multi-tasking Learning Approach.

    Ioannis BARGIOTAS, Alice NICOLAI, Pierre paul VIDAL, Christophe LABOURDETTE, Nicolas VAYATIS, Stephane BUFFAT
    Human Mental Workload: Models and Applications | 2019
    No summary available.
  • Multivariate Convolutional Sparse Coding with Low Rank Tensor.

    Pierre HUMBERT, Julien AUDIFFREN, Laurent OUDRE, Nicolas VAYATIS
    2019
    This paper introduces a new multivariate convolutional sparse coding model based on tensor algebra, with a general formulation enforcing both element-wise sparsity and low-rankness of the activation tensors. By using the CP decomposition, this model achieves a significantly more efficient encoding of the multivariate signal, particularly in the high order/dimension setting, resulting in better performance. We prove that our model is closely related to the Kruskal tensor regression problem, offering interesting theoretical guarantees for our setting. Furthermore, we provide an efficient optimization algorithm based on alternating optimization to solve this model. Finally, we evaluate our algorithm with a large range of experiments, highlighting its advantages and limitations.
  • Sequential Dynamic Resource Allocation for Epidemic Control.

    Mathilde FEKOM, Nicolas VAYATIS, Argyris KALOGERATOS
    2019 IEEE 58th Conference on Decision and Control (CDC) | 2019
    No summary available.
  • Spectral bounds in random graphs applied to spreading phenomena and percolation.

    Remi LEMONNIER, Kevin SCAMAN, Nicolas VAYATIS
    Advances in Applied Probability | 2018
    No summary available.
  • Template-Based Step Detection with Inertial Measurement Units.

    Laurent OUDRE, Remi BARROIS MULLER, Thomas MOREAU, Charles TRUONG, Alienor VIENNE JUMEAU, Damien RICARD, Nicolas VAYATIS, Pierre paul VIDAL
    Sensors | 2018
    No summary available.
  • On the importance of local dynamics in statokinesigram: A multivariate approach for postural control evaluation in elderly.

    Ioannis BARGIOTAS, Julien AUDIFFREN, Nicolas VAYATIS, Pierre paul VIDAL, Stephane BUFFAT, Alain p YELNIK, Damien RICARD
    PLOS ONE | 2018
    No summary available.
  • Representations for anomaly detection: Application to aircraft engine vibration data.

    Mina ABDEL SAYED, Gilles FAY, Mathilde MOUGEOT, Nicolas VAYATIS, Mohamed EL BADAOUI, Jerome LACAILLE, Younes BENNANI, Nadine MARTIN
    2018
    Vibration measurements are among the most relevant data for detecting engine anomalies. Vibrations are acquired on a test bench during acceleration and deceleration to ensure engine reliability at the end of the production line. These temporal data are converted into spectrograms to allow the experts to perform a visual analysis and to detect the various atypical signatures. The vibratory sources correspond to lines on the spectrograms. In this thesis, we have implemented an automatic decision support tool to analyze the spectrograms and detect any type of atypical signature; these signatures do not necessarily come from engine damage. First, we built a database of annotated spectrograms. It is important to note that unusual signatures are variable in shape, intensity and position and are found in a small amount of data. Therefore, to detect these signatures, we characterize the normal behavior of the spectrograms, analogously to novelty detection methods, by representing patches of the spectrograms on dictionaries such as curvelets and Non-negative Matrix Factorization (NMF), as well as by estimating the distribution of each point of the spectrogram from normal data, with or without dependence on its neighborhood. The detection of atypical points is performed by comparing the test data to the normality model estimated on normal training data. The detection of atypical points allows the detection of unusual signatures composed of these points.
  • Information Diffusion and Rumor Spreading.

    Argyris KALOGERATOS, Kevin SCAMAN, Luca CORINZIA, Nicolas VAYATIS
    Cooperative and Graph Signal Processing | 2018
    No summary available.
  • Some contributions to global optimization.

    Cedric MALHERBE, Nicolas VAYATIS, Alexandre b. TSYBAKOV, Nicolas VAYATIS, Alexandre b. TSYBAKOV, Gilles BLANCHARD, Jean philippe VERT, Remi MUNOS, Olivier TEYTAUD, Gilles BLANCHARD, Jean philippe VERT
    2017
    This thesis is concerned with the sequential optimization of an unknown function defined on a continuous and bounded set. This type of problem appears in particular in the design of complex systems, when one seeks to optimize the result of numerical simulations, or more simply when the function to be optimized does not present any obvious form of regularity such as linearity or convexity. In a first part, we focus on the particular case of Lipschitz functions. We introduce two new strategies aiming at optimizing any function with known or unknown Lipschitz constant. Then, by introducing different regularity measures, we formulate and obtain consistency results for these methods as well as convergence rates for their approximation errors. In a second part, we explore the domain of binary ranking in order to develop optimization strategies for non-regular functions. Observing that learning the ranking rule induced by the unknown function allows the systematic identification of its optimum, we make the link between ranking theory and optimization theory, which allows us to develop new methods based on the choice of any ranking technique and to formulate different convergence results for the optimization of non-regular functions. Finally, the optimization strategies developed during the thesis are compared to existing state-of-the-art methods on calibration problems of learning systems as well as on synthetic problems frequently encountered in the field of global optimization.
  • Parsimonious Convolutional Representations -- application to physiological signals and deep learning interpretability.

    Thomas MOREAU, Nicolas VAYATIS, Laurent OUDRE, Stephanie ALLASSONNIERE, Nicolas VAYATIS, Laurent OUDRE, Stephanie ALLASSONNIERE, Julien MAIRAL, Stephane MALLAT, Rene VIDAL, Alexandre GRAMFORT, Pierre paul VIDAL, Julien MAIRAL, Stephane MALLAT, Rene VIDAL
    2017
    Convolutional representations extract recurrent patterns that help to understand the local structure in a set of signals. They are well suited to physiological signal analysis, which requires visualizations that highlight relevant information. These representations are also related to deep learning models. In this manuscript, we describe algorithmic and theoretical advances around these models. We first show that Singular Spectrum Analysis can efficiently compute a convolutional representation. This representation is dense, and we describe an automated procedure to make it more interpretable. We then propose an asynchronous algorithm to accelerate convolutional sparse coding. Our algorithm achieves a super-linear speed-up. In a second part, we analyze the links between such representations and neural networks. We propose an additional learning step, called post-training, which improves the performance of the trained network by ensuring that the last layer is optimal. Then we study the mechanisms that make it possible to accelerate sparse coding with neural networks. We show that this is related to a factorization of the Gram matrix of the dictionary. Finally, we illustrate the interest of using convolutional representations for physiological signals. Convolutional dictionary learning is used to summarize walking signals, and gaze motion is subtracted from oculometric signals with Singular Spectrum Analysis.
  • Fall detection using smart floor sensor and supervised learning.

    Ludovic MINVIELLE, Mounir ATIQ, Renan SERRA, Mathilde MOUGEOT, Nicolas VAYATIS
    2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) | 2017
    No summary available.
  • Statistical learning for event sequences using point processes.

    Massil ACHAB, Emmanuel BACRY, Stephane GAIFFAS, Nicolas VAYATIS, Emmanuel BACRY, Stephane GAIFFAS, Vincent RIVOIRARD, Manuel GOMEZ RODRIGUEZ, Nils richard HANSEN
    2017
    The goal of this thesis is to show that the arsenal of new optimization methods allows us to solve difficult estimation problems based on event models. These dated events are ordered chronologically and therefore cannot be considered as independent. This simple fact justifies the use of a particular mathematical tool, called point process, to learn a certain structure from these events. Two point processes are studied in this thesis. The first is the point process behind the Cox proportional hazards model: its conditional intensity allows one to define the hazard ratio, a fundamental quantity in the survival analysis literature. The Cox regression model relates the time to the occurrence of an event, called a failure, to the covariates of an individual. This model can be reformulated using the point process framework. The second is the Hawkes process, which models the impact of past events on the probability of future events. The multivariate case allows a notion of causality to be encoded between the different dimensions considered. This thesis is divided into three parts. The first part is concerned with a new optimization algorithm that we have developed. It allows the parameter vector of the Cox regression to be estimated when the number of observations is very large. Our algorithm is based on the SVRG (Stochastic Variance Reduced Gradient) algorithm and uses an MCMC (Markov Chain Monte Carlo) method. We have proved convergence rates for our algorithm and have shown its numerical performance on simulated and real-world data sets. The second part shows that causality in the Hawkes sense can be estimated non-parametrically thanks to the integrated cumulants of the multivariate point process. We have developed two methods for estimating the integrals of the kernels of the Hawkes process, without making any assumption on the shape of these kernels. Our methods are faster and more robust, with respect to the shape of the kernels, compared to the state of the art. We have demonstrated the statistical consistency of the first method, and have shown that the second one can be applied to a convex optimization problem. The last part highlights order book dynamics using the non-parametric estimation method introduced in the previous part. We have used data from the EUREX futures market, defined new order book models (building on the previous work of Bacry et al.) and applied the estimation method to these point processes. The results obtained are very satisfactory and consistent with an economic analysis. Such work proves that the method we have developed allows a structure to be extracted from data as complex as those from high-frequency finance.
  • Numerical methods for the research and design of optimal gearbox architectures.

    Steven MASFARAUD, Nicolas VAYATIS, Florian de VUYST, Laurent FRIBOURG, Nicolas VAYATIS, Florian de VUYST, Laurent FRIBOURG, Eric FLORENTIN, Pierre VILLON, Fabrice DANES, Jean francois RAMEAU, Khy TAN, Eric FLORENTIN, Pierre VILLON
    2016
    The design of a gearbox requires the initial choice of an architecture, the principle solution of the object to be designed. This choice is highly structuring and has a very strong impact on the performance criteria of the gearbox, without the engineer having clear visibility on this impact. Once the architecture has been chosen, it is possible to use continuous optimization techniques to optimize the performance criteria and the satisfaction of constraints with respect to a specification. This kind of optimization aims to determine optimally the structural dimensions of the gearbox, such as the positions of the shaft axes in space or the diameters of the pinions. The objective of this thesis is to provide scientific techniques for choosing the optimal architecture with respect to this specification. The development of such a method aims to obtain more efficient gearboxes, but also to reduce engineering development time by ensuring, with scientific methods, that the constraints expressed in the specifications are satisfied, starting from the choice of the architecture, which is done by trial and error in the usual design cycle.
  • Prediction and optimization of wave energy converter arrays using a machine learning approach.

    Dripta SARKAR, Emile CONTAL, Nicolas VAYATIS, Frederic DIAS
    Renewable Energy | 2016
    No summary available.
  • Suppressing Epidemics in Networks Using Priority Planning.

    Kevin SCAMAN, Argyris KALOGERATOS, Nicolas VAYATIS
    IEEE Transactions on Network Science and Engineering | 2016
    No summary available.
  • A Non Linear Scoring Approach for Evaluating Balance: Classification of Elderly as Fallers and Non-Fallers.

    Julien AUDIFFREN, Ioannis BARGIOTAS, Nicolas VAYATIS, Pierre paul VIDAL, Damien RICARD
    PLOS ONE | 2016
    Almost one third of the population aged 65 and older experiences at least one fall per year. An accurate evaluation of the risk of fall through simple and easy-to-use measurements is an important issue in current clinical practice. A common way to evaluate balance in posturography is through the recording of the centre-of-pressure (CoP) displacement (statokinesigram) with force platforms. A variety of indices have been proposed to differentiate fallers from non-fallers. However, no agreement has been reached on whether these analyses alone can sufficiently explain the complex synergies of postural control. In this work, we study the statokinesigrams of 84 elderly subjects (80.3 ± 6.4 years old) who had no impairment related to balance control. Each subject was recorded for 25 seconds with eyes open and 25 seconds with eyes closed, and information pertaining to the presence of balance problems, such as falls, in the last six months was collected. Five descriptors of the statokinesigrams were computed for each record, and a Ranking Forest algorithm was used to combine those features in order to evaluate each subject's balance with a score. A classical train-test split approach was used to evaluate the performance of the method through ROC analysis. ROC analysis showed that the performance of each descriptor separately was close to a random classifier (AUC between 0.49 and 0.54). On the other hand, the score obtained by our method reached an AUC of 0.75 on the test set, consistent over multiple train-test splits. This non-linear multi-dimensional approach seems appropriate for evaluating complex postural control.
  • Anomaly Ranking in a High Dimensional Space: The Unsupervised TreeRank Algorithm.

    S. CLEMENCON, N. BASKIOTIS, N. VAYATIS
    Unsupervised Learning Algorithms | 2016
    Ranking unsupervised data in a multivariate feature space \(\mathcal{X} \subset \mathbb{R}^{d}\), d ≥ 1 by degree of abnormality is of crucial importance in many applications (e.g., fraud surveillance, monitoring of complex systems/infrastructures such as energy networks or aircraft engines, system management in data centers). However, the learning aspect of unsupervised ranking has only received attention in the machine-learning community in the past few years. The Mass-Volume (MV) curve has been recently introduced in order to evaluate the performance of any scoring function \(s: \mathcal{X} \rightarrow \mathbb{R}\) with regard to its ability to rank unlabeled data. It is expected that relevant scoring functions will induce a preorder similar to that induced by the density function f(x) of the (supposedly continuous) probability distribution of the statistical population under study. As far as we know, there is no efficient algorithm to build a scoring function from (unlabeled) training data with nearly optimal MV curve when the dimension d of the feature space is high. It is the major purpose of this chapter to introduce such an algorithm which we call the Unsupervised TreeRank algorithm. Beyond its description and the statistical analysis of its performance, numerical experiments are exhibited in order to provide empirical evidence of its accuracy.
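    Since the chapter revolves around the Mass-Volume curve, here is an illustrative sketch (not the Unsupervised TreeRank algorithm itself) of how an empirical MV curve of a given scoring function can be traced: the mass is the fraction of data scored above a threshold, and the volume is estimated by uniform Monte Carlo sampling on the bounding box of the data. The scoring function used below is a toy stand-in.

      import numpy as np

      def mass_volume_curve(score, X, n_mc=50_000, n_thresholds=50, rng=None):
          """Empirical Mass-Volume curve of a scoring function s: R^d -> R.
          Mass   = fraction of the sample with s(x) >= t,
          Volume = Lebesgue measure of {s >= t}, estimated by uniform sampling
                   on the bounding box of the data."""
          rng = rng or np.random.default_rng(0)
          lo, hi = X.min(axis=0), X.max(axis=0)
          box_volume = np.prod(hi - lo)
          U = rng.uniform(lo, hi, size=(n_mc, X.shape[1]))
          s_data, s_unif = score(X), score(U)
          thresholds = np.quantile(s_data, np.linspace(0.0, 1.0, n_thresholds))
          mass = [(s_data >= t).mean() for t in thresholds]
          volume = [(s_unif >= t).mean() * box_volume for t in thresholds]
          return np.array(mass), np.array(volume)

      # toy scoring function on 2-d data: higher score = more "normal"
      rng = np.random.default_rng(2)
      X = rng.normal(size=(1000, 2))
      score = lambda Z: -np.sum(Z ** 2, axis=1)
      mass, volume = mass_volume_curve(score, X, rng=rng)
      print(mass[:5], volume[:5])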
  • Statistical learning methods for global optimization.

    Emile CONTAL, Nicolas VAYATIS, Pascal MASSART, Nicolas VAYATIS, Pascal MASSART, Josselin GARNIER, Andreas KRAUSE, Vianney PERCHET, Aurelien GARIVIER, Josselin GARNIER, Andreas KRAUSE
    2016
    This thesis is devoted to a rigorous analysis of sequential global optimization algorithms. We place ourselves in a stochastic bandit model where an agent aims at determining the input of a system optimizing a criterion. This target function is not known and the agent sequentially performs queries to evaluate its value at the inputs it chooses. This function may not be convex and may contain a large number of local optima. We address the difficult case where the evaluations are costly, which requires designing a rigorous selection of queries. We consider two objectives: on the one hand, the optimization of the sum of the values received at each iteration; on the other hand, the optimization of the best value found so far. This thesis follows the Bayesian optimization framework when the function is a realization of a known stochastic process, and also introduces a new approach to ranking-based optimization where only comparisons of the function values are performed. We propose new algorithms and provide theoretical concepts to obtain performance guarantees. We give an optimization strategy that adapts to observations received by batch and not individually. A generic study of local suprema of stochastic processes allows us to analyze Bayesian optimization on nonparametric search spaces. We also show that our approach extends to natural non-Gaussian processes. We establish links between active learning and statistical learning of rankings, and derive an algorithm for the optimization of potentially discontinuous functions.
  • Application of stochastic processes to real-time auctions and information propagation in social networks.

    Remi LEMONNIER, Nicolas VAYATIS, Nicolas VAYATIS, Manuel GOMEZ RODRIGUEZ, Florent KRZAKALA, Marc HOFFMANN, Emmanuel BACRY, Manuel GOMEZ RODRIGUEZ, Florent KRZAKALA, Marc HOFFMANN
    2016
    In this thesis, we study two applications of stochastic processes to Internet marketing. The first chapter focuses on the scoring of Internet users for real-time auctions. This problem consists in finding the probability that a given Internet user performs an action of interest, called conversion, within a few days after the display of an advertising banner. We show that Hawkes processes are a natural model of this phenomenon but that state-of-the-art algorithms are not applicable to the size of data typically encountered in industrial applications. We therefore develop two new non-parametric inference algorithms that are several orders of magnitude faster than previous methods. We show empirically that the first one performs better than the state-of-the-art competitors, and that the second one can be applied to even larger datasets without paying too high a price in terms of predictive power. The resulting algorithms have been deployed with very good performance for several years at 1000mercis, the marketing agency that is the industrial partner of this CIFRE thesis, where they have become an important production asset. The second chapter focuses on diffusion processes on graphs, which are an important tool to model the propagation of a viral marketing operation on social networks. We establish the first theoretical bounds on the total number of nodes reached by a contagion for arbitrary graphs and diffusion dynamics, and show the existence of two distinct regimes: the sub-critical regime where at most $O(\sqrt{n})$ nodes are infected, where $n$ is the size of the network, and the super-critical regime where $O(n)$ nodes can be infected. We also study the behavior with respect to the observation time $T$ and highlight the existence of critical times below which a diffusion, even a super-critical one in the long run, behaves in a sub-critical way. Finally, we extend our work to percolation and epidemiology, where we improve existing results.
  • Cross-validation and penalization for density estimation.

    Nelo MAGALHAES, Lucien BIRGE, Pascal MASSART, Yannick BARAUD, Lucien BIRGE, Pascal MASSART, Yannick BARAUD, Vincent RIVOIRARD, Nicolas VAYATIS, Guillaume LECUE, Vincent RIVOIRARD, Nicolas VAYATIS
    2015
    This thesis deals with the estimation of a density, considered from a non-parametric and non-asymptotic point of view. It addresses the problem of selecting a kernel estimation method, which generalizes, among others, the problems of model selection and bandwidth selection. We study classical procedures, by penalization and by resampling (in particular V-fold cross-validation), which evaluate the quality of a method by estimating its risk. We propose, thanks to concentration inequalities, a method to optimally calibrate the penalty in order to select a linear estimator, and we prove oracle inequalities and adaptation properties for these procedures. Moreover, a new resampled procedure, based on the comparison between estimators by robust tests, is proposed as an alternative to procedures based on the principle of unbiased risk estimation. A second objective is the comparison of all these procedures from a theoretical point of view and the analysis of the role of the parameter V for the V-fold penalties. We validate the theoretical results by simulation studies.
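    As an illustration of the kind of selection criterion studied here, the sketch below computes a V-fold cross-validated least-squares risk for a Gaussian kernel density estimator over a few bandwidths. This is a textbook criterion written with scikit-learn and NumPy, not the thesis's penalized or robust-test procedure.

      import numpy as np
      from sklearn.neighbors import KernelDensity
      from sklearn.model_selection import KFold

      def vfold_ls_risk(x, bandwidth, V=5, grid_size=512):
          """V-fold cross-validated least-squares risk of a Gaussian KDE:
          integral of fhat^2 (trapezoid rule on a grid) minus twice the average
          held-out density, averaged over the V folds."""
          x = x.reshape(-1, 1)
          grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, grid_size)
          risks = []
          for train, test in KFold(n_splits=V, shuffle=True, random_state=0).split(x):
              kde = KernelDensity(bandwidth=bandwidth).fit(x[train])
              f_grid = np.exp(kde.score_samples(grid[:, None]))
              quad = np.trapz(f_grid ** 2, grid)                 # integral of fhat^2
              held_out = np.exp(kde.score_samples(x[test])).mean()
              risks.append(quad - 2.0 * held_out)
          return np.mean(risks)

      rng = np.random.default_rng(3)
      sample = rng.normal(size=400)
      for h in [0.05, 0.1, 0.2, 0.4, 0.8]:
          print(h, round(vfold_ls_risk(sample, h), 4))
      # pick the bandwidth with the smallest cross-validated risk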
  • A New Framework for the Simulation of Offshore Oil Facilities at the System Level.

    Marc BONNISSEL, Joris COSTES, Jean michel GHIDAGLIA, Philippe MUGUERRA, Keld lund NIELSEN, Benjamin POIRSON, Xavier RIOU, Jean philippe SAUT, Nicolas VAYATIS
    Complex Systems Design & Management | 2015
    Offshore oil facilities are complex industrial systems: they are composed of numerous parts and involve both elaborate physics and stochastic aspects such as failure risk or price variation. Several software tools are available to simulate individual components of offshore facilities, for instance to compute the flow dynamics in a particular device. There is however no tool to simulate the facility at the system level, i.e. to simulate the general behavior of the facility. The paper presents a framework for such a system-level simulator, which includes one layer for physics and one for risk simulation. The physical part uses the equation-based language Modelica [1]. Modelica components are defined to model typical devices of an installation. The risk simulation uses Markov chains and statistical indicators to assess performance and resilience of the system. It runs with an external language (C or Scilab) and data from the Modelica simulation.
  • A Machine Learning Approach to the Analysis of Wave Energy Converters.

    Dripta SARKAR, Emile CONTAL, Nicolas VAYATIS, Frederic DIAS
    Volume 9: Ocean Renewable Energy | 2015
    The hydrodynamic analysis and estimation of the performance of wave energy converters (WECs) is generally performed using semi-analytical/numerical models. Commercial boundary element codes are widely used in analyzing the interactions in arrays comprising wave energy conversion devices. However, the analysis of an array of such converters becomes computationally expensive, and the computational time increases as the number of devices in the system grows. As such, the determination of optimal layouts of WECs in arrays becomes extremely difficult. In this study, an innovative active experimental approach is presented to predict the behaviour of the WECs in arrays. The input variables are the coordinates of the centres of the wave energy converters. Simulations for training examples and validation are performed for an array of Oscillating Wave Surge Converters, using the mathematical model of Sarkar et al. (Proc. R. Soc. A, 2014). As part of the initial findings, results will be presented on the performance of wave energy converters located well inside an array. The broader aim of this research is to predict the behaviour of the individual devices and the overall performance of the array for arbitrary layouts of the system, and then identify optimal layouts subject to various constraints.
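    A hedged sketch of the general surrogate-modelling idea described above: learn a map from device coordinates to array performance with Gaussian process regression (scikit-learn), here trained on a purely synthetic stand-in for the hydrodynamic model rather than the Sarkar et al. solver.

      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, WhiteKernel

      # Toy stand-in for the hydrodynamic solver: "performance" of a 2-device array
      # as a function of the (x, y) offset of the second device (purely synthetic).
      def toy_array_performance(xy):
          d = np.linalg.norm(xy, axis=1)
          return np.sin(d) / (1.0 + 0.2 * d)

      rng = np.random.default_rng(4)
      X_train = rng.uniform(-5, 5, size=(40, 2))        # sampled layouts
      y_train = toy_array_performance(X_train)

      gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-4),
                                    normalize_y=True)
      gp.fit(X_train, y_train)

      X_new = rng.uniform(-5, 5, size=(5, 2))
      mean, std = gp.predict(X_new, return_std=True)    # surrogate prediction + uncertainty
      print(np.c_[mean, std])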
  • A mathematical approach to stock market investing.

    Marouane ANANE, Frederic ABERGEL, Eric MOULINES, Frederic ABERGEL, Nicolas VAYATIS, Anirban CHAKRABORTI, Charles albert LEHALLE, Damien CHALLET, Nicolas VAYATIS, Anirban CHAKRABORTI
    2015
    The goal of this thesis is to answer the real need to predict future stock price fluctuations. Indeed, the randomness governing these fluctuations constitutes, for financial actors such as market makers, one of the greatest sources of risk. Throughout this study, we highlight the possibility of reducing the uncertainty on future prices by using appropriate mathematical models. This study is made possible thanks to a large financial database and a powerful computational grid made available to us by the Automatic Market Making team of BNP Paribas. In this document, we only present the results of the research concerning high-frequency trading. The results concerning the low-frequency part are of less scientific interest to the academic world and are also confidential. In the first chapter, we present the context and the objectives of this study. We also present the different methods used, as well as the main results obtained. In chapter 2, we focus on the contribution of technological superiority in high-frequency trading. For this purpose, we simulate an ultra-fast, omniscient, and aggressive trader, and then we calculate its total gain over 3 years. The gains obtained are very modest and reflect the limited contribution of technology in high-frequency trading. In chapter 3, we study the predictability of prices based on order book indicators. Using conditional expectations, we present empirical evidence of statistical dependencies between prices and the different indicators. The significance of these dependencies results from the simplicity of the method, which eliminates any risk of overfitting the data. We then focus on combining the different indicators with a linear regression and we analyze the different numerical and statistical problems related to this method. Finally, we conclude that prices are predictable over a time horizon of a few minutes and we question the market efficiency hypothesis. In chapter 4, we focus on the price formation mechanism based on the arrival of events in the order book. We classify the orders into twelve types whose statistical properties we analyze. We then study the dependencies between these different types of orders and propose an order book model in line with empirical observations. Finally, we use this model to predict prices and we support the hypothesis of the non-efficiency of markets suggested in chapter 3.
  • EpiBrainRad: an epidemiologic study of the neurotoxicity induced by radiotherapy in high grade glioma patients.

    Thomas DURAND, Sophie JACOB, Laura LEBOUIL, Hassen DOUZANE, Philippe LESTAEVEL, Amithys RAHIMIAN, Dimitri PSIMARAS, Loic FEUVRET, Delphine LECLERCQ, Bruno BROCHET, Radia TAMARAT, Fabien MILLIAT, Marc BENDERITTER, Nicolas VAYATIS, Georges NOEL, Khe HOANG XUAN, Jean yves DELATTRE, Damien RICARD, Marie odile BERNIER
    BMC Neurology | 2015
    Background: Radiotherapy is one of the most important treatments of primary and metastatic brain tumors. Unfortunately, it can involve moderate to severe complications, among which leukoencephalopathy is very frequent and implies cognitive deficits such as memory, attention and executive dysfunctions. However, the incidence of this complication is not well established, and the risk factors and process are poorly understood. The main objective of the study is to improve knowledge on radio-induced leukoencephalopathy based on multidisciplinary approaches combining cognitive, biological, imaging and dosimetric investigations. Method/Design: The EpiBrainRad study is a prospective cohort study including newly diagnosed high-grade glioma patients treated by radiotherapy and concomitant-adjuvant temozolomide chemotherapy. Patients are included between their surgery and the first day of radio-chemotherapy, and the follow-up lasts for 3 years after treatment. Cognitive functioning assessments, specific blood biomarker measures and magnetic resonance imaging are performed at different times during the follow-up, and a specific dosimetric assessment of organs involved in the beam fields is performed. Firstly, the leukoencephalopathy incidence rate will be estimated in this population. Secondly, correlations between cognitive impairments and dosimetry, biomarker ranges and anomalies on imaging will be analyzed in order to better understand the onset and evolution of the cognitive decline associated with radiotherapy. Furthermore, a new cognitive test, quickly and easily performed, will be studied to determine its sensitivity for detecting leukoencephalopathy-related decline. Discussion: With an original multidisciplinary approach, the EpiBrainRad study aims to improve knowledge on radio-induced leukoencephalopathy in order to improve its early diagnosis and prevention. The main challenge is to preserve quality of life after cancer treatments, which implies studying the incidence of radiation-induced complications and their associated risk factors. Trial Registration: NCT02544178
  • Strong Consistency of the Bayesian Estimator for the Ornstein–Uhlenbeck Process.

    Arturo KOHATSU HIGA, Nicolas VAYATIS, Kazuhiro YASUDA
    Inspired by Finance | 2014
    In the accompanying paper Kohatsu-Higa et al. (submitted, 2013), we have carried out a theoretical study of the consistency of a computationally intensive parameter estimation method for Markovian models. This method could be considered as an approximate Bayesian estimator method or a filtering problem approximated using particle methods. We showed in Kohatsu-Higa et al. (submitted, 2013) that under certain conditions, which explicitly relate the number of data points, the number of simulations and the size of the kernel window, one obtains the rate of convergence of the method. In that first study, the conditions do not seem easy to verify and, for this reason, we show in this paper how to verify these conditions in the toy example of the Ornstein–Uhlenbeck process. We hope that this article will help the reader understand the theoretical background of our previous studies and how to interpret the required hypotheses.
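    For context, the sketch below simulates an Ornstein-Uhlenbeck path with an Euler-Maruyama scheme, i.e. the kind of synthetic data on which such a parameter estimation method can be checked; the parameter values are illustrative and this is not the paper's estimator.

      import numpy as np

      def simulate_ou(theta=1.0, mu=0.0, sigma=0.5, x0=2.0, dt=0.01, n_steps=5000, seed=0):
          """Euler-Maruyama discretisation of dX_t = theta*(mu - X_t) dt + sigma dW_t."""
          rng = np.random.default_rng(seed)
          x = np.empty(n_steps + 1)
          x[0] = x0
          for k in range(n_steps):
              dw = rng.normal(0.0, np.sqrt(dt))
              x[k + 1] = x[k] + theta * (mu - x[k]) * dt + sigma * dw
          return x

      path = simulate_ou()
      print(path.mean(), path.var())   # long-run mean ~ mu, variance ~ sigma^2 / (2*theta)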
  • Uniformly randomized forests and detection of social contribution irregularities.

    Saip CISS, Patrice BERTAIL, Pierre PICARD, Gerard BIAU, Patrice BERTAIL, Pierre PICARD, Gerard BIAU, Fabrice ROSSI, Nicolas VAYATIS, Jean PINQUET, Vincent RAVOUX, Fabrice ROSSI, Nicolas VAYATIS
    2014
    In this thesis, we present an application of statistical learning to the detection of irregularities in social security contributions. The purpose of statistical learning is to model problems in which there is a relationship, generally non-deterministic, between variables and the phenomenon that one seeks to evaluate. An essential aspect of this modeling is the prediction of unknown occurrences of the phenomenon, based on data already observed. In the case of social security contributions, the representation of the problem is expressed by the postulate of the existence of a relationship between the declarations of contributions made by companies and the controls carried out by the collection agencies. The control inspectors certify the correctness or inaccuracy of a certain number of declarations and notify, if necessary, an adjustment to the companies concerned. The learning algorithm "learns", thanks to a model, the relationship between the declarations and the results of the controls, and then produces an evaluation of all the declarations not yet controlled. The first part of the evaluation assigns a regular or irregular character to each declaration, with a certain probability. The second part estimates the expected adjustment amount for each declaration. Within the URSSAF (Union de Recouvrement des cotisations de Sécurité sociale et d'Allocations Familiales) of Île-de-France, and in the framework of a CIFRE (Conventions Industrielles de Formation par la Recherche) contract, we have developed a model for detecting irregularities in social security contributions, which we present and detail throughout the thesis. The algorithm runs under the open-source software R. It is fully operational and was tested in a real situation during the year 2012. To guarantee its properties and results, probabilistic and statistical tools are needed, and we discuss the theoretical aspects that accompanied its design. In the first part of the thesis, we give a general presentation of the problem of detecting irregularities in social contributions. In the second part, we address the detection itself, through the data used to define and evaluate the irregularities. In particular, we show that the available data alone are sufficient to model the detection. We also present a new random forest algorithm, named "uniformly random forest", which constitutes the detection engine. In the third part, we detail the theoretical properties of uniformly random forests. In the fourth part, we present an economic point of view on the case where the irregularities in social contributions are intentional, in the context of the fight against undeclared work. In particular, we are interested in the link between the financial situation of firms and social security fraud. The last part is devoted to the experimental and real-world results of the model, which we discuss. Each chapter of the thesis can be read independently of the others, and some notions are repeated in order to facilitate the exploration of the content.
  • On the Simulation of Offshore Oil Facilities at the System Level.

    Joris COSTES, Jean michel GHIDAGLIA, Philippe MUGUERRA, Keld LUND NIELSEN, Xavier RIOU, Jean philippe SAUT, Nicolas VAYATIS
    Proceedings of the 10th International Modelica Conference, March 10-12, 2014, Lund, Sweden | 2014
    Offshore oil facilities are complex systems that involve elaborate physics combined with stochastic aspects related, for instance, to failure risk or price variation. Although there exist many dedicated software tools to simulate flows typically encountered in oil exploitations, there is still no tool that combines physical (mostly engineering fluid mechanics) and risk simulation. Such a tool could be useful to engineers or decision makers for specification, design and study of offshore oil facilities. We present a first step towards the creation of such a tool. Our current simulator is based on new Modelica components to simulate fluid flows and on stochastic simulation at a higher level, for modeling risk and costs. Modelica components implement physical models for single and two-phase flows in some typical devices of an offshore field. The risk simulation uses Markov chains and statistical indicators to assess performance and resilience of the system over several months or years of operation.
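    A minimal sketch of the risk layer only, assuming an illustrative two-state (operating/failed) monthly Markov chain with made-up transition probabilities and costs; the Modelica flow components are not represented here:

        import numpy as np

        rng = np.random.default_rng(0)

        # Illustrative monthly transition matrix between states 0 = operating, 1 = failed
        P = np.array([[0.97, 0.03],
                      [0.60, 0.40]])     # repair brings the system back with prob. 0.60
        revenue_per_month = 30.0          # arbitrary monetary units while operating
        repair_cost = 50.0                # one-off cost when entering the failed state

        def simulate_field(months=120, runs=10000):
            pnl = np.zeros(runs)
            uptime = np.zeros(runs)
            for r in range(runs):
                state = 0
                for _ in range(months):
                    if state == 0:
                        pnl[r] += revenue_per_month
                        uptime[r] += 1
                    new_state = rng.choice(2, p=P[state])
                    if state == 0 and new_state == 1:
                        pnl[r] -= repair_cost
                    state = new_state
            return pnl, uptime / months

        pnl, availability = simulate_field()
        print("mean availability:", availability.mean().round(3))
        print("5% worst-case cumulative P&L:", np.quantile(pnl, 0.05).round(1))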
  • Guest Editors' foreword.

    Nader h. BSHOUTY, Gilles STOLTZ, Nicolas VAYATIS, Thomas ZEUGMANN
    Theoretical Computer Science | 2014
    No summary available.
  • Nonparametric Markovian Learning of Triggering Kernels for Mutually Exciting and Mutually Inhibiting Multivariate Hawkes Processes.

    Remi LEMONNIER, Nicolas VAYATIS
    Lecture Notes in Computer Science | 2014
    In this paper, we address the problem of fitting multivariate Hawkes processes to potentially large-scale data in a setting where series of events are not only mutually exciting but can also exhibit inhibitive patterns. We focus on nonparametric learning and propose a novel algorithm called MEMIP (Markovian Estimation of Mutually Interacting Processes) that makes use of polynomial approximation theory and self-concordant analysis in order to learn both triggering kernels and base intensities of events. Moreover, considering that N historical observations are available, the algorithm performs log-likelihood maximization in O(N) operations, while the complexity of non-Markovian methods is in O(N^2). Numerical experiments on simulated data, as well as real-world data, show that our method enjoys improved prediction performance when compared to state-of-the-art methods like MMEL and exponential kernels.
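    For intuition on why the Markovian (exponential-kernel) structure yields O(N) likelihood evaluation, here is a sketch for the univariate exponential-kernel special case, using the classical one-pass recursion; this is not the MEMIP algorithm itself:

        import numpy as np

        def hawkes_exp_loglik(times, T, mu, alpha, beta):
            """O(N) log-likelihood of a univariate Hawkes process with intensity
            lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)) on [0, T]."""
            times = np.asarray(times)
            ll = -mu * T - (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - times)))
            A = 0.0                               # recursive excitation term
            prev = None
            for t in times:
                if prev is not None:
                    A = np.exp(-beta * (t - prev)) * (1.0 + A)
                ll += np.log(mu + alpha * A)
                prev = t
            return ll

        # Toy usage: evaluate the likelihood of a few event times
        events = [0.5, 1.1, 1.3, 2.7, 3.0, 4.2]
        print(hawkes_exp_loglik(events, T=5.0, mu=0.4, alpha=0.5, beta=1.2))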
  • Tsunami amplification phenomena.

    Themistoklis STEFANAKIS, Frederic DIAS, Nicolas VAYATIS, Christian KHARIF, Costas SYNOLAKIS, Serge GUILLAS, Edward a. COX, Paolo SAMMARCO, Esteban g. TABAK
    2013
    This thesis is divided into four parts. In the first part, I will present our work on long wave run-up and resonance amplification phenomena. Using numerical simulations based on the nonlinear shallow water equations, we show that for monochromatic waves of normal incidence on a sloping beach, resonant amplification of the run-up occurs when the input wave length is 5.2 times greater than the beach length. We also show that this resonant run-up amplification can be observed for multiple wave profiles. However, the resonant run-up amplification is not limited to infinite sloping beaches. By varying the bathymetric profile, resonance is also present for piecewise linear bathymetries and for realistic bathymetries. In the second part, I present a new analytical solution to study the propagation of non-point source generated tsunamis over a constant depth using linear shallow water wave theory. The solution, based on separation of variables and a double Fourier transform in space, is accurate, easy to implement and allows the study of realistic wave shapes such as N-waves. In the third part, I study the effect of localized protrusions on the generation of long waves. Even when the final displacement is known from seismic analysis, the deforming seafloor may have relief such as mountains and faults. The effect of bathymetry on surface wave generation is studied analytically by solving the forced linear shallow water equations. We find that as the rim height increases, partial wave trapping reduces the wave height in the far field, while amplifying it above the rim. I will also briefly present a solution of the same equation forced over a cone. Finally, in the last part, we will see if small islands can protect nearby coasts from tsunamis, as is widely accepted by local communities. Recent findings on the 2010 Mentawai Islands tsunami show an amplified run-up on coastal areas behind small islands, compared to the run-up on adjacent locations, which are not influenced by the presence of islands. We will investigate the conditions for this run-up amplification by numerically solving the nonlinear shallow water equations. The experimental setup is governed by five physical parameters. The objective is twofold: to find the maximum run-up amplification, and to do so with a minimum number of simulations. We present a recently developed active experimental design based on Gaussian processes, which significantly reduces the computational cost. After running two hundred simulations, we find that in none of the cases considered does the island provide protection to the coastal area behind it. On the contrary, we measured an amplification of the run-up on the beach behind it compared to a lateral position on the beach not directly affected by the presence of the island. This amplification reached a maximum factor of 1.7. Thus, small islands near the mainland act as amplifiers of the long waves in the area directly behind them and not as natural barriers as was commonly believed until now.
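    As a side illustration of linear shallow water propagation over constant depth, the following sketch evolves an initial Gaussian free-surface displacement spectrally, using eta_hat(k, t) = eta0_hat(k) * cos(c |k| t) with c = sqrt(g h); it is not the forced analytical N-wave solution of the thesis, and the grid and depth values are arbitrary:

        import numpy as np

        g, h = 9.81, 2000.0                 # gravity, constant water depth (m)
        c = np.sqrt(g * h)                  # long-wave speed

        # Grid and an initial Gaussian free-surface displacement (zero initial velocity)
        n, L = 256, 400e3                   # points per side, domain size (m)
        x = np.linspace(-L/2, L/2, n, endpoint=False)
        X, Y = np.meshgrid(x, x)
        eta0 = np.exp(-((X - 50e3)**2 + Y**2) / (2 * (20e3)**2))

        # Wavenumbers and the spectral solution of eta_tt = c^2 * Laplacian(eta)
        k = 2 * np.pi * np.fft.fftfreq(n, d=L/n)
        KX, KY = np.meshgrid(k, k)
        K = np.sqrt(KX**2 + KY**2)
        eta0_hat = np.fft.fft2(eta0)

        def free_surface(t):
            return np.real(np.fft.ifft2(eta0_hat * np.cos(c * K * t)))

        print("max surface elevation after 10 minutes:", free_surface(600.0).max())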
  • Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration.

    Emile CONTAL, David BUFFONI, Alexandre ROBICQUET, Nicolas VAYATIS
    Lecture Notes in Computer Science | 2013
    In this paper, we consider the challenge of maximizing an unknown function f for which evaluations are noisy and are acquired at high cost. An iterative procedure uses the previous measures to actively select the next evaluation of f that is predicted to be the most useful. We focus on the case where the function can be evaluated in parallel with batches of fixed size and analyze the benefit compared to the purely sequential procedure in terms of cumulative regret. We introduce the Gaussian Process Upper Confidence Bound and Pure Exploration algorithm (GP-UCB-PE), which combines the UCB strategy and Pure Exploration in the same batch of evaluations along the parallel iterations. We prove theoretical upper bounds on the regret with batches of size K for this procedure, which show an improvement of the order of √K for a fixed iteration cost over the purely sequential versions. Moreover, the multiplicative constants involved have the property of being dimension-free. We also confirm empirically the efficiency of GP-UCB-PE on real and synthetic problems compared to state-of-the-art competitors.
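    A simplified sketch of the batch-selection idea (first point by UCB, remaining points by pure exploration of the posterior variance), using scikit-learn's Gaussian process with a fixed kernel on a toy one-dimensional function; the relevant-region restriction of the actual GP-UCB-PE algorithm is omitted:

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        rng = np.random.default_rng(0)
        f = lambda x: np.sin(3 * x) + 0.5 * np.cos(7 * x)        # unknown function (toy)
        candidates = np.linspace(0, 2, 400).reshape(-1, 1)

        X = rng.uniform(0, 2, (5, 1)); y = f(X).ravel()           # initial design
        kernel = RBF(length_scale=0.2)

        def select_batch(X, y, K=4, beta=2.0):
            gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-3, optimizer=None).fit(X, y)
            mean, std = gp.predict(candidates, return_std=True)
            batch = [candidates[np.argmax(mean + beta * std)]]    # 1st point: UCB
            Xf, yf = X.copy(), y.copy()
            for _ in range(K - 1):                                 # remaining points: pure exploration
                # "Fantasize" the pending point at its posterior mean (the variance only depends on inputs)
                Xf = np.vstack([Xf, batch[-1]])
                yf = np.append(yf, gp.predict(batch[-1].reshape(1, -1)))
                gp_f = GaussianProcessRegressor(kernel=kernel, alpha=1e-3, optimizer=None).fit(Xf, yf)
                _, std_f = gp_f.predict(candidates, return_std=True)
                batch.append(candidates[np.argmax(std_f)])
            return np.array(batch)

        for it in range(5):                                        # a few parallel rounds
            batch = select_batch(X, y)
            X = np.vstack([X, batch]); y = np.append(y, f(batch).ravel())
        print("best observed value:", y.max().round(3))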
  • Ranking Forests.

    Stephan CLEMENCON, Marine DEPECKER, Nicolas VAYATIS
    Journal of Machine Learning Research | 2013
    The present paper examines how the aggregation and feature randomization principles underlying the algorithm RANDOM FOREST (Breiman, 2001) can be adapted to bipartite ranking. The approach taken here is based on nonparametric scoring and ROC curve optimization in the sense of the AUC criterion. In this problem, aggregation is used to increase the performance of scoring rules produced by ranking trees, such as those developed in Clémençon and Vayatis (2009c). The present work describes the principles for building median scoring rules based on concepts from rank aggregation. Consistency results are derived for these aggregated scoring rules and an algorithm called RANKING FOREST is presented. Furthermore, various strategies for feature randomization are explored through a series of numerical experiments on artificial data sets.
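    A caricature of the scoring-and-aggregation idea, assuming bootstrap-trained scikit-learn decision trees whose induced rankings are combined by a coordinate-wise median rank; this is only an illustration, not the RANKING FOREST algorithm:

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from scipy.stats import rankdata

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

        n_trees, n_feat = 100, 8
        ranks = []
        for b in range(n_trees):
            idx = rng.integers(0, len(X_tr), len(X_tr))            # bootstrap sample
            feats = rng.choice(X.shape[1], n_feat, replace=False)  # feature randomization
            tree = DecisionTreeClassifier(max_depth=6, random_state=b)
            tree.fit(X_tr[idx][:, feats], y_tr[idx])
            scores = tree.predict_proba(X_te[:, feats])[:, 1]
            ranks.append(rankdata(scores))                          # ranking induced by this tree

        # "Median" aggregation of the rankings induced by the individual scoring trees
        median_rank = np.median(np.array(ranks), axis=0)
        print("test AUC of the aggregated ranking:", roc_auc_score(y_te, median_rank).round(3))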
  • Machine-learning for price prediction in the online tourism sector.

    Till WOHLFARTH, Stephan CLEMENCON, Francois ROUEFF, Thierry ARTIERES, Patrice BERTAIL, Fabrice ROSSI, Nicolas VAYATIS
    2013
    We are interested in the problem of predicting the occurrence of a price decrease in order to provide advice on the immediate or deferred purchase of a trip on a price comparison website. The proposed methodology is based on the statistical learning of a price evolution model from the joint information of attributes of the considered trip and past observations of its price and "popularity". The main originality consists in representing the price evolution by the inhomogeneous point process of its jumps. Using a database provided by liligo.com, we implement a method for learning a price evolution model. This model allows us to provide a predictor of the occurrence of a price drop over a given future period and thus to advise the customer to buy now or to wait.
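    A toy sketch of the point-process viewpoint: estimate a piecewise-constant intensity of price-drop events from pooled historical drop times and turn it into a buy-or-wait rule; the data, binning and decision threshold are all illustrative assumptions:

        import numpy as np

        rng = np.random.default_rng(0)

        # Pooled times (days before departure) at which a price drop was observed,
        # over n_histories tracked offers (synthetic here)
        n_histories = 500
        drop_times = np.concatenate([
            rng.uniform(0, 60, rng.poisson(2)) for _ in range(n_histories)])

        # Piecewise-constant estimate of the drop intensity lambda(t) (drops per offer per day)
        bins = np.arange(0, 61, 5)
        counts, _ = np.histogram(drop_times, bins=bins)
        lam = counts / (n_histories * np.diff(bins))

        def prob_drop(t0, horizon):
            """P(at least one drop in [t0, t0 + horizon]) under the fitted Poisson intensity."""
            grid = np.linspace(t0, t0 + horizon, 200)
            lam_on_grid = lam[np.clip(np.searchsorted(bins, grid, side="right") - 1, 0, len(lam) - 1)]
            integral = float(np.sum(lam_on_grid[:-1] * np.diff(grid)))
            return 1.0 - np.exp(-integral)

        # Advice rule: wait if a drop within the next 7 days is sufficiently likely
        p = prob_drop(t0=20.0, horizon=7.0)
        print("P(drop in next 7 days) =", round(p, 3), "->", "wait" if p > 0.5 else "buy now")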
  • Sloshing in the LNG shipping industry: risk modelling through multivariate heavy-tail analysis.

    Antoine DEMATTEO, Stephan CLEMENCON, Nicolas VAYATIS, Mathilde MOUGEOT
    2013
    In the liquefied natural gas (LNG) shipping industry, the phenomenon of sloshing can lead to the occurrence of very high pressures in the tanks of the vessel. The issue of modelling or estimating the probability of the simultaneous occurrence of such extremal pressures is now crucial from the risk assessment point of view. In this paper, heavy-tail modelling, widely used as a conservative approach to risk assessment and corresponding to a worst-case risk analysis, is applied to the study of sloshing. Multivariate heavy-tailed distributions are considered, with sloshing pressures investigated by means of small-scale replica tanks instrumented with d > 1 sensors. When attempting to fit such nonparametric statistical models, one naturally faces computational issues inherent to the high dimensionality of the problem. The primary purpose of this article is to overcome this barrier by introducing a novel methodology. For d-dimensional heavy-tailed distributions, the structure of extremal dependence is entirely characterised by the angular measure, a positive measure on the intersection of a sphere with the positive orthant of R^d. As d increases, the mutual extremal dependence between variables becomes difficult to assess. Based on a spectral clustering approach, we show here how a low dimensional approximation to the angular measure may be found. The nonparametric method proposed for modelling sloshing has been successfully applied to pressure data. The parsimonious representation thus obtained proves to be very convenient for the simulation of multivariate heavy-tailed distributions, allowing for the implementation of Monte-Carlo simulation schemes in estimating the probability of failure. Besides confirming its performance on artificial data, the methodology has been implemented on a real data set specifically collected for risk assessment of sloshing in the LNG shipping industry.
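    A rough sketch of the pipeline suggested above, on synthetic data: standardize margins to unit Pareto via ranks, keep the most extreme observations, project them onto the simplex to obtain an angular sample, and group sensors with a spectral clustering step; the affinity used here (correlation between angular coordinates) is an illustrative shortcut, not the paper's estimator of the angular measure:

        import numpy as np
        from sklearn.cluster import SpectralClustering

        rng = np.random.default_rng(0)

        # Synthetic stand-in for pressure measurements from d sensors (heavy-tailed,
        # with two groups of asymptotically dependent sensors)
        n, d = 5000, 8
        Z = rng.pareto(2.5, size=(n, 2)) + 1.0
        noise = rng.pareto(2.5, size=(n, d)) + 1.0
        X = np.empty((n, d))
        X[:, :4] = Z[:, [0]] * (0.7 + 0.3 * noise[:, :4] / noise[:, :4].max())
        X[:, 4:] = Z[:, [1]] * (0.7 + 0.3 * noise[:, 4:] / noise[:, 4:].max())

        # 1) Standardize margins to unit Pareto via empirical ranks
        ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
        V = n / (n + 1.0 - ranks)

        # 2) Keep the most extreme observations (largest L1 norm) and project onto the simplex
        norm = V.sum(axis=1)
        extreme = V[norm > np.quantile(norm, 0.98)]
        angles = extreme / extreme.sum(axis=1, keepdims=True)       # points of the angular sample

        # 3) Group sensors whose extremes occur together: cluster the columns of the
        #    angular sample using their correlation as an affinity
        affinity = np.corrcoef(angles.T)
        labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                    random_state=0).fit_predict(np.clip(affinity, 0, None))
        print("sensor groups:", labels)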
  • Ranking data with ordinal labels: optimality and pairwise aggregation.

    Stephan CLEMENCON, Sylvain ROBBIANO, Nicolas VAYATIS
    Machine Learning | 2013
    No summary available.
  • Stochastic pursuit algorithms and empirical concentration inequalities for statistical learning.

    Thomas PEEL, Liva RALAIVOLA, Sandrine ANTHOINE, Francois DENIS, Matthieu KOWALSKI, Eric DEBREUVE, Laurent DAUDET, Nicolas VAYATIS
    2013
    The first part of this thesis introduces new algorithms for sparse signal decomposition. Based on Matching Pursuit (MP), they address the following problem: how to reduce the computational time of the often very expensive MP selection step. In response, we subsample the dictionary at each iteration, in rows and columns. We show that this theoretically sound approach performs well in practice. We then propose an iterative block gradient descent algorithm for feature selection in multi-class classification. It is based on the use of error-correcting codes that transform the problem into a simultaneous sparse signal representation problem. The second part presents new empirical concentration inequalities of Bernstein type. First, they concern the theory of U-statistics and are used to derive generalization bounds for ranking algorithms. These bounds take advantage of a variance estimator for which we propose an efficient computation algorithm. Then, we present an empirical version of the Bernstein-type inequality proposed by Freedman [1975] for martingales. Here again, the strength of our bound lies in the introduction of a variance estimator computable from the data. This allows us to propose generalization bounds for online learning algorithms that improve on the state of the art and open the door to a new family of learning algorithms taking advantage of this empirical information.
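    A minimal sketch of the dictionary-subsampling idea at the MP selection step (columns only; the row subsampling discussed in the thesis is omitted), with an illustrative random dictionary:

        import numpy as np

        rng = np.random.default_rng(0)

        def subsampled_matching_pursuit(D, y, n_iter=20, frac_atoms=0.25):
            """Greedy MP where each selection step scans only a random subset of columns of D."""
            n_atoms = D.shape[1]
            coeffs = np.zeros(n_atoms)
            residual = y.copy()
            for _ in range(n_iter):
                cols = rng.choice(n_atoms, max(1, int(frac_atoms * n_atoms)), replace=False)
                corr = D[:, cols].T @ residual
                best = cols[np.argmax(np.abs(corr))]
                step = D[:, best] @ residual           # atoms assumed to have unit norm
                coeffs[best] += step
                residual -= step * D[:, best]
            return coeffs, residual

        # Toy usage: a random unit-norm dictionary and a 3-sparse signal
        D = rng.normal(size=(64, 256)); D /= np.linalg.norm(D, axis=0)
        x_true = np.zeros(256); x_true[[10, 50, 200]] = [1.5, -2.0, 0.7]
        y = D @ x_true
        coeffs, residual = subsampled_matching_pursuit(D, y, n_iter=50)
        print("residual norm:", np.linalg.norm(residual).round(4))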
  • Regularization methods for prediction in dynamic graphs and e-marketing applications.

    Emile RICHARD, Nicolas VAYATIS, Francis BACH, Theodoros EVGENIOU, Stephane GAIFFAS, Michael irwin JORDAN, Thibaut MUNIER, Massimiliano PONTIL, Jean philippe VERT
    2012
    The prediction of connections between objects, based either on a noisy observation or on a sequence of observations, is a problem of interest for a number of applications, ranging from the design of recommendation systems in e-commerce and social networks to network inference in molecular biology. This work presents formulations of the link prediction problem, in both static and temporal settings, as regularized estimation problems. In the static scenario, it is the combination of two well-known norms, the L1-norm and the trace-norm, that allows link prediction, while in the dynamic case the use of an autoregressive model on linear descriptors improves the quality of prediction. We study the nature of the solutions of the optimization problems in both statistical and algorithmic terms. Encouraging empirical results highlight the contribution of the adopted methodology.
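    A minimal sketch of the static viewpoint with the trace-norm alone: a proximal gradient loop whose proximal step is singular value thresholding, applied to a noisy low-rank affinity matrix; the combination with the L1-norm and the autoregressive temporal model are not shown:

        import numpy as np

        rng = np.random.default_rng(0)

        # Noisy observation of a low-rank "affinity" matrix between n objects
        n, r = 60, 3
        U = rng.normal(size=(n, r))
        A_true = U @ U.T
        A_obs = A_true + rng.normal(scale=0.5, size=(n, n))

        def svt(M, tau):
            """Proximal operator of tau * trace-norm: soft-threshold the singular values."""
            U_, s, Vt = np.linalg.svd(M, full_matrices=False)
            return U_ @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

        # Proximal gradient on 0.5 * ||X - A_obs||_F^2 + lam * ||X||_*
        lam, step = 5.0, 0.5
        X = np.zeros((n, n))
        for _ in range(100):
            X = svt(X - step * (X - A_obs), step * lam)

        # Predicted links: largest entries of the denoised matrix outside the diagonal
        scores = X - np.diag(np.diag(X))
        print("top predicted pair:", np.unravel_index(np.argmax(scores), scores.shape))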
  • Machine learning methods for discrete multi-scale flows: application to finance.

    Nicolas MAHLER, Nicolas VAYATIS, Marc HOFFMANN, Charles albert LEHALLE, Stephan CLEMENCON, Mathieu ROSENBAUM, Liva RALAIVOLA
    2012
    This research work deals with the problem of identifying and predicting the trends of a financial series considered in a multivariate framework. The framework of this problem, inspired by machine learning, is defined in chapter I. The efficient markets hypothesis, which contradicts the objective of trend prediction, is first recalled, while the different schools of thought in market analysis, which to some extent oppose the efficient markets hypothesis, are also exposed. We explain the techniques of fundamental analysis, technical analysis and quantitative analysis, and we are particularly interested in the techniques of statistical learning allowing the calculation of predictions on time series. The difficulties of dealing with time-dependent and/or non-stationary factors are highlighted, as well as the usual pitfalls of overfitting and careless data manipulation. Extensions of the classical statistical learning framework, especially transfer learning, are presented. The main contribution of this chapter is the introduction of a research methodology allowing the development of numerical models for trend prediction. This methodology is based on an experimental protocol, consisting of four modules. The first module, entitled Data Observation and Modeling Choices, is a preliminary module devoted to the expression of modeling choices, hypotheses and very general objectives. The second module, Database Construction, transforms the target variable and explanatory variables into factors and labels in order to train numerical trend prediction models. The third module, Model Building, is aimed at building numerical trend prediction models. The fourth and final module, Backtesting and Numerical Results, evaluates the accuracy of the trend prediction models on a significant test set, using two generic backtesting procedures. The first procedure returns the recognition rates of upward and downward trends. The second procedure constructs trading rules using the predictions computed on the test set. The result (P&L) of each of the trading rules is the accumulated gains and losses during the test period. Moreover, these backtesting procedures are completed by interpretation functions, which facilitate the analysis of the decision mechanism of the numerical models. These functions can be measures of the predictive ability of the factors, or measures of the reliability of the models as well as of the delivered predictions. They contribute decisively to the formulation of hypotheses better adapted to the data, as well as to the improvement of the methods of representation and construction of databases and models. This is explained in chapter IV. The numerical models, specific to each of the model building methods described in Chapter IV, and aimed at predicting the trends of the target variables introduced in Chapter II, are indeed calculated and backtested. The reasons for switching from one model-building method to another are particularly well documented. The influence of the choice of parameters - and this at each stage of the experimental protocol - on the formulation of conclusions is also highlighted. The PPVR procedure, which does not require any additional calculation of parameters, has thus been used to reliably study the efficient markets hypothesis. New research directions for the construction of predictive models are finally proposed.
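    A minimal sketch of the second backtesting procedure described above, on synthetic daily returns: turn predicted trend signs into positions, accumulate the P&L, and report the up/down recognition rates of the first procedure as well:

        import numpy as np

        rng = np.random.default_rng(0)

        # Synthetic test-period returns and the model's predicted trend (+1 up, -1 down)
        returns = rng.normal(0.0002, 0.01, 250)                         # daily returns of the target series
        predicted_trend = np.sign(returns + rng.normal(0, 0.02, 250))   # imperfect predictions

        # First procedure: recognition rates of upward and downward trends
        true_trend = np.sign(returns)
        up = true_trend > 0
        print("up-trend recognition rate:  ", (predicted_trend[up] == 1).mean().round(3))
        print("down-trend recognition rate:", (predicted_trend[~up] == -1).mean().round(3))

        # Second procedure: trading rule "hold the position given by the predicted trend"
        pnl = np.cumsum(predicted_trend * returns)                       # accumulated gains and losses
        print("final P&L over the test period:", pnl[-1].round(4))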
  • New Insights into Decision Trees Ensembles.

    Vincent PISETTA, Djamel abdelkader ZIGHED, Alexandre AUSSEM, Nicolas VAYATIS, Lorenza SAITTA, Antoine CORNUEJOLS, Gilbert RITSCHARD, Gilles COHEN, Fabien RICO
    2012
    Tree ensembles are currently one of the most powerful statistical learning methods. However, their theoretical properties, as well as their empirical performance, are still subject to many questions. In this thesis, we propose to shed new light on these methods. More specifically, after discussing the current theoretical aspects (chapter 1) of three main ensemble schemes (Random Forests, Boosting and Stochastic Discrimination), we propose an analysis suggesting a common principle underlying the soundness of these three schemes (chapter 2). This principle takes into account the importance of the first two moments of the margin in obtaining an ensemble with good performance. From this, we derive a new algorithm called OSS (Oriented Sub-Sampling) whose steps follow logically from the framework we introduce. The performance of OSS is empirically superior to that of popular algorithms such as Random Forests and AdaBoost. In a third part (chapter 3), we analyze the Random Forests method by adopting a "kernel" point of view. This viewpoint improves the understanding of forests, in particular by making their regularization mechanism observable. Adopting a kernel point of view also allows us to improve Random Forests via popular post-processing methods such as SVM or multiple kernel learning. These methods show significantly better performance than the basic algorithm, and also allow for pruning the ensemble by keeping only a small part of the classifiers.
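    A small sketch of the kernel point of view on forests: the proximity kernel (fraction of trees in which two points share a leaf) is computed from a scikit-learn forest and fed to an SVM with a precomputed kernel; the data are synthetic and this is only an illustration of the post-processing idea:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=800, n_features=20, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

        rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

        # Random-forest "kernel": fraction of trees in which two points fall in the same leaf
        leaves_tr = rf.apply(X_tr)                    # (n_train, n_trees) leaf indices
        leaves_te = rf.apply(X_te)
        K_tr = (leaves_tr[:, None, :] == leaves_tr[None, :, :]).mean(axis=2)
        K_te = (leaves_te[:, None, :] == leaves_tr[None, :, :]).mean(axis=2)

        svm = SVC(kernel="precomputed", C=1.0).fit(K_tr, y_tr)
        print("RF accuracy:           ", rf.score(X_te, y_te).round(3))
        print("RF-kernel SVM accuracy:", svm.score(K_te, y_te).round(3))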
  • Active vision strategies for object recognition.

    Joseph DEFRETIN, Nicolas VAYATIS, Matthieu CORD, Jacques BLANC TALON, Stephane HERBIN, Guy LE BESNERAIS, Francois CHARPILLET, Simon LACROIX
    2011
    This thesis, carried out in cooperation with ONERA, concerns the active recognition of 3D objects by an autonomous agent equipped with an observation camera. While in passive recognition the acquisition modalities of the observations are imposed and sometimes generate ambiguities, active recognition exploits the possibility of controlling these acquisition modalities online, during a sequential inference process, in order to remove the ambiguity. The objective of the work is to establish planning strategies for the acquisition of information, with a view to a realistic implementation of active recognition. The framework of statistical learning is used for this purpose. The first part of the work is devoted to learning to plan. Two realistic constraints are taken into account: on the one hand, an imperfect modeling of the objects, likely to generate additional ambiguities; on the other hand, a learning budget that is expensive (in time and energy) and therefore limited. The second part of the work focuses on how to best exploit the observations during the recognition process. The possibility of an active multi-scale recognition is studied to allow an interpretation as early as possible in the sequential process of information acquisition. Observations are also used to estimate the pose of the object in a robust way in order to ensure consistency between the planned modalities and those actually reached by the visual agent.
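    A toy sketch of sequential view planning by expected information gain, assuming a known discrete observation model p(observation | class, viewpoint); all distributions are synthetic and the rule shown is a generic next-best-view heuristic rather than the strategies learned in the thesis:

        import numpy as np

        rng = np.random.default_rng(0)

        n_classes, n_views, n_obs = 4, 6, 5
        # Known observation model: p(observation | class, viewpoint), synthetic here
        model = rng.dirichlet(np.ones(n_obs), size=(n_classes, n_views))

        def entropy(p):
            p = np.clip(p, 1e-12, 1.0)
            return -np.sum(p * np.log(p))

        def next_best_view(posterior, remaining_views):
            """Choose the viewpoint with the largest expected entropy reduction."""
            best_view, best_gain = None, -np.inf
            for v in remaining_views:
                p_obs = posterior @ model[:, v, :]                  # predictive distribution of the observation
                expected_H = sum(p_obs[o] * entropy(posterior * model[:, v, o] / p_obs[o])
                                 for o in range(n_obs))
                gain = entropy(posterior) - expected_H
                if gain > best_gain:
                    best_view, best_gain = v, gain
            return best_view

        # Sequential recognition of an object of (hidden) class 2
        true_class, posterior = 2, np.ones(n_classes) / n_classes
        views = set(range(n_views))
        for _ in range(3):
            v = next_best_view(posterior, views); views.discard(v)
            obs = rng.choice(n_obs, p=model[true_class, v])         # simulated observation from view v
            posterior = posterior * model[:, v, obs]; posterior /= posterior.sum()
        print("posterior over classes:", posterior.round(3))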
  • Some variable selection issues around the Lasso estimator.

    Mohamed HEBIRI, Nicolas VAYATIS
    2009
    The general problem studied in this thesis is that of linear regression in high dimension. We are particularly interested in estimation methods that capture the sparsity of the target parameter, even when the dimension is greater than the number of observations. A popular method for estimating the unknown parameter of the regression in this context is the least squares estimator penalized by the ℓ1 norm of the coefficients, known as the lasso. The contributions of this thesis focus on the study of variants of the lasso taking into account either additional information on the input variables or semi-supervised modes of data acquisition. More precisely, the issues addressed in this work are: i) the estimation of the unknown parameter when the space of explanatory variables has a well-determined structure (presence of correlations, order structure on the variables or groupings between variables); ii) the construction of estimators adapted to the transductive framework, in which the new unlabeled observations are taken into account. These adaptations are partly obtained by modifying the penalty in the definition of the lasso estimator. The introduced procedures are essentially analyzed from a non-asymptotic point of view. In particular, we prove that the estimators satisfy sparsity oracle inequalities. Variable selection consistency results are also established. The practical performance of the studied methods is also illustrated through simulation results.
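    A minimal sketch of the baseline estimator the thesis builds on, the ℓ1-penalized least squares (lasso) as implemented in scikit-learn, on a synthetic sparse high-dimensional design; the structured and transductive variants studied in the thesis are not shown:

        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(0)

        # High-dimensional sparse regression: n observations, p >> n variables, 5 active
        n, p = 100, 500
        X = rng.normal(size=(n, p))
        beta = np.zeros(p); beta[:5] = [3.0, -2.0, 1.5, -1.0, 2.5]
        y = X @ beta + rng.normal(scale=0.5, size=n)

        lasso = Lasso(alpha=0.1).fit(X, y)
        selected = np.flatnonzero(lasso.coef_)
        print("variables selected by the l1 penalty:", selected)
        print("true support recovered:", set(range(5)) <= set(selected))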
  • Statistical Approaches in Learning Theory: boosting and ranking.

    Nicolas VAYATIS
    2006
    Statistical Learning Theory has been growing rapidly over the last ten years. The introduction of efficient classification algorithms, such as boosting and Support Vector Machines, capable of coping with high-dimensional data, generated new questions that Vapnik-Chervonenkis (VC) theory could not answer. The Empirical Risk Minimization principle does not account for practical learning algorithms, and the VC dimension is not the appropriate concept to explain the generalization ability of such methods. In the first chapter, we recall the interpretations of boosting algorithms as implementations of convex risk minimization principles and we study their properties under this viewpoint. In particular, we show the importance of regularization in order to obtain consistent strategies. We also develop a new class of algorithms called the Mirror Averaging Algorithm and we evaluate their performance through simulation experiments. After presenting the fundamental ideas underlying boosting, we study, in the second chapter, more advanced issues such as oracle inequalities. Thus, we propose a fine calibration of the penalty function according to the cost function being used and present non-asymptotic results on the performance of penalized boosting estimators, with refinements such as fast rates of convergence under Mammen-Tsybakov margin conditions. We also describe the approximation properties of boosting using decision stumps. The third chapter explores the ranking problem. In applications such as information retrieval or credit scoring, ranking the instances can be much more significant than simply classifying them. We propose a simple formulation of this problem in which ranking is equivalent to classification with pairs of observations. The difference lies in the nature of the empirical risks, which take the form of U-statistics, and we develop classification theory to fit this framework. We also investigate the possibilities of generalizing the ranking error in order to include priors on the ranking we are aiming at, for instance, when we want to focus only on the "best" instances.
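    A small sketch of the pairwise reduction described in the third chapter: pairs (positive, negative) are classified through their difference vectors with a linear model, and the resulting scoring rule is evaluated by the empirical AUC, whose complement is the U-statistic ranking risk; the data and the logistic model are illustrative choices:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=1200, n_features=15, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

        # Build pairs (positive, negative); the pair label says which one should be ranked higher
        pos, neg = X_tr[y_tr == 1], X_tr[y_tr == 0]
        i = rng.integers(0, len(pos), 5000); j = rng.integers(0, len(neg), 5000)
        diff = pos[i] - neg[j]
        pair_X = np.vstack([diff, -diff])
        pair_y = np.concatenate([np.ones(5000), np.zeros(5000)])

        # A linear scoring rule: classify pairs through their difference vector (no intercept)
        clf = LogisticRegression(fit_intercept=False, max_iter=1000).fit(pair_X, pair_y)
        scores = X_te @ clf.coef_.ravel()

        # The empirical ranking risk is a U-statistic over pairs; AUC is its complement
        print("test AUC of the pairwise-learned scoring rule:", roc_auc_score(y_te, scores).round(3))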
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr