Statistical learning for event sequences using point processes.

Authors
Publication date
2017
Publication type
Thesis
Summary
The goal of this thesis is to show that recent optimization methods make it possible to solve difficult estimation problems based on event models. These time-stamped events are ordered chronologically and therefore cannot be treated as independent. This simple fact justifies the use of a particular mathematical tool, the point process, to learn structure from such events. Two point processes are studied. The first is the point process behind the Cox proportional hazards model: its conditional intensity defines the hazard ratio, a fundamental quantity in the survival analysis literature. The Cox regression model relates the time until an event occurs, called a failure, to the covariates of an individual, and it can be reformulated within the point process framework. The second is the Hawkes process, which models the impact of past events on the probability of future events. Its multivariate version encodes a notion of causality between the dimensions considered.

The thesis is divided into three parts. The first part presents a new optimization algorithm that we developed to estimate the parameter vector of the Cox regression when the number of observations is very large. Our algorithm is based on the SVRG (Stochastic Variance Reduced Gradient) algorithm and uses an MCMC (Markov chain Monte Carlo) method. We proved convergence rates for the algorithm and demonstrated its numerical performance on simulated and real-world data sets.

The second part shows that causality in the Hawkes sense can be estimated non-parametrically using the integrated cumulants of the multivariate point process. We developed two methods for estimating the integrals of the kernels of the Hawkes process, without making any assumption about the shape of these kernels.
Our methods are faster than the state of the art and more robust to the shape of the kernels. We proved the statistical consistency of the first method and showed that the second can be reduced to a convex optimization problem.

The last part studies order book dynamics using the first non-parametric estimation method introduced in the previous part. We used data from the EUREX futures market, defined new order book models (building on the previous work of Bacry et al.), and applied the estimation method to these point processes. The results are very satisfactory and consistent with an economic analysis. This work shows that the method we developed can extract structure from data as complex as those of high-frequency finance.
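
As an illustration of the Hawkes quantities mentioned in the summary, here is a minimal Python sketch. It is not the thesis's estimation method: it assumes exponential kernels phi_ij(s) = alpha[i, j] * beta * exp(-beta * s), whereas the thesis makes no assumption on the kernel shape. The function names and the parameters mu, alpha, and beta are hypothetical and chosen for this example. With this parametrization, alpha[i, j] is the integral of the kernel phi_ij, which is the quantity interpreted as the causal influence of dimension j on dimension i.

```python
import numpy as np


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a multivariate Hawkes process at time t.

    Assumption (illustrative only): exponential kernels
    phi_ij(s) = alpha[i, j] * beta * exp(-beta * s), so alpha[i, j]
    equals the integral of phi_ij. events[j] holds the event times
    observed in dimension j.
    """
    alpha = np.asarray(alpha, dtype=float)
    lam = np.array(mu, dtype=float)  # start from the baseline intensities
    for j, times in enumerate(events):
        past = np.asarray(times, dtype=float)
        past = past[past < t]  # only strictly earlier events contribute
        decay = np.sum(beta * np.exp(-beta * (t - past)))
        lam += alpha[:, j] * decay  # excitation of every dimension by j
    return lam


def mean_intensity(mu, alpha):
    """Stationary mean intensity (I - G)^{-1} mu, where G = alpha.

    Valid only when the spectral radius of G is below 1.
    """
    alpha = np.asarray(alpha, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return np.linalg.solve(np.eye(len(mu)) - alpha, mu)


# Two dimensions: events in dimension 0 excite dimension 1, not the reverse.
mu = [0.5, 0.2]
alpha = [[0.0, 0.0], [0.5, 0.0]]
print(hawkes_intensity(2.0, [[1.0], []], mu, alpha, beta=1.0))
print(mean_intensity(mu, alpha))
```

In this toy setting the matrix alpha is asymmetric, so the mean intensity of dimension 1 is raised by events in dimension 0 while dimension 0 stays at its baseline.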