HOFFMANN Marc

Affiliations
  • 2020 - 2021
    Modelling and analysis for medical and biological applications
  • 2012 - 2021
    Centre de recherches en mathématiques de la décision
  • 1995 - 1996
    Université Paris Diderot
Publications
  • Individual and population approaches for calibrating division rates in population dynamics: Application to the bacterial cell cycle.

    Marie DOUMIC, Marc HOFFMANN
    2021
    Modelling, analysing and inferring triggering mechanisms in population reproduction is fundamental in many biological applications. It is also an active and growing research domain in mathematical biology. In this chapter, we review the main results developed over the last decade for the estimation of the division rate in growing and dividing populations in a steady environment. These methods combine tools borrowed from PDEs and stochastic processes, with a certain view that emerges from mathematical statistics. A focus on the application to the bacterial cell division cycle provides a concrete presentation, and may help the reader to identify major new challenges in the field.
  • Modeling and optimal strategies in short-term energy markets.

    Laura TINSI, Peter TANKOV, Arnak DALALYAN, Gilles PAGES, Almut e. d. VERAART, Huyen PHAM, Olivier FERON, Marc HOFFMANN
    2021
    This thesis aims at providing theoretical tools to support the development and management of intermittent renewable energies in short-term electricity markets. In the first part, we develop an exploitable equilibrium model for price formation in intraday electricity markets. To this end, we propose a non-cooperative game between several generators interacting in the market and facing intermittent renewable generation. Using game theory and stochastic control theory, we derive explicit optimal strategies for these generators and a closed-form equilibrium price for different information structures and player characteristics. Our model is able to reproduce and explain the main stylized facts of the intraday market such as the specific time dependence of volatility and the correlation between price and renewable generation forecasts. In the second part, we study dynamic probabilistic forecasts as diffusion processes. We propose several stochastic differential equation models to capture the dynamic evolution of the uncertainty associated with a forecast, derive the associated predictive densities and calibrate the model on real weather data. We then apply it to the problem of a wind producer receiving sequential updates of probabilistic wind speed forecasts, which are used to predict its production, and make buying or selling decisions on the market. We show to what extent this method can be advantageous compared to the use of point forecasts in decision-making processes. Finally, in the last part, we propose to study the properties of aggregated shallow neural networks. We explore the PAC-Bayesian framework as an alternative to the classical empirical risk minimization approach. We focus on Gaussian priors and derive non-asymptotic risk bounds for aggregate neural networks. This analysis also provides a theoretical basis for parameter tuning and offers new perspectives for applications of aggregate neural networks to practical high-dimensional problems, which are increasingly present in energy-related decision processes involving renewable generation or storage.
  • Statistical modeling and analysis of Internet latency traffic data.

    Alexis FREMOND, Marc HOFFMANN, Gerard BIAU, Mathieu ROSENBAUM, Arnak s. DALALYAN, Vincent RIVOIRARD
    2020
    The speed of information exchange in the Internet network is measured using latency: the time elapsed between the sending of the first bit of information of a request and the reception of the first bit of information of the response. In this thesis, carried out in collaboration with Citrix, we are interested in the study and modeling of latency data in a context of Internet traffic optimization. Citrix collects data through two different channels, generating latency measures suspected to share common properties. In a first step, we address a distributional fitting problem where the covariates and the responses are probability measures imaged from each other by a deterministic transport, and the observables are independent samples drawn according to these laws. We propose an estimator of this transport and show its convergence properties. We show that our estimator can be used to match the distributions of the latency measures generated by the two channels. In a second step we propose a modeling strategy to predict the process obtained by computing the moving median of the latency measures on regular partitions of the interval [0, T] with a mesh size D > 0. We show that the conditional mean of this process, which plays a major role in Internet traffic optimization, is correctly described by a Fourier series decomposition and that its conditional variance is organized in clusters that we model using an ARMA Seasonal-GARCH process, i.e., an ARMA-GARCH process with added deterministic seasonal terms. The predictive performance of this model is compared to the reference models used in the industry. A new measure of the amount of residual information not captured by the model, based on a certain entropy criterion, is introduced. We then address the problem of fault detection in the Internet network. We propose an algorithm for detecting changes in the distribution of a stream of latency data based on the comparison of two sliding windows using a certain weighted Wasserstein distance. Finally, we describe how to select the training data of predictive algorithms in order to reduce their size and limit the computational cost without impacting the accuracy.
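    The change-detection step lends itself to a compact illustration. The sketch below compares two adjacent sliding windows of a latency stream with SciPy's one-dimensional Wasserstein distance; it is a minimal stand-in for the weighted Wasserstein distance of the thesis, and the window size, alarm threshold and toy data are illustrative choices, not values from the work.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def detect_changes(latency, window=200, threshold=5.0):
    """Flag indices where two adjacent sliding windows of the latency stream
    differ by more than `threshold` in 1-D Wasserstein distance."""
    alarms = []
    for i in range(window, len(latency) - window):
        left = latency[i - window:i]          # recent past
        right = latency[i:i + window]         # immediate future
        if wasserstein_distance(left, right) > threshold:
            alarms.append(i)
    return alarms

# Toy stream: the latency distribution shifts upward halfway through.
rng = np.random.default_rng(1)
stream = np.concatenate([rng.gamma(2.0, 10.0, 2000), rng.gamma(2.0, 20.0, 2000)])
print(detect_changes(stream)[:5])
```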
  • Estimating fast mean-reverting jumps in electricity market models.

    Thomas DESCHATRE, Marc HOFFMANN, Olivier FERON
    ESAIM: Probability and Statistics | 2020
    Based on empirical evidence of fast mean-reverting spikes, electricity spot prices are often modelled as the sum $X + Z^\beta$ of a continuous Itô semimartingale $X$ and a mean-reverting compound Poisson process $Z_t^\beta = \int_0^t \int_{\mathbb{R}} x\, e^{-\beta(t-s)}\, \underline{p}(ds, dt)$, where $\underline{p}(ds, dt)$ is a Poisson random measure with intensity $\lambda\, ds \otimes dt$. In a first part, we investigate the estimation of $(\lambda, \beta)$ from discrete observations and establish asymptotic efficiency in various asymptotic settings. In a second part, we discuss the use of our inference results for correcting the value of forward contracts on electricity markets in the presence of spikes. We implement our method on real data from the French, German and Australian markets over 2015 and 2016 and show in particular the effect of spike modelling on the valuation of certain strip options. In particular, we show that some out-of-the-money options have a significant value if we incorporate spikes in our modelling, while having a value close to 0 otherwise.
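    For intuition, the price model in this abstract can be simulated directly. The Python sketch below draws a Brownian continuous part and superimposes exponentially decaying spikes arriving at rate λ; the Gaussian spike sizes and all parameter values are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def simulate_spot(T=100.0, n=10_000, lam=1.0, beta=50.0, sigma=5.0,
                  spike_scale=20.0, seed=0):
    """Simulate X + Z^beta: a Brownian continuous part plus a mean-reverting
    compound Poisson spike component with exponential decay at speed beta."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, T, n + 1)
    dt = T / n
    # Continuous Ito part: here simply a Brownian motion with volatility sigma.
    X = np.concatenate(([0.0], np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n))))
    # Spikes arrive at rate lam; each decays exponentially after its arrival time.
    n_jumps = rng.poisson(lam * T)
    jump_times = rng.uniform(0.0, T, n_jumps)
    jump_sizes = spike_scale * rng.standard_normal(n_jumps)   # illustrative spike law
    Z = np.zeros(n + 1)
    for s, x in zip(jump_times, jump_sizes):
        after = t >= s
        Z[after] += x * np.exp(-beta * (t[after] - s))
    return t, X + Z

t, price = simulate_spot()
print(price[:5])
```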
  • Some aspects of the central role of financial market microstructure: Volatility dynamics, optimal trading and market design.

    Paul JUSSELIN, Mathieu ROSENBAUM, Nicole EL KAROUI, Jean philippe BOUCHAUD, Darrell DUFFIE, Gilles PAGES, Peter TANKOV, Marc HOFFMANN, Nizar TOUZI
    2020
    This thesis is organized in three parts. The first part examines the relationship between microscopic and macroscopic market dynamics by focusing on the properties of volatility. In the second part, we focus on the stochastic optimal control of point processes. Finally, in the third part, we study two market design problems. We start this thesis by studying the links between the no-arbitrage principle and the irregularity of volatility. Using a scaling method, we show that we can effectively connect these two notions by analyzing the market impact of metaorders. More precisely, we model the market order flow using linear Hawkes processes. We then show that the no-arbitrage principle and the existence of a non-trivial market impact imply that volatility is rough and more precisely that it follows a rough Heston model. We then examine a class of microscopic models where the order flow is a quadratic Hawkes process. The objective is to extend the rough Heston model to continuous models allowing to reproduce the Zumbach effect. Finally, we use one of these models, the quadratic rough Heston model, for the joint calibration of the SPX and VIX volatility surfaces. Motivated by the intensive use of point processes in the first part, we are interested in the stochastic control of point processes in the second part. Our objective is to provide theoretical results for applications in finance. We start by considering the case of Hawkes process control. We prove the existence of a solution and then propose a method to apply this control in practice. We then examine the scaling limits of stochastic control problems in the context of population dynamics models. More precisely, we consider a sequence of models of discrete population dynamics which converge to a model for a continuous population. For each model we consider a control problem. We prove that the sequence of optimal controls associated to the discrete models converges to the optimal control associated to the continuous model. This result is based on the continuity, with respect to different parameters, of the solution of a backward stochastic differential equation. In the last part, we consider two market design problems. First, we examine the question of the organization of a liquid derivatives market. Focusing on an options market, we propose a two-step method that can be easily applied in practice. The first step is to select the options that will be listed on the market. For this purpose, we use a quantization algorithm that allows us to select the options most in demand by investors. We then propose a pricing incentive method to encourage market makers to offer attractive prices. We formalize this problem as a principal-agent problem that we solve explicitly. Finally, we find the optimal duration of an auction for markets organized in sequential auctions, the case of zero duration corresponding to the case of a continuous double auction. We use a model where the market takers are in competition and we consider that the optimal duration is the one corresponding to the most efficient price discovery process. After proving the existence of a Nash equilibrium for the competition between market takers, we apply our results on market data. For most assets, the optimal duration is between 2 and 10 minutes.
  • Contributions to high dimensional statistics.

    Olga KLOPP, Patrice BERTAIL, Gerard BIAU, Stephane BOUCHERON, Marc HOFFMANN, Olivier GUEDON, Guillaume LECUE, Alexandre b. TSYBAKOV
    2019
    The purpose of this thesis is to give an account of my contributions to high dimensional statistics. The first part is devoted to the problem of matrix completion. After presenting the problem, I describe the main results obtained in the papers [Klo11, GK17, KLMS15, Klo15, KLT16, KT15, LKMS14]. The second part is devoted to the variable coefficients model. I present the main results of the non-asymptotic studies [KP13, KP15]. Finally, the third part presents the results of [KTV16] concerning the sparse network model and the graphon model.
  • Statistical inference for a partially observed interacting system of Hawkes processes.

    Chenguang LIU, Nicolas FOURNIER, Sylvain DELATTRE, Marc HOFFMANN, Ismael CASTILLO, Emmanuelle CLEMENT, Vincent RIVOIRARD
    2019
    We observe the actions of a subsample of K individuals, out of a population of N, during a time interval of length t>0, for some large K≤N. We model the individuals' relationships by i.i.d. Bernoulli(p) random variables, where p∈(0,1] is an unknown parameter. The action rate of each individual depends on an unknown parameter μ>0 and on the sum of some function ϕ of the ages of the actions of the individuals that influence it. The function ϕ is unknown but we assume that it decays quickly. The goal of this thesis is to estimate the parameter p, which is the main feature of the interaction graph, in the asymptotic regime where the population size N→∞, the observed population size K→∞ and the observation time t→∞. Let $m_t$ be the average number of actions per individual up to time t, which depends on all the model parameters. In the subcritical case, where $m_t$ increases linearly, we construct an estimator of p with convergence rate $\frac{1}{\sqrt{K}}+\frac{N}{m_t\sqrt{K}}+\frac{N}{K\sqrt{m_t}}$. In the supercritical case, where $m_t$ increases exponentially fast, we construct an estimator of p with convergence rate $\frac{1}{\sqrt{K}}+\frac{N}{m_t\sqrt{K}}$. In a second step, we study the asymptotic normality of these estimators. In the subcritical case, the work is very technical but quite general, and we are led to study three possible regimes, depending on the dominant term in $\frac{1}{\sqrt{K}}+\frac{N}{m_t\sqrt{K}}+\frac{N}{K\sqrt{m_t}}$ as it tends to 0. In the supercritical case, we unfortunately assume some additional conditions and consider only one of the two possible regimes.
  • Statistical estimation in a randomly structured branching population.

    Marc HOFFMANN, Aline MARGUET
    2019
    We consider a binary branching process structured by a stochastic trait that evolves according to a diffusion process that triggers the branching events, in the spirit of Kimmel's model of cell division with parasite infection. Based on the observation of the trait at birth of the first n generations of the process, we construct a nonparametric estimator of the transition of the associated bifurcating chain and study the parametric estimation of the branching rate. In the limit $n → ∞$, we obtain asymptotic efficiency in the parametric case and minimax optimality in the nonparametric case.
  • Statistical estimation in a randomly structured branching population.

    Marc HOFFMANN, Aline MARGUET
    Stochastic Processes and their Applications | 2019
    We consider a binary branching process structured by a stochastic trait that evolves according to a diffusion process that triggers the branching events, in the spirit of Kimmel's model of cell division with parasite infection. Based on the observation of the trait at birth of the first n generations of the process, we construct a nonparametric estimator of the transition of the associated bifurcating chain and study the parametric estimation of the branching rate. In the limit $n → ∞$, we obtain asymptotic efficiency in the parametric case and minimax optimality in the nonparametric case.
  • Optimal Quantization: Limit Theorem, Clustering and Simulation of the McKean-Vlasov Equation.

    Yating LIU, Gilles PAGES, Marc HOFFMANN, Gerard BIAU, Francois BOLLEY, Jean francois CHASSAGNEUX, Clementine PRIEUR, Benjamin JOURDAIN, Harald LUSCHGY
    2019
    This thesis contains two parts. In the first part, we prove two limit theorems of optimal quantization. The first limit theorem is the characterization of the convergence under the Wasserstein distance of a sequence of probability measures by the simple convergence of the quantization error functions. These results are established in $\mathbb{R}^d$ and also in a separable Hilbert space. The second limit theorem shows the speed of convergence of the optimal grids and the quantization performance for a sequence of probability measures which converge under the Wasserstein distance, in particular the empirical measure. The second part of this thesis focuses on the approximation and simulation of the McKean-Vlasov equation. We start this part by proving, by Feyel's method (see Bouleau (1988) [Section 7]), the existence and uniqueness of a strong solution of the McKean-Vlasov equation $dX_t = b(t, X_t, \mu_t)\,dt + \sigma(t, X_t, \mu_t)\,dB_t$ under the condition that the coefficient functions b and σ are Lipschitz continuous. Then, the convergence speed of the theoretical Euler scheme of the McKean-Vlasov equation is established, as well as convex order functional results for the McKean-Vlasov equations with $b(t,x,\mu) = \alpha x + \beta$, $\alpha, \beta \in \mathbb{R}$. In the last chapter, the error of the particle method, of several quantization-based schemes and of a hybrid particle-quantization scheme is analyzed. Finally, two simulation examples are presented: the Burgers equation (Bossy and Talay (1997)) in dimension 1 and the FitzHugh-Nagumo neural network (Baladron et al. (2012)) in dimension 3.
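    The particle method mentioned at the end of this abstract replaces the law μ_t by the empirical measure of N interacting particles. Below is a minimal Python sketch of such an Euler scheme for a toy McKean-Vlasov equation; the mean-reverting drift b(t,x,μ) = E_μ[X] - x and the constant diffusion coefficient are illustrative choices, not the equations studied in the thesis.

```python
import numpy as np

def particle_euler(N=1000, T=1.0, n_steps=200, x0=1.0, seed=0):
    """Euler scheme for a toy McKean-Vlasov SDE
    dX_t = (E[X_t] - X_t) dt + 0.5 dB_t,
    with the law mu_t replaced by the empirical measure of N particles."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(N, x0, dtype=float)
    for _ in range(n_steps):
        drift = X.mean() - X                  # interaction through the particle mean
        X = X + drift * dt + 0.5 * np.sqrt(dt) * rng.standard_normal(N)
    return X

particles = particle_euler()
print(round(particles.mean(), 3), round(particles.std(), 3))
```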
  • Testing for high frequency features in a noisy signal.

    Mathieu MEZACHE, Marc HOFFMANN, Human REZAEI, Marie DOUMIC
    2019
    The aim of this article is to detect high frequency (HF) features in a noisy signal. We propose a parametric characterization in the Fourier domain of the HF features. Then we introduce a procedure to evaluate these parameters and compute a p-value which assesses in a quantitative manner the presence or absence of such features, which we also call "oscillations". The procedure is well adapted to real 1-dimensional signals. If the signal analyzed has singular events in the low frequencies, the first step is a data-driven regularization of its Fourier transform. In the second step, the HF feature parameters are estimated. The third step is the computation of the p-value via a Monte Carlo procedure. The test is conducted on sanity-check signals where the ratio between the amplitude of the oscillations and the noise level is fully controlled. The test detects HF features even when the noise level is five times larger than the amplitude of the oscillations. The test is also conducted on signals from Prion disease experiments and confirms the presence of HF features in these signals.
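    The three-step procedure (regularize, estimate, Monte Carlo p-value) can be illustrated with a much cruder statistic. The sketch below uses the fraction of periodogram energy above a cutoff frequency as a stand-in for the paper's parametric characterization and computes a Monte Carlo p-value under a white-noise null; the cutoff, the null model and the toy signal are all assumptions made for illustration.

```python
import numpy as np

def hf_energy_ratio(signal, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` (in cycles per sample)."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(len(signal))
    return spectrum[freqs > cutoff].sum() / spectrum.sum()

def mc_pvalue(signal, n_sim=2000, seed=0):
    """Monte Carlo p-value of the HF statistic under a white-noise null."""
    rng = np.random.default_rng(seed)
    observed = hf_energy_ratio(signal)
    null_stats = np.array([hf_energy_ratio(rng.standard_normal(len(signal)))
                           for _ in range(n_sim)])
    return (1 + np.sum(null_stats >= observed)) / (n_sim + 1)

# Toy signal: white noise plus a small high-frequency oscillation.
t = np.linspace(0.0, 1.0, 512)
rng = np.random.default_rng(2)
x = 0.3 * np.sin(2 * np.pi * 180 * t) + rng.normal(0.0, 0.5, t.size)
print(mc_pvalue(x))   # small p-value: the HF oscillation is detected
```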
  • Modeling and analysis of cell population dynamics : application to the early development of ovarian follicles.

    Frederique ROBIN, Frederique CLEMENT, Romain YVINEC, Marie DOUMIC, Nicolas CHAMPAGNAT, Pierre GABRIEL, Beatrice LAROCHE, Marc HOFFMANN, Jan HASENAUER
    2019
    This thesis aims at designing and analyzing population dynamics models dedicated to the dynamics of somatic cells during the early stages of ovarian follicle growth. The behavior of the models is analyzed by theoretical and numerical approaches, and the parameter values are calibrated by proposing maximum likelihood strategies adapted to our specific dataset. A non-linear stochastic model, which takes into account the joint dynamics between two cell types (precursor and proliferative), is dedicated to the activation of follicular growth. A rigorous finite state projection approach is used to characterize the state of the system at extinction and to calculate the extinction time of the precursor cells. A multi-type linear age-structured model, applied to the proliferative cell population, is dedicated to early follicular growth. The different types correspond here to the spatial positions of the cells. This model is decomposable and the transitions are unidirectional from the first to the last type. We prove the long-time convergence of the stochastic Bellman-Harris model and of the multi-type McKendrick-von Foerster equation. We adapt existing results to the case where the Perron-Frobenius theorem does not apply, and we obtain explicit analytical formulas for the asymptotic moments of the cell numbers and the stationary age distribution. We also study the well-posedness of the inverse problem associated with the deterministic model.
  • Efficient volatility estimation in a two‐factor model.

    Olivier FERON, Pierre GRUET, Marc HOFFMANN
    Scandinavian Journal of Statistics | 2019
    We statistically analyse a multivariate HJM diffusion model with stochastic volatility. The volatility process of the first factor is left totally unspecified while the volatility of the second factor is the product of an unknown process and an exponential function of time to maturity. This exponential term includes some real parameter measuring the rate of increase of the second factor as time goes to maturity. From historical data, we efficiently estimate the time to maturity parameter in the sense of constructing an estimator that achieves an optimal information bound in a semiparametric setting. We also identify nonparametrically the paths of the volatility processes and achieve minimax bounds. We address the problem of degeneracy that occurs when the dimension of the process is greater than two, and give in particular optimal limit theorems under suitable regularity assumptions on the drift process. We consistently analyse the numerical behaviour of our estimators on simulated and real datasets of prices of forward contracts on electricity markets. Mathematics Subject Classification (2010): 62M86, 60J75, 60G35, 60F05.
  • Estimating fast mean-reverting jumps in electricity market models.

    Thomas DESCHATRE, Marc HOFFMANN
    2018
    Based on empirical evidence of fast mean-reverting spikes, we model electricity price processes as the sum $X + Z^\beta$ of a continuous Itô semimartingale $X$ and a mean-reverting compound Poisson process $Z_t^\beta = \int_0^t \int_{\mathbb{R}} x\, e^{-\beta(t-s)}\, p(ds, dt)$, where $p(ds, dt)$ is a Poisson random measure with intensity $\lambda\, ds \otimes dt$. In a first part, we investigate the estimation of $(\lambda, \beta)$ from discrete observations and establish asymptotic efficiency in various asymptotic settings. In a second part, we discuss the use of our inference results for correcting the value of forward contracts on electricity markets in the presence of spikes. We implement our method on real data from the French, German and Australian markets over 2015 and 2016 and show in particular the effect of spike modelling on the valuation of certain strip options. In particular, we show that some out-of-the-money options have a significant value if we incorporate spikes in our modelling, while having a value close to 0 otherwise. Mathematics Subject Classification (2010): 62M86, 60J75, 60G35, 60F05.
  • Efficient volatility estimation in a two-factor model.

    Olivier FERON, Marc HOFFMANN, Pierre GRUET
    2018
    We statistically analyse a multivariate HJM diffusion model with stochastic volatility. The volatility process of the first factor is left totally unspecified while the volatility of the second factor is the product of an unknown process and an exponential function of time to maturity. This exponential term includes some real parameter measuring the rate of increase of the second factor as time goes to maturity. From historical data, we efficiently estimate the time to maturity parameter in the sense of constructing an estimator that achieves an optimal information bound in a semiparametric setting. We also identify nonparametrically the paths of the volatility processes and achieve minimax bounds. We address the problem of degeneracy that occurs when the dimension of the process is greater than two, and give in particular optimal limit theorems under suitable regularity assumptions on the drift process. We consistently analyse the numerical behaviour of our estimators on simulated and real datasets of prices of forward contracts on electricity markets. Mathematics Subject Classification (2010): 62M86, 60J75, 60G35, 60F05.
  • Dependence modeling between continuous time stochastic processes: an application to electricity markets modeling and risk management.

    Thomas DESCHATRE, Marc HOFFMANN, Jean david FERMANIAN, Peter TANKOV, Markus BIBINGER, Vincent RIVOIRARD, Olivier FERON
    2017
    This thesis deals with dependence problems between stochastic processes in continuous time. In a first part, new copulas are established to model the dependence between two Brownian motions and to control the distribution of their difference. It is shown that the class of admissible copulas for Brownian motions contains asymmetric copulas. With these copulas, the survival function of the difference of the two Brownian motions is higher in its positive part than with a Gaussian dependence. The results are applied to the joint modeling of electricity prices and other energy commodities. In a second part, we consider a discretely observed stochastic process defined by the sum of a continuous semi-martingale and a compound Poisson process with mean reversion. An estimation procedure for the mean-reverting parameter is proposed when the mean-reverting parameter is large, in a high frequency finite horizon statistical framework. In a third part, we consider a doubly stochastic Poisson process whose stochastic intensity is a function of a continuous semi-martingale. To estimate this function, a local polynomial estimator is used and a bandwidth selection method is proposed, leading to an oracle inequality. A test is proposed to determine if the intensity function belongs to a certain parametric family. With these results, the dependence between the intensity of electricity price spikes and exogenous factors such as wind generation is modeled.
  • Adaptive estimation for bifurcating Markov chains.

    Simeon valere BITSEKI PENDA, Marc HOFFMANN, Adelaide OLIVIER
    Bernoulli | 2017
    In a first part, we prove Bernstein-type deviation inequalities for bifurcating Markov chains (BMC) under a geometric ergodicity assumption, completing former results of Guyon and Bitseki Penda, Djellout and Guillin. These preliminary results are the key ingredient to implement nonparametric wavelet thresholding estimation procedures: in a second part, we construct nonparametric estimators of the transition density of a BMC, of its mean transition density and of the corresponding invariant density, and show smoothness adaptation over various multivariate Besov classes under $L^p$-loss error, for $1\leq p<\infty$. We prove that our estimators are (nearly) optimal in a minimax sense. As an application, we obtain new results for the estimation of the splitting size-dependent rate of growth-fragmentation models and we extend the statistical study of bifurcating autoregressive processes.
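    Wavelet thresholding, the estimation device used here, is easy to demonstrate on a toy denoising task. The sketch below applies soft thresholding with the universal threshold to the coefficients of a noisy curve; it assumes the third-party PyWavelets package (pywt) and is a generic stand-in, not the BMC transition-density estimator of the paper.

```python
import numpy as np
import pywt   # PyWavelets, assumed installed

def wavelet_threshold_estimate(noisy, wavelet="db4", level=5):
    """Soft-threshold the wavelet coefficients of a noisy curve with the
    universal threshold; a generic stand-in for thresholding estimators."""
    coeffs = pywt.wavedec(noisy, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745           # noise level from finest scale
    thresh = sigma * np.sqrt(2.0 * np.log(len(noisy)))       # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(noisy)]

# Toy check: recover a smooth density-like curve from noisy evaluations.
x = np.linspace(-3.0, 3.0, 1024)
truth = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
noisy = truth + np.random.default_rng(7).normal(0.0, 0.02, x.size)
estimate = wavelet_threshold_estimate(noisy)
print(round(float(np.max(np.abs(estimate - truth))), 4))
```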
  • Novel approaches to multivariate GARCH models in high dimension.

    Benjamin POIGNARD, Jean david FERMANIAN, Jean michel ZAKOIAN, Pierre ALQUIER, Ostap OKHRIN, Marc HOFFMANN, Cristina BUTUCEA
    2017
    This paper deals with the high dimensionality problem in multivariate GARCH processes. The author proposes a new vine-GARCH dynamics for correlation processes parameterized by an undirected graph called "vine". This approach directly generates positive-definite matrices and encourages parsimony. After establishing existence and uniqueness results for stationary solutions of the vine-GARCH model, the author analyzes the asymptotic properties of the model. He then proposes a general framework of penalized M-estimators for dependent processes and focuses on the asymptotic properties of the adaptive Sparse Group Lasso estimator. The high dimension is treated by considering the case where the number of parameters diverges with the sample size. The asymptotic results are illustrated by simulated experiments. Finally, in this framework, the author proposes to generate sparsity for the dynamics of variance-covariance matrices. To do so, the class of multivariate ARCH models is used and the corresponding processes are estimated by penalized ordinary least squares.
  • Nonparametric estimation of the division rate of an age dependent branching process.

    Marc HOFFMANN, Adelaide OLIVIER
    Stochastic Processes and their Applications | 2016
    We study the nonparametric estimation of the branching rate B(x) of a supercritical Bellman-Harris population: a particle with age x has a random lifetime governed by B(x); at its death time, it gives rise to k ≥ 2 children with lifetimes governed by the same division rate, and so on. We observe the process in continuous time over [0, T]. Asymptotics are taken as T → ∞. The data are stochastically dependent and one has to face simultaneously censoring, bias selection and non-ancillarity of the number of observations. In this setting, under appropriate ergodicity properties, we construct a kernel-based estimator of B(x) that achieves the rate of convergence $\exp(-\lambda_B \frac{\beta}{2\beta+1} T)$, where $\lambda_B$ is the Malthus parameter and β > 0 is the smoothness of the function B(x) in a vicinity of x. We prove that this rate is optimal in a minimax sense and we relate it explicitly to classical nonparametric models such as density estimation observed on an appropriate (parameter dependent) scale. We also shed some light on the fact that estimation with kernel estimators based on data alive at time T only is not sufficient to obtain optimal rates of convergence, a phenomenon which is specific to nonparametric estimation and that has been observed in other related growth-fragmentation models.
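    As a rough illustration of what a kernel-based estimator of B(x) looks like, the sketch below estimates a division (hazard) rate from a sample of fully observed division ages, as a Gaussian-kernel density estimate divided by the empirical survival function. It deliberately ignores the censoring, bias selection and dependence issues that the paper actually handles, and the Weibull toy data are an assumption made for the check.

```python
import numpy as np

def kernel_hazard(ages, x_grid, bandwidth=0.2):
    """Naive kernel estimate of a division rate B(x) = f(x) / S(x) from fully
    observed division ages (ignores censoring, selection bias and dependence)."""
    ages = np.asarray(ages, dtype=float)
    u = (x_grid[:, None] - ages[None, :]) / bandwidth
    density = np.exp(-0.5 * u ** 2).sum(axis=1) / (ages.size * bandwidth * np.sqrt(2 * np.pi))
    survival = (ages[None, :] > x_grid[:, None]).mean(axis=1)
    return density / np.maximum(survival, 1.0 / ages.size)

# Toy check: Weibull(shape=2) lifetimes have hazard B(x) = 2x.
rng = np.random.default_rng(3)
lifetimes = rng.weibull(2.0, 5000)
grid = np.linspace(0.1, 1.5, 8)
print(np.round(kernel_hazard(lifetimes, grid), 2))   # should be close to 2 * grid
```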
  • Adaptive estimation for bifurcating Markov chains.

    Simeon valere BITSEKI PENDA, Marc HOFFMANN, Adelaide OLIVIER
    Bernoulli | 2016
    In a first part, we prove Bernstein-type deviation inequalities for bifurcating Markov chains (BMC) under a geometric ergodicity assumption, completing former results of Guyon and Bitseki Penda, Djellout and Guillin. These preliminary results are the key ingredient to implement nonparametric wavelet thresholding estimation procedures: in a second part, we construct nonparametric estimators of the transition density of a BMC, of its mean transition density and of the corresponding invariant density, and show smoothness adaptation over various multivariate Besov classes under $L^p$-loss error, for $1 \leq p < \infty$. We prove that our estimators are (nearly) optimal in a minimax sense. As an application, we obtain new results for the estimation of the splitting size-dependent rate of growth-fragmentation models and we extend the statistical study of bifurcating autoregressive processes.
  • Application of stochastic processes to real-time auctions and information propagation in social networks.

    Remi LEMONNIER, Nicolas VAYATIS, Manuel GOMEZ RODRIGUEZ, Florent KRZAKALA, Marc HOFFMANN, Emmanuel BACRY
    2016
    In this thesis, we study two applications of stochastic processes to Internet marketing. The first chapter focuses on the scoring of Internet users for real-time auctions. This problem consists in finding the probability that a given Internet user performs an action of interest, called conversion, within a few days after the display of an advertising banner. We show that Hawkes processes are a natural model of this phenomenon but that state-of-the-art algorithms are not applicable to the size of data typically used in industrial applications. We therefore develop two new non-parametric inference algorithms that are several orders of magnitude faster than previous methods. We show empirically that the first one performs better than the state-of-the-art competitors, and that the second one can be applied to even larger datasets without paying too high a price in terms of predictive power. The resulting algorithms have been implemented with very good performances for several years at 1000mercis, the leading marketing agency and industrial partner of this CIFRE thesis, where they have become an important production asset. The second chapter focuses on diffusive processes on graphs, which are an important tool to model the propagation of a viral marketing operation on social networks. We establish the first theoretical bounds on the total number of nodes reached by a contagion under any graph and diffusion dynamics, and show the existence of two distinct regimes: the subcritical regime where at most $O(\sqrt{n})$ nodes will be infected, where $n$ is the size of the network, and the supercritical regime where $O(n)$ nodes can be infected. We also study the behavior with respect to the observation time $T$ and highlight the existence of critical times below which a diffusion, even a supercritical one in the long run, behaves in a subcritical way. Finally, we extend our work to percolation and epidemiology, where we improve existing results.
  • Nonparametric estimation of the division rate of an age dependent branching process.

    Marc HOFFMANN, Adelaide OLIVIER
    2015
    We study the nonparametric estimation of the branching rate B(x) of a supercritical Bellman-Harris population: a particle with age x has a random lifetime governed by B(x); at its death time, it gives rise to k ≥ 2 offspring with lifetimes governed by the same division rate, and so on. We observe the process continuously over a large time interval [0, T]. The data are stochastically dependent and one has to face simultaneously censoring, bias selection and non-ancillarity of the number of observations. In this setting, we construct a kernel-based estimator of B(x) that achieves the rate of convergence $\exp(-\lambda_B \frac{\beta}{2\beta+1} T)$, where $\lambda_B$ is the Malthus parameter and β > 0 is the smoothness of the function B(x) in a vicinity of x. We prove that this rate is optimal in a minimax sense and we relate it explicitly to classical nonparametric models such as density estimation observed on an appropriate (parameter dependent) scale. We also shed some light on the fact that estimation with kernel estimators based on data alive at time T only is not sufficient to obtain optimal rates of convergence, a phenomenon which is specific to nonparametric estimation and that has been observed in other related growth-fragmentation models.
  • Statistical analysis of growth-fragmentation models.

    Adelaide OLIVIER, Marc HOFFMANN, Marie DOUMIC, Benoit PERTHAME, Eva LOCHERBACH, Patricia REYNAUD BOURET, Stephane MISCHLER, Alexandre b. TSYBAKOV, Christophe GIRAUD
    2015
    This theoretical study is closely linked to a field of application: it consists in modeling the growth of a population of cells which divide according to an unknown rate of division, depending on a so-called structuring variable - age and cell size being the two paradigmatic examples studied. The related mathematical field is at the interface of process statistics, non-parametric estimation and partial differential equation analysis. The three objectives of this work are: to reconstruct the division rate (as a function of age or size) for different observation schemes (in genealogical time or in continuous time); to study the transmission of a general biological trait from one cell to another and to study the trait of a typical cell; and to compare the growth of different populations of cells through the Malthus parameter (after introducing variability in the growth rate, for example).
  • Statistical estimation of a growth-fragmentation model observed on a genealogical tree.

    Marie DOUMIC, Marc HOFFMANN, Nathalie KRELL, Lydia ROBERT
    Bernoulli | 2015
    We raise the issue of estimating the division rate for a growing and dividing population modelled by a piecewise deterministic Markov branching tree. Such models have broad applications, ranging from TCP/IP window size protocol to bacterial growth. Here, the individuals split into two offspring at a division rate B(x) that depends on their size x, whereas their size grows exponentially in time, at a rate that exhibits variability. The mean empirical measure of the model satisfies a growth-fragmentation type equation, and we bridge the deterministic and probabilistic viewpoints. We then construct a nonparametric estimator of the division rate B(x) based on the observation of the population over different sampling schemes of size n on the genealogical tree. Our estimator nearly achieves the rate $n^{-s/(2s+1)}$ in squared-loss error asymptotically, generalizing and improving on the rate $n^{-s/(2s+3)}$ obtained in [13, 15] through indirect observation schemes. Our method is consistently tested numerically and implemented on Escherichia coli data, which demonstrates its major interest for practical applications.
  • Optimization and statistical methods for high frequency finance.

    Marc HOFFMANN, Mauricio LABADIE, Charles albert LEHALLE, Gilles PAGES, Huyen PHAM, Mathieu ROSENBAUM
    ESAIM: Proceedings and Surveys | 2014
    High Frequency finance has recently evolved from statistical modeling and analysis of financial data – where the initial goal was to reproduce stylized facts and develop appropriate inference tools – toward trading optimization, where an agent seeks to execute an order (or a series of orders) in a stochastic environment that may react to the trading algorithm of the agent (market impact, inventory). This context poses new scientific challenges addressed by the minisymposium OPSTAHF.
  • Division in Escherichia coli is triggered by a size-sensing rather than a timing mechanism.

    Lydia ROBERT, Marc HOFFMANN, Nathalie KRELL, Stephane AYMERICH, Jerome ROBERT, Marie DOUMIC
    BMC Biology | 2014
    Background: Many organisms coordinate cell growth and division through size control mechanisms: cells must reach a critical size to trigger a cell cycle event. Bacterial division is often assumed to be controlled in this way, but experimental evidence to support this assumption is still lacking. Theoretical arguments show that size control is required to maintain size homeostasis in the case of exponential growth of individual cells. Nevertheless, if the growth law deviates slightly from exponential for very small cells, homeostasis can be maintained with a simple 'timer' triggering division. Therefore, deciding whether division control in bacteria relies on a 'timer' or 'sizer' mechanism requires quantitative comparisons between models and data. Results: The timer and sizer hypotheses find a natural expression in models based on partial differential equations. Here we test these models with recent data on single-cell growth of Escherichia coli. We demonstrate that a size-independent timer mechanism for division control, though theoretically possible, is quantitatively incompatible with the data and extremely sensitive to slight variations in the growth law. In contrast, a sizer model is robust and fits the data well. In addition, we tested the effect of variability in individual growth rates and noise in septum positioning and found that size control is robust to this phenotypic noise. Conclusions: Confrontations between cell cycle models and data usually suffer from a lack of high-quality data and suitable statistical estimation techniques. Here we overcome these limitations by using high precision measurements of tens of thousands of single bacterial cells combined with recent statistical inference methods to estimate the division rate within the models. We therefore provide the first precise quantitative assessment of different cell cycle models.
  • Modeling and statistical analysis of price formation across scales, Market impact.

    Relu adrian IUGA, Marc HOFFMANN, Damien LAMBERTON, Emmanuel BACRY, Romuald ELIE, Fabrizio LILLO, Francois ROUEFF
    2014
    The development of organized electronic markets puts constant pressure on academic research in finance. The price impact of a stock market transaction involving a large quantity of shares over a short period of time is a central topic. Controlling and monitoring the price impact is of great interest to practitioners, and its modeling has thus become a central focus of quantitative finance research. Historically, stochastic calculus gradually became established in finance, under the implicit assumption that asset prices satisfy diffusive dynamics. But these hypotheses do not hold at the level of "price formation", i.e. when one considers the fine scales of market participants. New mathematical techniques derived from the statistics of point processes are therefore progressively being adopted. The observables (traded price, mid price) appear as events taking place on a discrete network, the order book, and this at very short time scales (a few tens of milliseconds). The approach of prices seen as Brownian diffusions satisfying equilibrium conditions becomes rather a macroscopic description of complex phenomena arising from price formation. In the first chapter, we review the properties of electronic markets. We recall the limitations of diffusive models and introduce Hawkes processes. In particular, we review the research on market impact and present the progress of this thesis. In a second part, we introduce a new continuous-time and discrete-space impact model using Hawkes processes. We show that this model takes into account the microstructure of markets and is able to reproduce recent empirical results such as the concavity of the temporary impact. In the third chapter, we study the impact of a large volume of shares on the price formation process at the daily scale and at a larger scale (several days after the execution). Furthermore, we use our model to highlight new stylized facts discovered in our database. In a fourth part, we focus on a non-parametric estimation method for a one-dimensional Hawkes process. This method relies on the link between the autocovariance function and the kernel of the Hawkes process. In particular, we study the performance of this estimator in terms of squared error over Sobolev spaces and over a certain class of "very" smooth functions.
  • Some limit theorems for Hawkes processes and application to financial statistics.

    E. BACRY, S. DELATTRE, M. HOFFMANN, Jean francois MUZY
    Stochastic Processes and their Applications | 2013
    In the context of statistics for random processes, we prove a law of large numbers and a functional central limit theorem for multivariate Hawkes processes observed over a time interval [0, T] when T → ∞. We further exhibit the asymptotic behaviour of the covariation of the increments of the components of a multivariate Hawkes process, when the observations are imposed by a discrete scheme with mesh Δ over [0, T] up to some further time shift τ. The behaviour of this functional depends on the relative size of Δ and τ with respect to T and enables us to give a full account of the second-order structure. As an application, we develop our results in the context of financial statistics. We introduced in Bacry et al. (2013) [7] a microscopic stochastic model for the variations of a multivariate financial asset, based on Hawkes processes and that is confined to live on a tick grid. We derive and characterise the exact macroscopic diffusion limit of this model and show in particular its ability to reproduce important empirical stylised facts such as the Epps effect and the lead-lag effect. Moreover, our approach enables us to track these effects across scales in rigorous mathematical terms.
  • Modelling microstructure noise with mutually exciting point processes.

    Emmanuel BACRY, Sylvain DELATTRE, Marc HOFFMANN, Jean francois MUZY
    Quantitative Finance | 2013
    We introduce a new stochastic model for the variations of asset prices at the tick-by-tick level in dimension 1 (for a single asset) and 2 (for a pair of assets). The construction is based on marked point processes and relies on linear self- and mutually exciting stochastic intensities as introduced by Hawkes. We associate a counting process with the positive and negative jumps of an asset price. By coupling suitably the stochastic intensities of upward and downward changes of prices for several assets simultaneously, we can reproduce microstructure noise (i.e. strong microscopic mean reversion at the level of seconds to a few minutes) and the Epps effect (i.e. the decorrelation of the increments at microscopic scales) while preserving a standard Brownian diffusion behaviour on large scales. More effectively, we obtain analytical closed-form formulae for the mean signature plot and the correlation of two price increments that enable us to track across scales the effect of the mean reversion up to the diffusive limit of the model. We show that the theoretical results are consistent with empirical fits on Euro-Bund and Euro-Bobl futures in several situations.
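    The mechanism described here, upward and downward tick processes whose intensities excite each other, can be illustrated with a small simulation. The sketch below uses Ogata's thinning algorithm for a two-dimensional Hawkes process with exponential kernels and cross-excitation only, then builds the tick price as the difference of the two counting processes; all parameter values are illustrative, not calibrated values from the paper.

```python
import numpy as np

def simulate_tick_price(T=200.0, mu=0.5, alpha=2.0, beta=4.0, seed=0):
    """Ogata thinning for a 2-d Hawkes process (up and down ticks) with
    cross-excitation only: an up move raises the intensity of down moves and
    vice versa, which produces microscopic mean reversion of the tick price."""
    rng = np.random.default_rng(seed)
    events = []                                   # list of (time, +1 or -1)

    def intensity(t, sign):
        # lambda^sign(t) = mu + sum over past events of the opposite sign.
        return mu + sum(alpha * np.exp(-beta * (t - s)) for s, m in events if m == -sign)

    t = 0.0
    while t < T:
        lam_bar = intensity(t, +1) + intensity(t, -1)   # bound: intensities only decay
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            break
        lam_up, lam_dn = intensity(t, +1), intensity(t, -1)
        u = rng.uniform(0.0, lam_bar)
        if u < lam_up:
            events.append((t, +1))
        elif u < lam_up + lam_dn:
            events.append((t, -1))
        # otherwise the candidate point is thinned out
    times = np.array([s for s, _ in events])
    price = np.cumsum([m for _, m in events])
    return times, price

times, price = simulate_tick_price()
print(len(times), price[-1])
```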
  • Statistical inference across scales.

    Celine DUVAL, Marc HOFFMANN, Dominique PICARD, Cristina BUTUCEA, Alexandre TSYBAKOV, Fabienne COMTE, Peter SPREIJ
    2012
    This thesis deals with the cross-scale estimation problem for a stochastic process. We study how the choice of the sampling step impacts the statistical procedures. We are interested in the estimation of jump processes from the observation of a discretized trajectory on [0, T]. When the length of the observation interval T goes to infinity, the sampling step tends either to 0 (microscopic scale), to a positive constant (intermediate scale) or to infinity (macroscopic scale). In each of these regimes we assume that the number of observations tends to infinity. First, the particular case of a compound Poisson process of unknown intensity with symmetric jumps {-1,1} is studied. Chapter 2 illustrates the notion of statistical estimation in the three scales defined above. In this model, we are interested in the properties of statistical experiments. We show the property of Local Asymptotic Normality in the three microscopic, intermediate and macroscopic scales. The Fisher information is then known for each of these regimes. Then we analyze how an intensity estimation procedure that is efficient (minimum variance) at a given scale behaves when applied to observations from a different scale. We look at the estimator of the empirical quadratic variation, which is efficient in the macroscopic regime, and we use it on data coming from the intermediate or microscopic regimes. This estimator remains efficient at microscopic scales, but shows a substantial loss of information at intermediate scales. A unified estimation procedure is proposed, which is efficient in all regimes. Chapters 3 and 4 study the nonparametric estimation of the jump density of a compound renewal process in the microscopic regimes, when the sampling step tends to 0. An estimator of this density using wavelet methods is constructed. It is adaptive and minimax for sampling steps that decrease like $T^{-\alpha}$, for $\alpha>0$. The estimation procedure relies on the inversion of the composition operator giving the law of increments as a nonlinear transformation of the law of jumps that one seeks to estimate. The inverse operator is explicit in the case of the compound Poisson process (Chapter 3), but has no analytical expression for compound renewal processes (Chapter 4). In the latter case, it is approximated via a fixed point technique. Chapter 5 studies the problem of loss of identifiability in macroscopic regimes. If a jump process is observed with a large sampling step, some limiting approximations, such as the Gaussian approximation, become valid. This can lead to a loss of identifiability of the law that generated the process, when its structure is more complex than the one studied in Chapter 2. In a first step, a toy model with two parameters is considered. Two different regimes emerge from the study: a regime where the parameter is no longer identifiable and one where it remains identifiable but where the optimal estimators converge at slower rates than the usual parametric rates. From the particular case study, we derive lower bounds showing that there is no convergent estimator for pure jump Lévy processes or for compound renewal processes in macroscopic regimes where the sampling step grows faster than the square root of T. Finally, we identify macroscopic regimes where the increments of a compound Poisson process are indistinguishable from Gaussian random variables, and regimes where there is no convergent estimator for compound Poisson processes depending on too many parameters.
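    For the compound Poisson model with symmetric ±1 jumps of Chapter 2, the quadratic-variation estimator mentioned above has a one-line form: the sum of squared increments over the sampling grid, divided by T, has expectation λ at every sampling step. The sketch below checks this numerically at a microscopic, an intermediate and a macroscopic step; parameter values are illustrative, and the coarser grids simply have fewer increments, hence a noisier estimate in this finite sample.

```python
import numpy as np

def simulate_cpp(lam, T, rng):
    """Compound Poisson process with intensity lam and symmetric +/-1 jumps."""
    n = rng.poisson(lam * T)
    times = np.sort(rng.uniform(0.0, T, n))
    marks = rng.choice([-1, 1], size=n)
    return times, marks

def qv_intensity_estimate(times, marks, T, delta):
    """Quadratic-variation estimator of lam from the path sampled at step delta:
    (1/T) * sum of squared increments, unbiased since E[(increment)^2] = lam * delta."""
    grid = np.arange(delta, T + 1e-9, delta)
    cum = np.concatenate(([0.0], np.cumsum(marks)))
    path = np.concatenate(([0.0], cum[np.searchsorted(times, grid, side="right")]))
    return np.sum(np.diff(path) ** 2) / T

rng = np.random.default_rng(4)
lam, T = 3.0, 2000.0
times, marks = simulate_cpp(lam, T, rng)
for delta in (0.01, 1.0, 20.0):          # microscopic, intermediate, macroscopic steps
    print(delta, round(qv_intensity_estimate(times, marks, T, delta), 3))
```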
  • Machine learning methods for discrete multi-scale flows: application to finance.

    Nicolas MAHLER, Nicolas VAYATIS, Marc HOFFMANN, Charles albert LEHALLE, Stephan CLEMENCON, Mathieu ROSENBAUM, Liva RALAIVOLA
    2012
    This research work deals with the problem of identifying and predicting the trends of a financial series considered in a multivariate framework. The framework of this problem, inspired by machine learning, is defined in Chapter I. The efficient markets hypothesis, which contradicts the objective of trend prediction, is first recalled, while the different schools of thought in market analysis, which to some extent oppose the efficient markets hypothesis, are also exposed. We explain the techniques of fundamental analysis, technical analysis and quantitative analysis, and we are particularly interested in the techniques of statistical learning allowing the calculation of predictions on time series. The difficulties of dealing with time-dependent and/or non-stationary factors are highlighted, as well as the usual pitfalls of overfitting and careless data manipulation. Extensions of the classical statistical learning framework, especially transfer learning, are presented. The main contribution of this chapter is the introduction of a research methodology allowing the development of numerical models for trend prediction. This methodology is based on an experimental protocol, consisting of four modules. The first module, entitled Data Observation and Modeling Choices, is a preliminary module devoted to the expression of modeling choices, hypotheses and very general objectives. The second module, Database Construction, transforms the target variable and explanatory variables into factors and labels in order to train numerical trend prediction models. The third module, Model Building, is aimed at building numerical trend prediction models. The fourth and final module, Backtesting and Numerical Results, evaluates the accuracy of the trend prediction models on a significant test set, using two generic backtesting procedures. The first procedure returns the recognition rates of upward and downward trends. The second procedure constructs trading rules using the predictions computed on the test set. The result (P&L) of each of the trading rules is the accumulated gains and losses during the test period. Moreover, these backtesting procedures are completed by interpretation functions, which facilitate the analysis of the decision mechanism of the numerical models. These functions can be measures of the predictive ability of the factors, or measures of the reliability of the models as well as of the delivered predictions. They contribute decisively to the formulation of hypotheses better adapted to the data, as well as to the improvement of the methods of representation and construction of databases and models. This is explained in Chapter IV. The numerical models, specific to each of the model building methods described in Chapter IV, and aimed at predicting the trends of the target variables introduced in Chapter II, are indeed calculated and backtested. The reasons for switching from one model-building method to another are particularly well documented. The influence of the choice of parameters - and this at each stage of the experimental protocol - on the formulation of conclusions is also highlighted. The PPVR procedure, which does not require any additional calculation of parameters, has thus been used to reliably study the efficient markets hypothesis. New research directions for the construction of predictive models are finally proposed.
  • Statistical analysis of multifractal random walk processes.

    Laurent DUVERNET, Marc HOFFMANN, Stephane JAFFARD, Emmanuel BACRY, Vincent VARGAS, Julien BARRAL, Carenne LUDENA
    2010
    We study some properties of a class of real continuous-time random processes, the multifractal random walks. A remarkable feature of these processes is their self-similarity property: the law of the process at small scales is identical to the law at large scales, up to a multiplicative random factor independent of the process. The first part of the thesis is devoted to the question of the convergence of the empirical moments of the process increments in a rather general asymptotic, where the increment step can tend to zero at the same time as the observation horizon tends to infinity. The second part proposes a family of non-parametric tests that distinguish multifractal random walks from Itô semimartingales. After showing the consistency of these tests, we study their behavior on simulated data. In the third part, we construct an asymmetric multifractal random walk such that past increments are negatively correlated with the square of future increments. This type of leverage effect is notably observed on stock prices and financial indices. We compare the empirical properties of the process obtained with real data. The fourth part concerns the estimation of the parameters of the process. We start by showing that under certain conditions, two of the three parameters cannot be estimated. We then study the theoretical and empirical performances of different estimators of the third parameter, the intermittency coefficient, in a Gaussian case.
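    The empirical moments of increments across scales studied in the first part are, concretely, structure functions $S_q(\ell)$, the average of $|X_{t+\ell} - X_t|^q$ over the sample. The sketch below computes them and their log-log scaling slopes for a Brownian motion, the monofractal benchmark where the slope is close to q/2; multifractal random walks instead exhibit a nonlinear dependence of the scaling exponent on q. The scales and sample size are illustrative.

```python
import numpy as np

def structure_functions(x, scales, qs=(1, 2, 3)):
    """Empirical moments S_q(l) = mean |x[t+l] - x[t]|^q at several scales l (in samples)."""
    return {q: np.array([np.mean(np.abs(x[l:] - x[:-l]) ** q) for l in scales]) for q in qs}

# Monofractal benchmark: for Brownian motion, log S_q vs log scale has slope ~ q/2.
rng = np.random.default_rng(6)
bm = np.cumsum(rng.standard_normal(100_000)) * 0.01
scales = np.array([1, 2, 4, 8, 16, 32, 64])
for q, vals in structure_functions(bm, scales).items():
    slope = np.polyfit(np.log(scales), np.log(vals), 1)[0]
    print(q, round(slope, 2))        # close to 0.5, 1.0, 1.5
```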
  • Study of some statistical estimation problems in finance.

    Mathieu ROSENBAUM, Marc HOFFMANN
    2007
    This thesis deals with several statistical finance problems and consists of four parts. In the first part, we study the question of estimating the persistence of volatility from discrete observations of a diffusion model over an interval [0,T], where T is a fixed objective time. For this purpose, we introduce a fractional Brownian motion of Hurst index H in the volatility dynamics. We construct an estimation procedure for the parameter H from the high frequency data of the diffusion. We show that the precision of our estimator is n^{-1/(4H+2)}, where n is the observation frequency, and we prove its optimality in the minimax sense. These theoretical considerations are followed by a numerical study on simulated and financial data. The second part of the thesis deals with the problem of microstructure noise. For this, we consider observations at frequency n and with rounding error α_n tending to zero, of a diffusion model over an interval [0,T], where T is a fixed objective time. In this framework, we propose estimators of the integrated volatility of the asset whose precision is shown to be max(α_n, n^{-1/2}). We also obtain central limit theorems in the case of homogeneous diffusions. This theoretical study is also followed by a numerical study on simulated and financial data. In the third part of this thesis, we establish a simple characterization of Besov spaces and we use it to prove new regularity properties for some stochastic processes. This part may seem disconnected from the problems of statistical finance, but it inspired Part 4 of the thesis. In the last part of the thesis, a new microstructure noise index is constructed and studied on financial data. This index, whose calculation is based on the p-variations of the considered asset at different time scales, can be interpreted in terms of Besov spaces. Compared to other indices, it seems to have several advantages. In particular, it allows us to highlight original phenomena such as a certain form of additional regularity in the finest scales. It is shown that these phenomena can be partially reproduced by additive microstructure noise or diffusion models with rounding error. Nevertheless, a faithful reproduction seems to require either a combination of two forms of error or a sophisticated form of rounding error.
  • Nonparametric regression and spatially inhomogeneous information.

    Stephane GAIFFAS, Marc HOFFMANN
    2005
    No summary available.
  • Adaptive methods for non-parametric estimation of the coefficients of a diffusion.

    Marc HOFFMANN, Dominique PICARD
    1996
    We study the problem of non-parametric estimation of the coefficients of a one-dimensional diffusion from a discrete observation of the trajectory, in the framework of minimax theory. Two asymptotics are mainly considered: diffusions observed over a fixed time interval (the diffusion coefficient is then estimated, whether it depends on time or space) and stationary diffusions over a time interval that increases with the number of observations (the drift coefficient and the diffusion coefficient are estimated simultaneously). The minimax rates are computed when the unknown parameter is subject to a Besov constraint. The method is based on an approximation of the diffusion models by regression schemes, and allows the implementation of the wavelet coefficient thresholding techniques used by Donoho, Johnstone, Kerkyacharian and Picard for density or regression models.
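    The approximation by regression schemes mentioned here rests on the fact that $E[(X_{t+\Delta}-X_t)^2 \mid X_t = x] \approx \sigma^2(x)\Delta$ for small $\Delta$. The sketch below exploits this with a plain Gaussian-kernel regression of squared increments instead of the wavelet thresholding of the paper; the Ornstein-Uhlenbeck toy path, bandwidth and evaluation grid are illustrative assumptions.

```python
import numpy as np

def estimate_diffusion_coef(X, delta, x_grid, bandwidth=0.2):
    """Kernel regression estimate of sigma^2(x) from a discretely observed diffusion,
    using E[(X_{t+delta} - X_t)^2 | X_t = x] ~ sigma^2(x) * delta."""
    increments_sq = np.diff(X) ** 2
    states = X[:-1]
    w = np.exp(-0.5 * ((x_grid[:, None] - states[None, :]) / bandwidth) ** 2)
    return (w @ increments_sq) / (w.sum(axis=1) * delta)

# Toy check on an Ornstein-Uhlenbeck path, where sigma(x) = 0.7 is constant.
rng = np.random.default_rng(5)
delta, n = 0.001, 200_000
X = np.zeros(n + 1)
for i in range(n):
    X[i + 1] = X[i] - 2.0 * X[i] * delta + 0.7 * np.sqrt(delta) * rng.standard_normal()
grid = np.linspace(-0.5, 0.5, 5)
print(np.round(estimate_diffusion_coef(X, delta, grid), 3))   # close to 0.49
```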
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr