Least squares estimation of a discrete density under k-monotonicity constraints and risk bounds. Application to the estimation of the number of species in a population.

Authors
  • GIGUELAY Jade
  • GIRAUD Christophe
  • MASSART Pascal
  • BALABDAOUI Fadoua
  • HUET Sylvie
  • DUROT Cecile
  • LAURENT Beatrice
  • BUNGE John
Publication date
2017
Publication type
Thesis
Summary
This thesis is a contribution to the field of non-parametric estimation under shape constraints. The functions are discrete and the shape considered, called k-monotonicity, where k denotes an integer greater than 2, is a generalization of convexity. The integer k is an indicator of the degree of hollowness of a convex function. The manuscript is structured in three parts in addition to the introduction, the conclusion and an appendix.

Introduction:
The introduction includes three chapters. The first one presents the state of the art of density estimation under shape constraints. The second one is a synthesis of the results obtained during the thesis, available in French and in English. Finally, Chapter 3 gathers some notations and mathematical results used throughout the manuscript.

Part I: Estimation of a discrete density under k-monotonicity constraint
Two least squares estimators of a discrete distribution p* under k-monotonicity constraint are proposed. Their characterization is based on the spline decomposition of k-monotone sequences, and on the properties of their primitives. The statistical properties of these estimators are studied, in particular their estimation quality: measured in terms of squared error, the two estimators converge at the parametric rate. An algorithm derived from the Support Reduction Algorithm is implemented and available in the R package pkmon. A study on simulated data sets illustrates the properties of these estimators. This work has been published in Electronic Journal of Statistics (Giguelay, 2017).

Part II: Risk Bound Computations
In the first chapter of Part II, the quadratic risk of the previously introduced least squares estimator is bounded. This bound is adaptive in the sense that it depends on a trade-off between the distance of p* from the boundary of the set of finitely supported k-monotone densities, and the complexity (in terms of decomposition in the spline basis) of the densities belonging to this set that are sufficiently close to p*. The method is based on a variational risk formulation proposed by Chatterjee (2014) and generalized to the density estimation framework. Subsequently, the bracketing entropies of the corresponding function spaces are computed to control the supremum of the empirical processes involved in the squared error. The optimality of the risk bound is then discussed with respect to the results obtained in the continuous case and in the regression framework.
In the second chapter of Part II, additional results on the bracketing entropies of k-monotone function spaces are given.

Part III: Estimation of the number of species in a population and k-monotonicity tests
The last part deals with the problem of estimating the number of species in a population. The chosen model is that of an abundance distribution common to all species and defined as a mixture. The proposed method is based on the assumption that the abundance distribution is k-monotone. This hypothesis makes the problem of estimating the number of species identifiable. Two approaches are proposed. The first one is based on the least squares estimator under k-monotonicity constraint, while the second one is based on the empirical estimator. The two estimators are compared in a study on simulated data. Since the estimate of the number of species depends strongly on the degree of k-monotonicity chosen in the model, three multiple testing procedures are then proposed to infer the degree k directly from the observations. The level and power of these procedures are calculated and then evaluated by means of a study on simulated data sets, and the method is applied to real data sets from the literature.
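
As a small illustration of the shape constraint discussed in the summary, the sketch below checks whether a finitely supported discrete density is k-monotone. It assumes the usual finite-difference definition, which this record does not state: p is k-monotone when the alternating k-th difference sum_{h=0..k} (-1)^h C(k,h) p(i+h) is non-negative for every i. The function names `nabla_k` and `is_k_monotone` are hypothetical and are not taken from the thesis or from the pkmon package.

```python
from math import comb


def nabla_k(p, k):
    """Alternating k-th finite difference of a finitely supported sequence.

    p is a list p[0], ..., p[n-1], treated as zero beyond its support.
    Assumed definition: sum over h of (-1)^h * C(k, h) * p[i + h].
    """
    n = len(p)
    padded = list(p) + [0.0] * k  # zero-pad so p[i + h] is always defined
    return [
        sum((-1) ** h * comb(k, h) * padded[i + h] for h in range(k + 1))
        for i in range(n)
    ]


def is_k_monotone(p, k, tol=1e-12):
    """True if every alternating k-th difference is non-negative (up to tol)."""
    return all(d >= -tol for d in nabla_k(p, k))


# A triangular density on {0, 1, 2}: non-increasing and convex,
# but its third alternating difference becomes negative.
triangular = [3 / 6, 2 / 6, 1 / 6]
for k in (1, 2, 3):
    print(k, is_k_monotone(triangular, k))
```

Under this assumed definition, k = 1 corresponds to a non-increasing sequence and k = 2 to a convex one, which matches the summary's description of k-monotonicity as a generalization of convexity.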