Multiple imputation for mixed data by factor analysis.

Authors
  • AUDIGIER Vincent
  • HUSSON Francois
  • JOSSE Julie
  • RESCHE RIGON Matthieu
Publication date
2019
Publication type
Proceedings Article
Summary The ever-increasing volume of available data makes its analysis increasingly complex. This complexity shows up in particular as variables of different types, missing data, and a large number of variables and/or observations, and applying statistical methods in this setting is generally delicate. The purpose of this presentation is to propose a new multiple imputation method based on factor analysis of mixed data (FAMD, known in French as AFDM). FAMD is a factor analysis method suited to data sets containing both quantitative and qualitative variables, whose number may or may not exceed the number of observations. Because of these properties, a multiple imputation method built on FAMD allows inference on incomplete quantitative and qualitative variables, in both high and low dimensions. The proposed method uses a bootstrap approach to reflect the uncertainty on the principal components and eigenvectors of the FAMD, which are used here to predict (impute) the missing data. Each bootstrap replication provides a prediction for the missing entries of the data set. Noise is then added to these predictions to reflect the distribution of the data. We thus obtain as many imputed tables as there are bootstrap replications. After recalling the principles of multiple imputation, we present our methodology. The proposed method is evaluated by simulation and compared to reference methods: sequential imputation by generalized linear models, imputation by mixture model, and imputation by general location model. The proposed method yields unbiased point estimates of various parameters of interest, as well as confidence intervals with the expected coverage rate. Moreover, it can be applied to data sets of varied nature and dimensions, including cases where the number of observations is smaller than the number of variables.
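The bootstrap, predict, and add-noise steps described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes quantitative variables only and substitutes an ordinary iterative PCA for FAMD, it draws the bootstrap as row weights, and it estimates the noise level from the residuals on observed entries without any degrees-of-freedom correction. The function names `weighted_lowrank_fit` and `bootstrap_mi`, and the parameters `rank`, `n_iter`, and `n_imputations`, are hypothetical.

```python
import numpy as np


def weighted_lowrank_fit(X, w, rank, n_iter=100):
    """Rank-`rank` fit of X (NaN = missing) by iterative PCA with row weights w.

    Assumption: plain PCA stands in for FAMD; every column has at least
    one observed value; the weights w sum to 1.
    """
    mask = np.isnan(X)
    # Start by filling missing entries with the observed column means.
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = w @ filled                      # weighted column means
        centered = filled - mean
        # Loadings come from the weighted rows only.
        _, _, Vt = np.linalg.svd(np.sqrt(w)[:, None] * centered,
                                 full_matrices=False)
        V = Vt[:rank].T
        # Project every row (even zero-weight ones) onto the loadings.
        fitted = centered @ V @ V.T + mean
        # Observed values are kept; missing ones take the fitted values.
        filled = np.where(mask, fitted, X)
    return fitted


def bootstrap_mi(X, rank=2, n_imputations=5, seed=0):
    """Return a list of imputed copies of X, one per bootstrap replication."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mask = np.isnan(X)
    tables = []
    for _ in range(n_imputations):
        # Nonparametric bootstrap of the rows, expressed as weights.
        counts = np.bincount(rng.integers(0, n, size=n), minlength=n)
        fitted = weighted_lowrank_fit(X, counts / n, rank)
        # Residual spread on observed entries (no df correction: assumption).
        sigma = np.sqrt(np.mean((X - fitted)[~mask] ** 2))
        # Add Gaussian noise to the predictions, then impute.
        noisy = fitted + rng.normal(0.0, sigma, size=X.shape)
        tables.append(np.where(mask, noisy, X))
    return tables
```

Each returned table can then be analysed separately and the results pooled according to the usual multiple imputation principles. Handling qualitative variables, as FAMD does, would require additional encoding and weighting steps that this sketch omits.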