Efficient placement design and storage cost saving for big data workflow in cloud datacenters.

Authors
  • IKKEN Sonia
  • RENAULT Eric
  • VEQUE Veronique
  • KHEDDOUCI Hamamache
  • MOKDAD Nadia Lynda
  • SENS Pierre
  • KECHADI Tahar
Publication date
2017
Publication type
Thesis
Summary
Workflows are typical systems that deal with big data. They are deployed across geo-distributed locations to leverage existing cloud infrastructures and run large-scale experiments. The data generated by such experiments is huge and is stored in multiple locations for reuse. Workflow systems are composed of collaborative tasks, which introduces new requirements in terms of dependencies and intermediate data exchange during processing. This raises new problems in selecting distributed data and storage resources so that task or job execution is timely and resource utilization is cost-effective. This thesis therefore addresses the problem of managing data hosted in cloud data centers by considering the requirements of the workflow systems that generate them. The first problem addressed is the intermediate data access behavior of tasks executed in a MapReduce-Hadoop cluster. The approach develops and explores a Markov model that uses the spatial location of blocks and analyzes the sequentiality of spill files through a prediction model. Second, the thesis addresses the placement of intermediate data in federated cloud storage, with the goal of minimizing storage cost. Through federation mechanisms, we propose an exact ILP (integer linear programming) algorithm to support multiple cloud data centers hosting dependency data by considering each pair of files. Finally, a more generic problem is addressed, involving two variants of the placement problem related to divisible and integer dependencies. The main objective is to minimize the operational cost based on the requirements of inter- and intra-job dependencies.