Optimization of checkpointing and execution model for an implementation of OpenMP on distributed memory architectures.

Authors

TRAN Van long
RENAULT Eric
LAMOTTE Jean luc
MORIN Christine
HA Viet hai
BARTHOU Denis
FOUCHAL Hacene

Publication date

2018

Publication type

Thesis

Summary

OpenMP and MPI have become the standard tools for developing parallel programs on shared-memory and distributed-memory architectures, respectively. Compared with MPI, OpenMP is easier to use: OpenMP generates the parallel code and synchronizes the results automatically through directives, clauses and runtime functions, whereas MPI requires programmers to do this work manually. As a result, efforts have been made to port OpenMP to distributed-memory architectures. However, apart from CAPE, no solution satisfies both of the following requirements: 1) full compliance with the OpenMP standard and 2) high performance.

CAPE (Checkpointing-Aided Parallel Execution) is a framework that automatically translates OpenMP programs and provides runtime functions to execute them on distributed-memory architectures using checkpointing techniques. To execute an OpenMP program on a distributed-memory system, CAPE uses a set of templates to translate the OpenMP source code into CAPE source code, which is then compiled by a conventional C/C++ compiler. The program first runs on a set of nodes, each node running one process. Whenever the program encounters a parallel region, the master distributes tasks to the slave processes using discontinuous incremental checkpoints (DICKPT) and then waits for the results. Each slave receives its checkpoint, injects it into its memory, performs the assigned work and sends its result back to the master as a DICKPT. The master receives and merges these result checkpoints, then injects the merged result into its own memory. At the end of the parallel region, the master sends the merged result checkpoint to each slave to synchronize the program's memory space.
In some experiments, CAPE has shown high performance on distributed-memory systems and is a viable, fully OpenMP-compliant solution. However, CAPE is still under development: its checkpoints and execution model need to be optimized to improve its performance, capabilities and reliability. This thesis presents approaches to optimize checkpoints and extend their capabilities, to design and implement a new execution model, and to extend the capabilities of CAPE.

First, we propose an arithmetic on checkpoints that models their data structure and the operations on it. This model helps reduce checkpoint size and merging time while extending the capabilities of checkpoints. Second, we developed TICKPT (Time-Stamp Incremental Checkpointing), an implementation of the arithmetic on checkpoints. TICKPT improves on DICKPT by adding timestamps to checkpoints to identify their order. Analysis and comparative experiments show that TICKPT checkpoints are not only smaller but also have less impact on program performance. Third, we designed and implemented a new execution model and prototypes for CAPE based on TICKPT. The new execution model allows CAPE to use resources efficiently, avoid the risk of bottlenecks and satisfy Bernstein's conditions. Together, these approaches significantly improve CAPE's performance, capabilities and reliability. Data sharing has also been implemented in CAPE on top of TICKPT, based on the arithmetic on checkpoints. This also shows that we have taken the right direction, and it makes CAPE more complete.
