Optimization of memory management on distributed machines

Authors
Publication date
2012
Publication type
Thesis
Summary
In order to exploit the capabilities of parallel architectures such as clusters, grids, multi-processor systems, and more recently clouds and multi-core systems, a universal and easy-to-use parallel programming language remains to be developed. From a programmer's point of view, OpenMP is very easy to use, largely because of its support for incremental parallelization, its ability to set the number of execution threads dynamically, and its scheduling strategies. However, because it was originally designed for shared-memory systems, OpenMP is of very limited use on distributed-memory systems. Many solutions have been tried to run OpenMP on distributed-memory systems. The most successful approaches rely on a special network architecture and therefore cannot provide an open solution. Others are built on already available software such as DSM, MPI, or Global Arrays, and therefore have difficulty providing a fully compliant, high-performance implementation of OpenMP.

CAPE (Checkpointing Aided Parallel Execution) is an alternative approach to building a compliant OpenMP implementation for distributed-memory systems. The idea is the following: on reaching a parallel section, the image of the master thread is saved and sent to the slaves; each slave then executes one of the threads. At the end of the parallel section, each slave extracts the list of modifications it made locally and sends it back to the master. To prove the feasibility of this approach, the first version of CAPE was implemented using full checkpoints. However, a preliminary analysis showed that the large amount of data transmitted between threads, and the cost of extracting the list of changes from full checkpoints, lead to poor performance. Moreover, this version is limited to parallel problems satisfying Bernstein's conditions, i.e., it does not support shared data.
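The master/slave mechanism described above can be illustrated with a small sketch. This is not the actual CAPE implementation (which operates on real process images); it simulates the same flow in Python, with hypothetical names (`snapshot`, `diff`, `run_parallel_section`): the master saves its image, each slave runs one thread on a copy, and only the list of local modifications travels back to the master.

```python
# Illustrative sketch of CAPE's parallel-section flow, NOT the real
# implementation: process state is modeled as a dict instead of a
# process memory image. All names here are hypothetical.
import copy

def snapshot(state):
    """Save a full image of the master's state (a 'checkpoint')."""
    return copy.deepcopy(state)

def diff(before, after):
    """Extract the list of modifications a slave made locally."""
    return {k: v for k, v in after.items() if before.get(k) != v}

def run_parallel_section(state, work_items, worker):
    """Master saves its image, each slave executes one thread on a
    copy, then sends back only its local modification list."""
    image = snapshot(state)              # image sent to every slave
    for item in work_items:              # each slave runs one thread
        local = snapshot(image)
        worker(local, item)              # slave's local computation
        for k, v in diff(image, local).items():
            state[k] = v                 # master applies the change list
    return state

# Example: each 'thread' computes one element of a result vector.
def worker(local, i):
    local[f"y{i}"] = local["x"][i] * 2

final = run_parallel_section({"x": [1, 2, 3]}, range(3), worker)
```

Because each slave starts from the same image and only differences are merged back, this only behaves correctly when the threads satisfy Bernstein's conditions (no shared writes), which is exactly the limitation of the first CAPE version noted above.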
The objective of this thesis is to propose new approaches that improve the performance of CAPE and remove the restriction on shared data. First, we developed DICKPT (Discontinuous Incremental ChecKPoinTing), an incremental checkpointing technique that can take discontinuous checkpoints during the execution of a process. Thanks to DICKPT, the execution speed of the new version of CAPE increased considerably: for example, the time to compute a large matrix-matrix multiplication on a cluster of desktop machines became very close to the runtime of an optimized MPI program, and the speedup of this new version is nearly linear across thread counts and problem sizes. For shared data, we proposed UHLRC (Updated Home-based Lazy Release Consistency), a modified version of the HLRC (Home-based Lazy Release Consistency) memory model that is better suited to the characteristics of CAPE. Prototypes and algorithms implementing data synchronization and the shared-data directives and clauses are also specified. Together, these two contributions allow CAPE to honor OpenMP's shared-data requirements.
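The gain from incremental checkpointing can be sketched as follows. This is a toy model, not the DICKPT implementation (which tracks writes at the memory-page level inside a real process): memory is a flat list, the "page" size is illustrative, and dirty pages are found by comparison against a shadow copy kept from the previous checkpoint.

```python
# Hedged sketch of the idea behind incremental checkpointing: instead
# of saving the full process image each time, save only the pages
# written since the previous checkpoint. Page size and class names
# are illustrative, not taken from the thesis.
PAGE = 4  # toy page size, in words

def pages(mem):
    """Split memory into fixed-size pages."""
    return [tuple(mem[i:i + PAGE]) for i in range(0, len(mem), PAGE)]

class IncrementalCheckpointer:
    def __init__(self, mem):
        self.shadow = pages(mem)   # reference copy from last checkpoint

    def checkpoint(self, mem):
        """Return only the pages that changed (page index -> contents)."""
        current = pages(mem)
        delta = {i: p for i, p in enumerate(current)
                 if p != self.shadow[i]}
        self.shadow = current      # next checkpoint diffs against this
        return delta

mem = [0] * 16
ckpt = IncrementalCheckpointer(mem)
mem[5] = 42                        # touch a single word
delta = ckpt.checkpoint(mem)       # only the one dirty page is kept
```

Transmitting `delta` instead of the full image is what shrinks the data exchanged between master and slaves, which is the source of the speedup reported for the DICKPT-based version of CAPE.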