MPI Skeletons
To prepare current and future parallel systems to run large parallel applications, we need a way to characterize what those applications will require of the systems. The DOE-funded MPI Skeletons project builds tools that help developers derive application “skeletons.”
Skeletons are miniature versions of full applications, with reduced functionality, that can be used to study specific performance dimensions of an application. They can then be used for performance analysis and as input to the parallel machine simulators employed in the machine design process.
Running complete applications to test performance is not practical because of their resource requirements and long run times, especially since designers work with simulations of machines that are many times slower than the machines themselves. This project investigated code analysis and transformation techniques that reduce an application to its “skeleton,” letting researchers work with smaller programs that retain the essential structure of the original. These skeletons run quickly in the simulation environments used during machine design and allow researchers to answer questions that could not previously be addressed during this phase.
In high performance computing, parallelism is achieved through a combination of on-node parallelism (shared-memory threads or SIMD vector operations) and cross-node parallelism in the form of message passing. Understanding the communication behavior of applications is important when designing the expensive, energy-hungry components that make up the interconnection network between compute nodes.
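To make the two levels of parallelism concrete, the following sketch (not project code; the problem size and loop body are assumptions for illustration) uses OpenMP threads for on-node work and a single MPI collective for the cross-node combination. The MPI traffic generated by calls like the Allreduce below is exactly the behavior interconnect designers need to characterize.

/* Hybrid MPI+OpenMP sketch: threads split the local loop on each node,
 * then a collective over the interconnect combines per-node results.
 * Compile with an MPI compiler wrapper and OpenMP enabled, e.g.
 * mpicc -fopenmp hybrid.c -o hybrid */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;              /* assumed per-rank problem size */
    double local = 0.0;

    /* On-node parallelism: shared-memory threads share the local loop. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < n; i++)
        local += 1.0 / (1.0 + i + rank * (double)n);

    /* Cross-node parallelism: message passing over the interconnect. */
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f\n", global);

    MPI_Finalize();
    return 0;
}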
This project initially studied how program slicing techniques could reduce an application to a skeleton that retains the message passing patterns of the original while eliminating as much as possible of the computational work performed between messaging operations. Subsequent skeletonization work focused on deriving skeletons relative to other performance-critical aspects of the system, such as memory traversal and update patterns. The project was built on top of the ROSE compiler infrastructure, which is funded by the US Department of Energy.
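The following before/after sketch illustrates the idea of a communication skeleton; it is a hand-written example under assumed names (full_step, skeleton_step, NX, NSTEPS), not output from the project's ROSE-based tools. The skeleton drops the local computation but keeps the sequence, partners, and sizes of the MPI calls, so the communication pattern of the original survives.

/* Hypothetical illustration of a communication skeleton (not ROSE output). */
#include <mpi.h>
#include <string.h>

#define NX      1024    /* assumed boundary-exchange message size */
#define NSTEPS  10      /* assumed number of time steps */

static double u_old[NX], u_new[NX];

/* "Before": the full time step performs the real local computation. */
static void full_step(void)
{
    for (int i = 1; i < NX - 1; i++)                 /* expensive local work */
        u_new[i] = 0.5 * (u_old[i - 1] + u_old[i + 1]);
    memcpy(u_old, u_new, sizeof(u_old));
}

/* "After": the skeleton slices the computation away; a timing stand-in
 * could be inserted here if compute delay matters for the study. */
static void skeleton_step(void)
{
    (void)full_step;    /* unused in the skeleton; kept only for comparison */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = (rank - 1 + size) % size;            /* ring neighbors */
    int right = (rank + 1) % size;

    for (int step = 0; step < NSTEPS; step++) {
        skeleton_step();    /* swap in full_step() to recover the original */
        /* The message partners, sizes, and call sequence are preserved,
         * which is what the skeleton must faithfully reproduce. */
        MPI_Sendrecv(u_new, NX, MPI_DOUBLE, right, 0,
                     u_old, NX, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Because the skeleton issues the same MPI calls as the original, it can be fed to a network simulator to study interconnect behavior at a small fraction of the original program's cost.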