ExM

ExM: System support for extreme-scale, many-task applications

Project objectives: To achieve the technical advances required to execute many-task applications efficiently, reliably, and easily on petascale and exascale computers, and thus to open up exascale computing to new problem-solving methods and application classes.

Project description: Exascale computers will enable and demand new problem-solving methods that involve many concurrent and interacting tasks. Methodologies such as rational design, uncertainty quantification, parameter estimation, and inverse modeling all have this many-task property, and all will frequently have aggregate computing needs that require exascale computers. Running many-task applications efficiently, reliably, and easily on extreme-scale computers is challenging: system software designed for today’s mainstream single program multiple data (SPMD) computations is not necessarily a good match for the demands of many-task applications.

To address these demands, the ExM project will design, develop, and evaluate two new system software components. The ExM data store will allow concurrent and asynchronous application tasks to communicate efficiently and reliably, both with each other and with persistent storage, by reading and writing data objects maintained in node-local storage, including memory, SSD, and local disk. The ExM task manager will allow for the rapid, data-aware, and efficient dispatch of many tasks to large exascale computing systems and for the fault-tolerant execution of those tasks. These components will be efficiently integrated with current and future extreme-scale system software and made available to developers via both a parallel scripting language and APIs.
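To make the idea of data-aware dispatch concrete, the following is a minimal sketch, not the ExM implementation. It assumes a toy model in which each node tracks which data objects sit in its node-local storage, and a task is sent to the node that already holds the most of its inputs (ties broken by the shorter queue). The names Node, Task, pick_node, and dispatch are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Data objects currently held in this node's local storage (hypothetical model).
    local_objects: set = field(default_factory=set)
    # Number of tasks already dispatched to this node.
    queued: int = 0


@dataclass
class Task:
    name: str
    inputs: set
    outputs: set


def pick_node(task, nodes):
    """Prefer the node holding the most inputs; break ties by the shorter queue."""
    return max(nodes, key=lambda n: (len(task.inputs & n.local_objects), -n.queued))


def dispatch(tasks, nodes):
    """Assign tasks in order; each task's outputs are recorded on the chosen node."""
    placement = {}
    for task in tasks:
        node = pick_node(task, nodes)
        node.queued += 1
        node.local_objects |= task.outputs
        placement[task.name] = node.name
    return placement


nodes = [Node("n0", {"a"}), Node("n1", {"b"})]
tasks = [
    Task("t1", inputs={"a"}, outputs={"c"}),
    Task("t2", inputs={"b"}, outputs={"d"}),
    Task("t3", inputs={"c", "d"}, outputs={"e"}),
]
placement = dispatch(tasks, nodes)
print(placement)
```

In this toy run, t1 and t2 land on the nodes that already hold their inputs, so no input data has to move for them. A real task manager would also have to account for transfer cost, fault tolerance, and load at far larger scale, none of which this sketch models.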

In proposing these ideas, we build on the results of extensive initial experiments with this approach that have demonstrated the value of specialized many-task system services for applications as diverse as climate modeling, protein folding, rational material design, genomics, proteomics, image processing, and computational economics, many of which have been scaled to 165,000 computing cores.

Our integrated components will be tested on the largest systems being developed over the next three years, using a variety of applications that are essential to DOE’s goals.

Anticipated impact: The project will produce advances in computer science and software technology that enable the efficient and reliable use of exascale computers for new classes of applications. In this way, the project will both accelerate access to exascale computers by important existing applications and facilitate uptake of large-scale parallel computing by other application communities for which it is currently out of reach. The project will also produce students and postdocs expert in innovative system software for extreme-scale computers.

Related links: International Exascale Software Project; DOE ASCR

Publications

[1] All publications are listed on the ExM website.
[0] Daniel S. Katz, Matei Ripeanu, Michael Wilde. Many-Task Computing Tools for Multiscale Modeling. Technical report, arXiv:1110.0404v1.