SSL Research
Research in the SSL currently focuses on three main areas, each with a variety of subprojects.
- Scalable System Software research focuses on the study, design, and implementation of operating systems, runtime systems, libraries, and system services that enable complex, large-scale applications. Over the years, work in this area has resulted in operating systems and communication services that form the basis of modern HPC operating systems.
- Resilience research focuses on solving the challenges that failures present in truly extreme-scale systems. This includes investigating the sources of failure, evaluating the performance of techniques for tolerating failures, and developing new systems to mitigate their impact.
- Software Infrastructure research focuses on developing the support systems needed to build next-generation computing systems. This includes, for example, techniques to scalably disseminate information in large-scale systems, enabling job launch and debugging on tens of thousands of nodes. This work also includes novel approaches to adaptation and similar autonomic behavior in systems of the largest scale.