Time: 10:30-12:00, October 25th
Place: Meeting Hall, 4th Floor, ICT, CAS
Speaker: Elmootazbellah (Mootaz) Elnozahy, Senior Manager, IBM Austin
This talk will review the current reliability problems in today's large-scale HPC systems and project these problems into the Exascale era. We argue that current practices have reached the limits of their usefulness and will not be adequate for Exascale systems. We then argue that the solution will require a concerted effort across the programming model, hardware architecture, and system software. Additionally, we argue that realistic expectations of performance and resource utilization will be an important component of any future solution, and point out that the lack of realistic expectations for existing systems has led to many flawed decisions on system procurement and choice of software, and to general disappointment with existing HPC systems.
Mootaz is an IBM Master Inventor and an IEEE Fellow working as a Senior Manager at IBM Research in Austin, Texas. He obtained a Ph.D. degree in Computer Science from Rice University. From 1993 until 1997, he was on the faculty at CMU. Since 1997, he has been with the IBM Austin Research Lab, where he started the Systems Software Department. While at IBM, he has worked on code compression for PowerPC (1997), cc-NUMA systems for x86 platforms (1999), Web site performance acceleration for the Census Bureau (2000), dense low-power servers (2001), and blade-based servers (2002). From 2002 until 2007, he led IBM's entry in the DARPA HPCS Initiative, a project code-named PERCS. He continues to consult widely on reliable computing issues. He has also been a consultant for the NSF, DARPA, and the State of Texas.