The State Key Laboratory of Processors (hereinafter “the Laboratory”) has made significant progress in research on Silent CPU Execution Errors (SCEE), proposing the first low-overhead online detection method for data centers. The results were published at SOSP 2025, a flagship conference in operating systems, and presented at the ACM SIGOPS Strategic Workshop 2025.
As feature sizes continue to shrink and manufacturing complexity increases, processor reliability issues have become increasingly prominent. In today’s large-scale data centers, processors are exhibiting a new class of faults, Silent CPU Execution Errors (SCEE), which can stealthily corrupt application control flow and data without being detected by existing fault-tolerance mechanisms, posing severe threats to system security and data integrity. Industry leaders including Google, Amazon Web Services (AWS), and Alibaba have reported numerous CPU faults. These errors can lead to Silent Data Corruption (SDC) and, in severe cases, silently compromise user data.
To address these challenges, the Laboratory developed the first online detection system specifically targeting SDC. The work is described in the paper “Orthrus: Efficient and Timely Detection of Silent User Data Corruption in the Cloud with Resource-Adaptive Computation Validation.” The first author is PhD student Chenxiao Liu, advised by Huimin Cui, Zidong Du, and Chenxi Wang. Within the overarching framework of building semantics-aware cloud system software that spans programming languages, runtimes, and operating systems, the paper detects silent errors with very low runtime overhead (approximately 2%–6%), significantly improving the reliability of data center services.
The key insight of Orthrus is that cloud application code typically separates into a Control Path and a Data Path. The Control Path handles scheduling and dispatching logic and does not directly manipulate user data, whereas the Data Path performs concrete operations on user data. Orthrus therefore adopts a hybrid strategy: checksum-based verification for the Control Path and re-execution for the Data Path. To support this mechanism, Orthrus introduces a series of innovations across the compiler, system, and runtime layers (see Figure 1).

Figure 1. Design framework of the Orthrus online SCEE verification system
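The hybrid strategy described above can be illustrated with a minimal Python sketch. This is not the Orthrus implementation: the names `Sealed`, `seal`, `unseal`, `validated`, and `dispatch` are hypothetical, CRC32 is only an assumed choice of checksum, and the re-execution here runs in the same process. The optional `rerun` argument stands in for a second execution carried out elsewhere.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Sealed:
    """User data plus a checksum recorded before it enters the control path."""
    payload: bytes
    crc: int


def seal(payload: bytes) -> Sealed:
    # Record a CRC32 of the payload (assumed checksum; any checksum would do).
    return Sealed(payload, zlib.crc32(payload))


def unseal(item: Sealed) -> bytes:
    # Control Path check: the payload was only moved, never modified,
    # so its checksum must still match the recorded one.
    if zlib.crc32(item.payload) != item.crc:
        raise RuntimeError("control path: checksum mismatch")
    return item.payload


def validated(op: Callable[[bytes], bytes], data: bytes,
              rerun: Optional[Callable[[bytes], bytes]] = None) -> bytes:
    # Data Path check: execute the operator twice and compare the results.
    # `rerun` models the second execution; it defaults to the same operator.
    result = op(data)
    check = (rerun or op)(data)
    if result != check:
        raise RuntimeError("data path: re-execution mismatch")
    return result


def uppercase(data: bytes) -> bytes:
    # Example Data Path operator: it actually transforms user data.
    return data.upper()


def dispatch(item: Sealed) -> bytes:
    # Example Control Path: routes the item to an operator without
    # touching the payload itself.
    return validated(uppercase, unseal(item))
```

Calling `dispatch(seal(b"abc"))` passes both checks and returns the transformed payload. A `Sealed` value whose payload no longer matches its recorded checksum fails in `unseal`, and a `rerun` that disagrees with the first execution fails in `validated`. The split reflects the cost trade-off the paragraph describes: a checksum is cheap where data is merely passed along, while re-execution is reserved for code that computes on user data.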
At the SIGOPS Strategic Workshop, the talk “The Core Problem with Cores: It’s All About the Software” built on Orthrus to further examine the SCEE newly observed in data centers. It argued that SCEE is a critical issue in both high-performance computing and cloud environments, and advocated software-based methods as a way to provide low-overhead, efficient detection.
The 31st ACM Symposium on Operating Systems Principles (SOSP 2025) was held in Seoul, South Korea, from October 13 to 16, 2025. SOSP is one of the two most prestigious international conferences in operating systems, with an acceptance rate of approximately 17.7% this year. Organized by ACM SIGOPS, the conference brings together experts from academia and industry to showcase innovative research and practical experience spanning operating system design, implementation, analysis, evaluation, and deployment. SOSP emphasizes novelty and practicality, fostering deep exchanges and integration between theory and engineering across operating systems and related fields.
The inaugural SOSP Strategic Workshop 2025 focused on the profound transformations in operating systems and computing over the past decade driven by hardware evolution, distributed systems, AI-driven automation, and the widespread adoption of heterogeneous computing. Building on the SOSP 2015 History Day, the workshop combined historical insights with forward-looking perspectives, convening Turing Award laureates, senior scholars, foundational contributors, and emerging researchers from multiple countries and regions to chart a strategic research roadmap for the next five years. The workshop was co-located with the SOSP 2025 main conference in Seoul.