Multiple Papers from the "QiMeng" Processor Chip Fully Automated Design System Have Been Accepted by NeurIPS 2025 and AAAI 2026

Date: Dec 05, 2025

The "Qimeng" processor chip fully automated design system is dedicated to achieving fully automated design of both hardware and software for processor chips through artificial intelligence (AI) technology. This system enables automation across multiple key stages of processor chip design, including automated CPU front-end design, automated HDL code generation, automated compiler design, automated high-performance library generation, and automated operating system configuration optimization. It was initially proposed and has been continuously advanced by the team of Yunji Chen and Qi Guo from the State Key Laboratory of Processors. Four  papers from "Qimeng" processor chip fully automated design system have been accepted by NeurIPS 2025 (The 39th Annual Conference on Neural Information Processing Systems, a CCF-A conference). Additionally, two papers have been accepted by AAAI 2026 (The 40th Annual AAAI Conference on Artificial Intelligence, also a CCF-A conference). These research achievements cover areas including automated HDL code generation, automated compiler design, and automated high-performance library generation. 

Project homepage of QiMeng: https://qimeng-ict.github.io/


1. QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation

This paper has been accepted by NeurIPS 2025. The first author is Yaoyu Zhu, an assistant professor at the laboratory.

QiMeng-CodeV-R1 addresses the problem of automatically generating Verilog code from natural language (NL) using reinforcement learning with verifiable rewards (RLVR). Existing RLVR methods face three key challenges when applied to electronic design automation (EDA): the lack of automated and accurate verification environments, the scarcity of high-quality NL–code pairs, and the prohibitive computational cost of RLVR. The proposed CodeV-R1 method integrates automated verification, data synthesis, and efficient training into a comprehensive framework that includes a testbench generator, round-trip data synthesis, and a two-stage training pipeline. Specifically, the method first uses rules to generate testbenches for equivalence checking, then synthesizes and filters data to produce high-quality NL–code datasets, and finally employs a "distill-then-RL" training pipeline to reduce training costs. Experimental results show that the CodeV-R1-7B model achieves pass@1 scores of 68.6% and 72.9% on the VerilogEval v2 and RTLLM v1.1 benchmarks, respectively, an improvement of 12%–20% over previous state-of-the-art methods; it even surpasses the 671B-parameter DeepSeek-R1 model on RTLLM.
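To give a flavor of how round-trip data synthesis with testbench-based filtering can work in general, the minimal Python sketch below keeps only NL–code pairs whose regenerated code passes an equivalence check. It is not the authors' implementation: `generate_description`, `generate_verilog`, and `equivalent` are hypothetical stand-ins for an LLM and a testbench-driven Verilog equivalence checker.

```python
# Minimal sketch of round-trip data synthesis with testbench-based filtering.
# All three callables are hypothetical stand-ins supplied by the caller.

from typing import Callable, List, Tuple

def round_trip_filter(
    golden_modules: List[str],
    generate_description: Callable[[str], str],   # Verilog -> natural-language spec
    generate_verilog: Callable[[str], str],       # natural-language spec -> Verilog
    equivalent: Callable[[str, str], bool],       # testbench-based equivalence check
) -> List[Tuple[str, str]]:
    """Keep only (description, code) pairs whose regenerated code matches the original."""
    dataset = []
    for golden in golden_modules:
        spec = generate_description(golden)       # forward pass: code -> NL
        candidate = generate_verilog(spec)        # round trip: NL -> code
        if equivalent(golden, candidate):         # filter by functional equivalence
            dataset.append((spec, golden))        # retain a high-quality NL-code pair
    return dataset
```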

Paper link: https://arxiv.org/pdf/2505.24183

Project homepage: https://github.com/IPRC-DIP/CodeV-R1


2. QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation

This paper has been accepted by NeurIPS 2025. The first author is PhD student Yang Zhang, supervised by Associate Researcher Rui Zhang and Senior Engineer Jiaming Guo.

QiMeng-SALV addresses automatic HDL code generation by proposing a fine-grained, signal-level reinforcement learning optimization method. Existing approaches often struggle to obtain effective functional rewards during reinforcement learning because high-quality HDL training data are scarce. The proposed signal-aware learning method extracts functionally correct code snippets from partially incorrect modules through signal-aware verification and AST analysis, shifting reinforcement learning optimization from the module level to the signal level. This provides finer-grained and more effective functional reward signals for reinforcement learning and thus improves training performance. Experimental results show that QiMeng-SALV outperforms all open-source non-reasoning models on the VerilogEval and RTLLM benchmarks, achieving state-of-the-art performance. It reaches a pass@1 score of 62.0% on the RTLLM v2.0 benchmark, a 10.9% improvement of the proposed signal-level reinforcement learning approach over traditional module-level methods, and matches the performance of the 671B-parameter DeepSeek-V3 with only 7B parameters.
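As an illustration of the general idea of signal-level rewards (a toy sketch, not the paper's method), the Python snippet below scores each output signal of a generated module independently against a reference simulation, so a partially incorrect module still yields useful learning signal. The per-signal traces are assumed to come from a hypothetical simulation step not shown here.

```python
# Illustrative sketch: score each output signal separately against a reference,
# so partially correct modules still contribute positive reward.

from typing import Dict, List

Trace = List[int]  # sampled values of one signal over simulation time

def signal_level_rewards(ref_traces: Dict[str, Trace],
                         gen_traces: Dict[str, Trace]) -> Dict[str, float]:
    """Return a reward per output signal: 1.0 if its trace matches the reference."""
    rewards = {}
    for signal, ref in ref_traces.items():
        gen = gen_traces.get(signal)
        rewards[signal] = 1.0 if gen == ref else 0.0
    return rewards

# Example: the generated module gets 'sum' right but 'carry' wrong.
ref = {"sum": [0, 1, 1, 0], "carry": [0, 0, 0, 1]}
gen = {"sum": [0, 1, 1, 0], "carry": [0, 0, 1, 1]}
print(signal_level_rewards(ref, gen))  # {'sum': 1.0, 'carry': 0.0}
```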

Paper link: https://arxiv.org/abs/2510.19296

Project homepage: https://zy1xxx.github.io/SALV/


3. QiMeng-MuPa: Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

This paper has been accepted by NeurIPS 2025. The first author is master’s student Changxin Ke, and the supervising advisor is Associate Researcher Rui Zhang.

QiMeng-MuPa addresses the functional-equivalence challenge in automatic code parallelization and proposes a mutual-supervised learning method. Existing automatic translation methods typically face data scarcity and functional inequivalence: the former limits the model’s generalization capability, and the latter makes it difficult to guarantee the executable correctness of translated code when using generated data. QiMeng-MuPa constructs a dual-model mutual-supervision closed loop of a Translator and a Tester, allowing them to mutually generate data and reinforce each other in a loop of Co-verify and Co-evolve: the Tester is responsible for generating unit tests to filter and verify equivalent code to evolve the Translator, while the Translator generates high-quality translated code as augmented data to drive the Tester’s evolution. Experimental results show that QiMeng-MuPa can raise the pass@1 of Qwen2.5-Coder-7B by 28.91%, achieving performance comparable to DeepSeek-R1 and GPT-4.1.
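The control flow of such a co-verify/co-evolve loop can be sketched as follows. This is only a schematic illustration under stated assumptions, not the paper's training procedure: `translator`, `tester`, `fine_tune`, and the per-test `test(seq, par)` calls are hypothetical stand-ins for models, test generation, and training routines.

```python
# Schematic sketch of a Translator/Tester mutual-supervision loop.
# Only the "co-verify, then co-evolve" control flow is illustrated.

def co_evolve(seq_programs, translator, tester, fine_tune, rounds=3):
    for _ in range(rounds):
        verified_pairs = []                # (sequential, parallel) pairs that pass tests
        for seq in seq_programs:
            par = translator(seq)          # candidate parallel translation
            tests = tester(seq)            # Tester generates unit tests from the source
            if all(test(seq, par) for test in tests):   # co-verify: equivalence on tests
                verified_pairs.append((seq, par))
        # Co-evolve: verified translations train the Translator, and the same pairs
        # serve as augmented data to drive the Tester's evolution.
        translator = fine_tune(translator, verified_pairs)
        tester = fine_tune(tester, verified_pairs)
    return translator, tester
```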

Paper link: https://arxiv.org/pdf/2506.11153

Project homepage: https://kcxain.github.io/mupa/


4. QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code

This paper has been accepted by NeurIPS 2025. The first author is master’s student Hainan Fang, advised by Associate Researcher Yuanbo Wen.

QiMeng-NeuComBack targets the problem of automatically translating Intermediate Representation (IR) into assembly code in neural compilation by proposing a self-evolving prompt optimization method. Existing research in neural compilation often suffers from the lack of dedicated benchmarks and from limited capability to generate reliable, high-performance code. This paper constructs the NeuComBack benchmark dataset, specifically designed for IR-to-assembly neural compilation tasks. The proposed self-evolving prompt optimization method enables the model to extract optimization insights from its past self-debugging trajectories and iteratively evolve its internal prompts, thereby automatically refining its compilation strategies. Experimental results demonstrate that this method significantly improves the correctness and performance of the generated assembly code: functional correctness on the x86_64 and aarch64 architectures increases from 44% and 36% to 64% and 58%, respectively. Furthermore, among the correctly generated x86_64 programs, 87.5% outperform code produced by the industry-standard compiler clang at its -O3 optimization level.
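A self-evolving prompt loop of this general shape can be sketched in a few lines of Python. This is a rough illustration, not the paper's algorithm: `llm`, `check`, and `summarize_insight` are hypothetical stubs for an LLM call, a functional-correctness check on the generated assembly, and an LLM-based reflection step.

```python
# Rough sketch of a self-evolving prompt loop for IR-to-assembly translation.
# The three callables are hypothetical stubs supplied by the caller.

def self_evolving_compile(ir_programs, llm, check, summarize_insight, iters=3):
    prompt = "Translate the following LLVM IR to assembly."   # initial instruction
    for _ in range(iters):
        trajectories = []
        for ir in ir_programs:
            asm = llm(prompt, ir)                  # attempt a translation
            ok, error_log = check(ir, asm)         # assemble/run and compare outputs
            trajectories.append((ir, asm, ok, error_log))
        failures = [t for t in trajectories if not t[2]]
        if not failures:
            break
        # Reflect on the failed attempts and fold the lesson back into the prompt,
        # so later iterations avoid the same class of mistakes.
        prompt += "\n" + summarize_insight(failures)
    return prompt
```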

Paper link: https://arxiv.org/abs/2511.01183

Project homepage: https://fanghainannn.github.io/QiMeng-NeuComBack-Web/


5. QiMeng-CRUX: Narrowing the Gap between Natural Language and Verilog via Core Refined Understanding eXpression

This paper has been accepted by AAAI 2026. The first author is Ph.D. candidate Lei Huang, advised by Associate Researcher Rui Zhang and Senior Engineer Jiaming Guo.

QiMeng-CRUX addresses the excessive semantic gap between natural language and Verilog code generation by proposing a structured intermediate space called Core Refined Understanding eXpression (CRUX). Existing methods rely on free-form natural language descriptions, which are prone to ambiguous expression, loose structure, and semantic redundancy; as a result, models struggle to accurately capture design intent and generate reliable RTL logic. CRUX constructs a constrainable and interpretable intermediate semantic space that refines user intent into core design elements and serves as a semantic bridge between natural language and Verilog. QiMeng-CRUX designs a two-stage training framework incorporating Joint Expression Modeling and Dual-Space Optimization to improve the quality of both CRUX expressions and the generated code. Experimental results show that QiMeng-CRUX-V achieves state-of-the-art (SOTA) performance among non-reasoning models on multiple Verilog generation benchmarks, with pass@1 reaching 64.7% on VerilogEval-v2 and 63.8% on RTLLM-v2. Moreover, as a semantically robust intermediate space, CRUX brings consistent performance improvements even when used directly as prompts for other models.
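To illustrate what "refining free-form intent into a constrained intermediate expression" can look like in general, the toy Python sketch below renders a small structured spec into a constrained prompt for a code generator. The field names (`module_name`, `inputs`, `outputs`, `behavior`) are hypothetical; the actual CRUX format is defined in the cited paper.

```python
# Toy illustration of a structured intermediate expression between a free-form
# request and Verilog generation. Field names here are hypothetical examples.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CoreSpec:
    module_name: str
    inputs: Dict[str, int]                 # port name -> bit width
    outputs: Dict[str, int]
    behavior: List[str]                    # concise, unambiguous functional statements

    def to_prompt(self) -> str:
        """Render the refined spec as a constrained prompt for a code generator."""
        ports = [f"input [{w - 1}:0] {n}" for n, w in self.inputs.items()]
        ports += [f"output [{w - 1}:0] {n}" for n, w in self.outputs.items()]
        rules = "\n".join(f"- {b}" for b in self.behavior)
        return (f"Implement module {self.module_name} with ports:\n"
                + "\n".join(ports) + f"\nBehavior:\n{rules}")

spec = CoreSpec("adder8", {"a": 8, "b": 8}, {"sum": 9},
                ["sum = a + b (combinational)"])
print(spec.to_prompt())
```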

Paper link: https://arxiv.org/abs/2511.20099

Project homepage: https://github.com/Taskii-Lei/QiMeng-CRUX-V


6. QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

This paper has been accepted by AAAI 2026. The first author is master’s student Xinguo Zhu, and the advisors are Researcher Ling Li and Associate Researcher Shaohui Peng.

QiMeng-Kernel proposes a hierarchical generation paradigm, Macro-Thinking Micro-Coding (MTMC), to address the excessive coupling between optimization strategies and implementation details that arises in large-model-based automatic generation of high-performance GPU kernels. Existing large language model (LLM) based GPU kernel generation methods typically struggle to ensure correctness and efficiency at the same time. On the one hand, the vast optimization space of GPU kernels depends heavily on hardware characteristics, which makes it difficult for LLMs to identify effective optimization strategies during the search. On the other hand, because low-level implementation details are complex, directly generating kernel code frequently leads to compilation failures, runtime errors, or significant performance degradation. MTMC decouples high-level optimization strategies from low-level implementation. At the macro level, it makes optimization decisions based on hardware semantics; at the micro level, it implements these optimizations through a multi-step, fine-grained process, thereby maximizing correctness while enhancing performance. Experimental results demonstrate that QiMeng-Kernel significantly outperforms existing LLM-based automatic GPU kernel generation methods on both KernelBench and TritonBench, achieving a correctness improvement of over 50% and a maximum speedup of 7.3 times.
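The separation of macro-level strategy from micro-level implementation can be sketched schematically as below. This is a hedged illustration of the general paradigm, not the paper's system: `propose_strategies`, `implement_step`, and `compiles_and_passes` are hypothetical stubs for strategy planning, LLM-driven code editing, and a compile-and-test check.

```python
# Schematic sketch of a macro-thinking / micro-coding flow for GPU kernel generation.
# All helpers are hypothetical stubs; only the two-level control flow is illustrated.

def generate_kernel(task, propose_strategies, implement_step, compiles_and_passes):
    # Macro level: decide optimization strategies from hardware semantics
    # (e.g. tiling, shared-memory reuse, vectorized loads) before writing any code.
    plan = propose_strategies(task)            # ordered list of optimization decisions

    # Micro level: realize the plan through small, individually checked code edits.
    kernel = ""                                # start from an empty (or naive) kernel
    for step in plan:
        candidate = implement_step(kernel, step)
        if compiles_and_passes(task, candidate):   # keep only steps that stay correct
            kernel = candidate                     # accept the incremental optimization
    return kernel
```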

Paper link: https://arxiv.org/abs/2511.20100

Project homepage: https://github.com/QiMeng-IPRC/QiMeng-Kernel


NeurIPS 2025

NeurIPS is a top-tier international conference in the fields of machine learning and computational neuroscience, recognized as a CCF-A conference. It primarily features cutting-edge research in machine learning, natural language processing, computer vision, multi-agent systems, and related areas. Since its establishment in 1987, NeurIPS has become one of the oldest and most influential conferences in the field, playing a significant role in advancing artificial intelligence and its related disciplines. The 39th NeurIPS conference is being held in San Diego, USA, from December 2 to 7, 2025, with an acceptance rate of 24.52%.


AAAI 2026

AAAI is a top-tier international conference in the field of artificial intelligence (AI), classified as CCF-A. It primarily accepts the latest research results in areas such as machine learning, natural language processing, computer vision, multi-agent systems, and knowledge representation and reasoning. Since its founding in 1980, AAAI has gradually developed into one of the oldest and most academically influential top international conferences in the AI field, playing a crucial role in promoting the long-term development of AI and related interdisciplinary fields. The 40th AAAI Conference on Artificial Intelligence (AAAI-26) will be held in Singapore from January 20 to 27, 2026, with an acceptance rate of 17.6%.

