Recently, the paper Cambricon-P: A Bitflow Architecture for Arbitrary Precision Computing, from the Intelligent Processor Research Center, won the Best Paper Runner-up Award at MICRO 2022, one of the top-tier international conferences on computer architecture and microarchitecture (also a CCF-A tier conference).
Cambricon-P targets efficient arbitrary precision computing (APC). APC, in which operand widths range from tens to millions of bits, is fundamental to scientific applications in fields such as mathematics, physics, chemistry, and biology. On existing platforms (e.g., CPUs and GPUs), APC is achieved by decomposing the original data into small pieces that fit the low-bitwidth (e.g., 32-/64-bit) functional units. However, such fine-grained decomposition inevitably produces large amounts of intermediate results, which cause intensive on-chip data traffic and long, complex dependency chains, and therefore low hardware utilization.
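To make the cost of decomposition concrete, here is a minimal Python sketch of schoolbook multiplication over 32-bit pieces ("limbs"), roughly how a software library on a CPU might proceed. It is an illustration only, not code from the paper. The names `to_limbs` and `limb_multiply` and the choice of the schoolbook algorithm are assumptions made for this example. The sketch counts the partial products and shows the carry that links each step to the previous one.

```python
def to_limbs(n, limb_bits=32):
    """Split a non-negative integer into little-endian limbs of limb_bits bits."""
    mask = (1 << limb_bits) - 1
    limbs = []
    while n:
        limbs.append(n & mask)
        n >>= limb_bits
    return limbs or [0]


def limb_multiply(a, b, limb_bits=32):
    """Schoolbook multiply over limbs (hypothetical helper for illustration).

    Returns (product, number_of_partial_products).
    """
    xs, ys = to_limbs(a, limb_bits), to_limbs(b, limb_bits)
    mask = (1 << limb_bits) - 1
    out = [0] * (len(xs) + len(ys))
    partials = 0
    for i, x in enumerate(xs):
        carry = 0
        for j, y in enumerate(ys):
            # Each step consumes the carry produced by the previous step,
            # which is the dependency chain described in the text.
            t = out[i + j] + x * y + carry
            out[i + j] = t & mask
            carry = t >> limb_bits
            partials += 1
        out[i + len(ys)] = carry
    product = sum(limb << (k * limb_bits) for k, limb in enumerate(out))
    return product, partials


if __name__ == "__main__":
    a = (1 << 4096) - 1
    b = (1 << 4096) - 12345
    product, partials = limb_multiply(a, b)
    print(product == a * b, partials)
```

With two 4096-bit operands, each splits into 128 limbs, so this method generates 128 × 128 limb-level partial products, each of which must be accumulated through a carry.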
To address this issue, the Intelligent Processor Research Center proposes Cambricon-P, a bitflow architecture that supports monolithic operations of large and flexible bitwidth for efficient APC processing and avoids generating the large amounts of intermediate results that decomposition produces. Cambricon-P features a tightly integrated computational architecture that processes different bitflows in parallel over fully bit-serial data paths. A bit-serial scheme alone still has to eliminate the dependency chains of APC in order to exploit parallelism within a single monolithic large-bitwidth operation. For this purpose, Cambricon-P adopts a carry-parallel computing mechanism, which recursively transforms a multiplication into smaller inner products that can be performed in parallel across bit-indexed IPUs (Inner-Product Units). Furthermore, to improve the computing efficiency of APC, Cambricon-P employs a bit-indexed inner-product processing scheme, named BIPS, to eliminate bit-level redundancy within each IPU. Compared with an Intel Xeon 6134 CPU, Cambricon-P achieves a performance improvement of two orders of magnitude on monolithic long multiplication, and on average an order-of-magnitude speedup and energy benefit across four real-world APC applications.
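One way to picture a bit-indexed inner product in software is sketched below in Python. This is an assumed reading of the term for illustration only, not the BIPS hardware design from the paper. The function names `subset_sums` and `bit_indexed_inner_product` are hypothetical. The sketch precomputes every subset sum of one operand vector, streams the other vector one bit plane at a time, and uses each bit plane as an index into the table, so no per-bit multiplications are repeated.

```python
def subset_sums(ys):
    """table[m] = sum of ys[k] for every bit k that is set in mask m."""
    table = [0] * (1 << len(ys))
    for m in range(1, len(table)):
        low = (m & -m).bit_length() - 1      # index of the lowest set bit
        table[m] = table[m & (m - 1)] + ys[low]
    return table


def bit_indexed_inner_product(xs, ys):
    """Compute sum(x * y) for non-negative integers, one bit plane of xs at a time.

    Hypothetical illustration: bit b of every x forms an index into the
    precomputed subset-sum table of ys.
    """
    assert len(xs) == len(ys)
    table = subset_sums(ys)
    width = max((x.bit_length() for x in xs), default=0)
    acc = 0
    for b in range(width):
        index = 0
        for k, x in enumerate(xs):
            index |= ((x >> b) & 1) << k     # gather bit b of each x
        acc += table[index] << b             # weight the looked-up sum by 2**b
    return acc


if __name__ == "__main__":
    xs = [5, 3, 12, 7]
    ys = [10**20 + 1, 2, 9, 123456789]
    print(bit_indexed_inner_product(xs, ys) == sum(x * y for x, y in zip(xs, ys)))
```

The table grows as 2 to the power of the vector length, so a sketch like this only makes sense for short vectors. How the actual IPUs size and share such structures is not stated in this article.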
The Intelligent Processor Research Center team has worked on computer architecture research for many years. The team proposed DianNao, the world's first deep learning processor architecture (joint work with Inria, France), and taped out the world's first deep learning processor chip. In the context of the new golden age of computer architecture, Cambricon-P opens a new area for joint research between computer architecture and scientific computing.
A primer on MICRO. MICRO is one of the four top-tier international conferences on computer architecture (the "Big Four"); the other three are ISCA, ASPLOS, and HPCA. MICRO has the longest history of the Big Four, having run for 55 years since 1968. In the history of the Big Four, only two papers from mainland China have won Best Paper Awards: DianNao at ASPLOS 2014 and DaDianNao at MICRO 2014, both from the Intelligent Processor Research Center.