A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier

Authors

  • Xiangren Chen Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China
  • Bohan Yang Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China
  • Wenping Zhu Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China
  • Hanning Wang Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China
  • Qichao Tao Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China
  • Shuying Yin Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China
  • Min Zhu Wuxi Micro Innovation Integrated Circuit Design Co., Ltd
  • Shaojun Wei Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China
  • Leibo Liu Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China

DOI:

https://doi.org/10.46586/tches.v2025.i1.275-313

Keywords:

ZKP, NTT, Montgomery multiplication, MSM

Abstract

Zero-knowledge proof (ZKP) is an attractive cryptographic paradigm that allows a party to prove the correctness of a given statement without revealing any additional information. It offers both computation integrity and privacy, witnessing many celebrated deployments, such as computation outsourcing and cryptocurrencies. Recent general-purpose ZKP schemes, e.g., zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK), suffer from time-consuming proof generation, which is mainly bottlenecked by the large-scale number theoretic transformation (NTT) and multi-scalar point multiplication (MSM). To boost its wide application, great interest has been shown in expediting the proof generation on various platforms like GPU, FPGA and ASIC.
So far as we know, current works on the hardware designs for ZKP employ two separated data-paths for NTT and MSM, overlooking the potential of resource reusage. In this work, we particularly explore the feasibility and profit of implementing both NTT and MSM with a unified and high-performance hardware architecture. For the crucial operator design, we propose a dual-precision, load-balanced and fully-pipelined Montgomery multiplier (LBFP MM) by introducing the new mixed-radix technique and improving the prior quotient-decoupled strategy. Collectively, we also integrate orthogonal ideas to further enhance the performance of LBFP MM, including the customized constant multiplication, truncated LSB/MSB multiplication/addition and Karatsuba technique. On top of that, we present the unified, scalable and highperformance hardware architecture that conducts both NTT and MSM in a versatile pipelined execution mechanism, intensively sharing the common computation and memory resource. The proposed accelerator manages to overlap the on-chip memory computation with off-chip memory access, considerably reducing the overall cycle counts for NTT and MSM.
We showcase the implementation of modular multiplier and overall architecture on the BLS12-381 elliptic curve for zk-SNARK. Extensive experiments are carried out under TSMC 28nm synthesis and similar simulation set, which demonstrate impressive improvements: (1) the proposed LBFP MM obtains 1.8x speed-up and 1.3x less area cost versus the state-of-the-art design; (2) the unified accelerator achieves 12.1x and 5.8x acceleration for NTT and MSM while also consumes 4.3x lower overall on-chip area overhead, when compared to the most related and advanced work PipeZK.

Downloads

Published

2024-12-09

Issue

Section

Articles

How to Cite

A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier. (2024). IACR Transactions on Cryptographic Hardware and Embedded Systems, 2025(1), 275-313. https://doi.org/10.46586/tches.v2025.i1.275-313