GPU Acceleration for FHEW/TFHE Bootstrapping

Authors

  • Yu Xiao, Inventec Corporation, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
  • Feng-Hao Liu, Washington State University, Pullman, USA
  • Yu-Te Ku, Inventec Corporation, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan; Academia Sinica, Taipei, Taiwan
  • Ming-Chien Ho, Inventec Corporation, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
  • Chih-Fan Hsu, Inventec Corporation, Taipei, Taiwan
  • Ming-Ching Chang, Inventec Corporation, Taipei, Taiwan; State University of New York at Albany, Albany, USA
  • Shih-Hao Hung, National Taiwan University, Taipei, Taiwan; Mohamed bin Zayed University of Artificial Intelligence, Masdar, Abu Dhabi
  • Wei-Chao Chen, Inventec Corporation, Taipei, Taiwan

DOI:

https://doi.org/10.46586/tches.v2025.i1.314-339

Keywords:

Fully Homomorphic Encryption, Bootstrapping, GPU Acceleration

Abstract

Fully Homomorphic Encryption (FHE) allows computations to be performed directly on encrypted data without decryption. Despite its great theoretical potential, the computational overhead remains a major obstacle for practical applications. To address this challenge, hardware acceleration has emerged as a promising approach, aiming to achieve real-time computation across a wider range of scenarios. In line with this, our research focuses on designing and implementing a Graphics Processing Unit (GPU)-based accelerator for the third-generation FHEW/TFHE bootstrapping scheme, which, compared to the other generations, features smaller parameters and bootstrapping keys that are particularly suitable for GPU architectures.
In summary, our accelerator offers improved efficiency, scalability, and flexibility for extensions, e.g., functional bootstrapping (Liu et al., Asiacrypt 2022), compared to current state-of-the-art solutions. We evaluate our implementation and demonstrate substantial speedups: in the single-GPU setting, our bootstrapping achieves an 18x - 20x speedup over a 64-thread server-class CPU; with 8 GPUs, throughput improves by a further 7x over the single-GPU implementation, confirming the scalability of our design. Furthermore, compared to the state-of-the-art GPU solution TFHE-rs, we achieve a maximum speedup of 1.69x in AND gate evaluation. Finally, we benchmark several private machine learning applications, showing real-time solutions for (1) encrypted neural network inference on MNIST in 0.04 seconds per image, which is, to our knowledge, the fastest implementation, and (2) private decision tree evaluation on the Iris dataset in 0.38 seconds, whereas a prior 16-core CPU implementation (Lu et al., IEEE S&P 2021) required 1.87 seconds. These results highlight the effectiveness and efficiency of our GPU acceleration in real-world applications.
As a technical highlight, we design a novel parallelization strategy tailored for FHEW/TFHE bootstrapping, enabling an automated optimization that partitions a single bootstrapping across multiple GPU thread blocks. This is necessary for FHEW/TFHE bootstrapping with scalable parameters, where the whole bootstrapping process may not fit into a single thread block. With this, our accelerator supports a broader range of parameters, making it well suited for upcoming privacy-preserving applications.
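As a rough illustration of this idea (not the paper's implementation), the CUDA sketch below spreads one bootstrapping's accumulator update across all thread blocks of a cooperative grid and uses grid-wide barriers where a single-block design would rely on __syncthreads(). It is a minimal sketch under stated assumptions: the per-step accumulator update is reduced to a toy negacyclic rotation rather than the full RGSW external product (CMux) used in FHEW/TFHE blind rotation, and the names Torus, N_POLY, and blind_rotate_multi_block are illustrative, not taken from the paper or any library.

```cuda
// Minimal sketch: one bootstrapping instance partitioned across several thread
// blocks. A real FHEW/TFHE blind rotation performs an RGSW external product
// (CMux) with the i-th bootstrapping key at each step; here that update is
// simplified to a negacyclic monomial rotation of the accumulator.
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

using Torus = unsigned int;   // 32-bit torus elements (assumed parameter choice)
constexpr int N_POLY = 2048;  // ring dimension N (assumed parameter choice)

__global__ void blind_rotate_multi_block(Torus* acc,        // accumulator polynomial, length N_POLY
                                         Torus* scratch,    // scratch buffer, length N_POLY
                                         const int* lwe_a,  // LWE mask a_0..a_{n-1}, reduced mod 2N
                                         int n)             // LWE dimension
{
    cg::grid_group grid = cg::this_grid();
    const int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    const int stride = gridDim.x * blockDim.x;

    for (int i = 0; i < n; ++i) {
        const int rot = lwe_a[i];  // multiply accumulator by X^{rot} (negacyclic)
        for (int j = tid; j < N_POLY; j += stride) {
            int  src    = j - rot;
            bool negate = false;
            // Wrap the index into [0, N) and flip the sign on each wrap, since X^N = -1.
            while (src < 0)       { src += N_POLY; negate = !negate; }
            while (src >= N_POLY) { src -= N_POLY; negate = !negate; }
            scratch[j] = negate ? (Torus)(0u - acc[src]) : acc[src];
        }
        grid.sync();  // every block must finish step i before acc is overwritten
        for (int j = tid; j < N_POLY; j += stride) acc[j] = scratch[j];
        grid.sync();  // acc is consistent across the whole grid before step i+1
    }
}
```

A kernel like this must be launched with cudaLaunchCooperativeKernel so that grid.sync() is valid. The point of the sketch is only that, once a bootstrapping no longer fits within one thread block's resources, the synchronization a single-block design gets for free from __syncthreads() has to be provided at grid level (or via kernel boundaries), which is the kind of partitioning decision the paper's strategy automates.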

Published

2024-12-09

Issue

Volume 2025, Issue 1
Section

Articles

How to Cite

Xiao, Y., Liu, F.-H., Ku, Y.-T., Ho, M.-C., Hsu, C.-F., Chang, M.-C., Hung, S.-H., & Chen, W.-C. (2024). GPU Acceleration for FHEW/TFHE Bootstrapping. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2025(1), 314-339. https://doi.org/10.46586/tches.v2025.i1.314-339