GPU Acceleration for FHEW/TFHE Bootstrapping
DOI: https://doi.org/10.46586/tches.v2025.i1.314-339
Keywords: Fully Homomorphic Encryption, Bootstrapping, GPU Acceleration
Abstract
Fully Homomorphic Encryption (FHE) allows computations to be performed directly on encrypted data without decryption. Despite its great theoretical potential, the computational overhead remains a major obstacle for practical applications. To address this challenge, hardware acceleration has emerged as a promising approach, aiming to achieve real-time computation across a wider range of scenarios. In line with this, our research focuses on designing and implementing a Graphics Processing Unit (GPU)-based accelerator for the third-generation FHEW/TFHE bootstrapping scheme, which features smaller parameters and bootstrapping keys than the other generations, making it particularly suitable for GPU architectures.
In summary, our accelerator offers improved efficiency, scalability, and flexibility for extensions, e.g., functional bootstrapping (Liu et al., Asiacrypt 2022), compared to current state-of-the-art solutions. We evaluate our implementation and demonstrate substantial speedups: in the single-GPU setting, our bootstrapping achieves an 18x - 20x speedup over a 64-thread server-class CPU; with 8 GPUs, the throughput can be further improved by 7x over the single-GPU implementation, confirming the scalability of our design. Furthermore, compared to the state-of-the-art GPU solution TFHE-rs, we achieve a maximum speedup of 1.69x in AND gate evaluation. Finally, we benchmark several private machine learning applications, showing real-time solutions for (1) encrypted neural network inference on MNIST in 0.04 seconds per image, the fastest implementation to our knowledge, and (2) private decision tree evaluation on the Iris dataset in 0.38 seconds, whereas a prior 16-core CPU implementation (Lu et al., IEEE S&P 2021) required 1.87 seconds. These results highlight the effectiveness and efficiency of our GPU acceleration in real-world applications.
As a technical highlight, we design a novel parallelization strategy tailored for FHEW/TFHE bootstrapping, allowing an automated optimization that partitions bootstrapping into multiple GPU thread blocks. This is necessary for FHEW/TFHE bootstrapping with scalable parameters, where the whole bootstrapping process may not fit into a single thread block. With this, our accelerator can support a broader range of parameters, making it ideal for upcoming privacy-preserving applications.
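To illustrate the partitioning idea described above (this is a hedged conceptual sketch, not the paper's actual kernel design), the following Python snippet shows how a workload of N coefficient slots might be split into contiguous chunks, one per GPU thread block, when N exceeds a single block's thread capacity. The function name, the default block size of 1024 threads (a common CUDA per-block limit), and the chunking policy are all illustrative assumptions.

```python
def partition_workload(n_slots, max_threads_per_block=1024):
    """Split n_slots coefficient slots into contiguous chunks, one per
    thread block, so that each chunk fits within a single block.

    Returns a list of half-open (start, end) index ranges.
    NOTE: hypothetical helper for illustration only; the actual
    accelerator's partitioning strategy is described in the paper.
    """
    # Number of blocks needed so no chunk exceeds the per-block limit.
    n_blocks = (n_slots + max_threads_per_block - 1) // max_threads_per_block
    # Balance the slots as evenly as possible across blocks.
    chunk = (n_slots + n_blocks - 1) // n_blocks
    return [(i * chunk, min((i + 1) * chunk, n_slots))
            for i in range(n_blocks)]


# Example: a ring dimension of 2048 does not fit in one 1024-thread
# block, so the work is automatically split across two blocks.
print(partition_workload(2048))  # [(0, 1024), (1024, 2048)]
```

Under this sketch, small parameter sets map to a single block, while larger, more scalable parameter sets are transparently spread over multiple blocks, mirroring the automated optimization the paragraph describes.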
License
Copyright (c) 2024 Yu Xiao, Feng-Hao Liu, Yu-Te Ku, Ming-Chien Ho, Chih-Fan Hsu, Ming-Ching Chang, Shih-Hao Hung, Wei-Chao Chen
This work is licensed under a Creative Commons Attribution 4.0 International License.