Academic Paper

Privformer: Privacy-preserving Transformer with MPC
Document Type
Conference
Source
2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), pp. 392-410, Jul. 2023
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Robotics and Control Systems
Protocols
Data analysis
Genomics
Transformer cores
Parallel processing
Transformers
Real-time systems
machine learning
deep neural network
multiparty computation
Transformer
GPT
Language
Abstract
The Transformer is a deep learning architecture that processes sequence data. The Transformer attains the state of the art in several sequence data analysis tasks, and its variants, such as BERT and GPT-3, serve as a de facto standard for solving general tasks in natural language processing (NLP). This work presents a 3-party multi-party computation (MPC) protocol for secure inference of the Transformer in the honest-majority setting. The attention layer is the most time-consuming part when implementing an MPC protocol for the Transformer with existing building blocks. The attention mechanism is a core component of the Transformer that captures and exploits complex dependencies among elements in the input sequences. The attention mechanism invokes the exponentiation function O(S^2) times, which becomes a major bottleneck when implementing the Transformer with existing MPC primitives. To deal with this, we employ the Performer [11], a variant of the Transformer where the softmax function that invokes the exponentiation function is replaced with the ReLU function, a more MPC-friendly nonlinear function. Also, by introducing a kernel-based approximation of the attention matrix with random orthogonal matrices, we show that the attention layer can be processed with O(S) calls of the ReLU function. We investigate the efficiency of the proposed method through an end-to-end implementation of the Transformer with 3-party MPC. Experimental evaluation shows that, for translating a sequence where the output sequence length is 64, the entire computation takes about 19 minutes in a LAN environment.
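To illustrate the complexity argument in the abstract, the following is a minimal plaintext NumPy sketch (not the paper's MPC protocol, and not its exact feature map) contrasting standard softmax attention, which evaluates exp() on an S x S score matrix, with a Performer-style linearized attention that applies a ReLU feature map to random orthogonal projections, so the nonlinearity is invoked only O(S) times. All function names, shapes, and parameters below are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: exp() is evaluated on an S x S score matrix, i.e. O(S^2) times."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                # (S, S) score matrix
    scores -= scores.max(axis=-1, keepdims=True) # numerical stability
    A = np.exp(scores)                           # S*S exponentiations
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

def random_orthogonal(d, m, rng):
    """Stack rows of random orthogonal matrices to form an (m, d) projection."""
    blocks = []
    while sum(b.shape[0] for b in blocks) < m:
        g = rng.standard_normal((d, d))
        q, _ = np.linalg.qr(g)                   # orthonormal columns
        blocks.append(q.T)                       # orthonormal rows
    return np.concatenate(blocks, axis=0)[:m]

def relu_kernel_attention(Q, K, V, m=64, seed=0):
    """Performer-style linearized attention: the ReLU nonlinearity touches only
    O(S) feature vectors, and no S x S matrix is ever materialized."""
    rng = np.random.default_rng(seed)
    W = random_orthogonal(Q.shape[-1], m, rng)   # (m, d) random orthogonal features
    phi_q = np.maximum(Q @ W.T, 0.0)             # (S, m) ReLU features of queries
    phi_k = np.maximum(K @ W.T, 0.0)             # (S, m) ReLU features of keys
    kv = phi_k.T @ V                             # (m, d_v) summary, independent of S^2
    normalizer = phi_q @ phi_k.sum(axis=0)       # (S,) row normalization
    return (phi_q @ kv) / (normalizer[:, None] + 1e-6)

if __name__ == "__main__":
    S, d = 128, 32
    rng = np.random.default_rng(1)
    Q, K, V = (rng.standard_normal((S, d)) for _ in range(3))
    exact = softmax_attention(Q, K, V)
    approx = relu_kernel_attention(Q, K, V, m=256)
    print("output shapes:", exact.shape, approx.shape)
```

In the MPC setting studied by the paper, the point of this restructuring is that the only nonlinear primitive needed per token is ReLU (cheap to evaluate with existing secure comparison protocols), whereas softmax attention would require O(S^2) secure exponentiations; the secret-sharing and 3-party protocol details are omitted from this sketch.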