Academic Paper

Lightening-Transformer: A Dynamically-Operated Optically-Interconnected Photonic Transformer Accelerator
Document Type
Conference
Source
2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 686-703, Mar. 2024
Subject
Computing and Processing
Tensors
Electron accelerators
Machine learning
Transformer cores
Transformers
Energy efficiency
Vectors
Algorithm-Architecture Co-design
Transformer
Attention
Domain-Specific Accelerator
Photonic Accelerator
Optical Neural Network
Language
English
ISSN
2378-203X
Abstract
The wide adoption and significant computing resource cost of attention-based transformers, e.g., Vision Transformers and large language models, have driven the demand for efficient hardware accelerators. While electronic accelerators have been commonly used, there is a growing interest in exploring photonics as an alternative technology due to its high energy efficiency and ultra-fast processing speed. Photonic accelerators have demonstrated promising results for convolutional neural network (CNN) workloads, which predominantly rely on weight-static linear operations. However, they encounter challenges in efficiently supporting attention-based Transformer architectures, raising questions about the applicability of photonics to advanced machine-learning tasks. The primary hurdle lies in their inefficiency in handling the unique workloads inherent to Transformers, i.e., dynamic and full-range tensor multiplication. In this work, we propose Lightening-Transformer, the first light-empowered, high-performance, and energy-efficient photonic Transformer accelerator. To overcome the fundamental limitation of existing photonic tensor core designs, we introduce a novel dynamically-operated photonic tensor core, DPTC, consisting of a crossbar array of interference-based optical vector dot-product engines and supporting highly parallel, dynamic, and full-range matrix multiplication. Furthermore, we design a dedicated accelerator that integrates our novel photonic computing cores with photonic interconnects for inter-core data broadcast, fully unleashing the power of optics. Our comprehensive evaluation demonstrates that Lightening-Transformer achieves >2.6x energy and >12x latency reductions compared to prior photonic accelerators and delivers the lowest energy cost and 2 to 3 orders of magnitude lower energy-delay product compared to the electronic Transformer accelerator, all while maintaining digital-comparable accuracy. Our work highlights the immense potential of photonics for efficient hardware accelerators, particularly for advanced machine-learning workloads such as Transformer-backboned large language models (LLMs). Our implementation is available at https://github.com/zhuhanqing/Lightening-Transformer.
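To make the workload distinction in the abstract concrete, the sketch below contrasts the weight-static products that prior photonic CNN accelerators favor with the dynamic, full-range attention product (Q·K^T), where both operands are runtime activations. This is a minimal NumPy illustration under stated assumptions, not the authors' DPTC implementation: the function name crossbar_matmul, the tile size of 8, and the toy tensor shapes are all hypothetical, and the tiling merely mimics how a crossbar of fixed-length vector dot-product engines could partition such a product.

import numpy as np

def crossbar_matmul(A, B, tile=8):
    # Hypothetical sketch: compute A @ B by splitting the inner dimension into
    # chunks of length `tile`, as if each chunk were handled by one optical
    # vector dot-product engine in a crossbar. Both operands are runtime
    # activations (full-range, signed), unlike weight-static CNN products.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    out = np.zeros((M, N))
    for k0 in range(0, K, tile):
        a_blk = A[:, k0:k0 + tile]   # dynamic operand 1 (e.g., queries Q)
        b_blk = B[k0:k0 + tile, :]   # dynamic operand 2 (e.g., keys K^T)
        out += a_blk @ b_blk         # one pass through the engine array
    return out

# Toy attention-score computation: Q and K_mat are both produced at inference
# time, so neither can be pre-programmed into a weight-static photonic core.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 16))
K_mat = rng.standard_normal((4, 16))
scores = crossbar_matmul(Q, K_mat.T) / np.sqrt(Q.shape[1])
assert np.allclose(scores, Q @ K_mat.T / np.sqrt(Q.shape[1]))
print(scores.shape)  # (4, 4)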