Academic Paper

2.4 ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications
Document Type
Conference
Source
2024 IEEE International Solid-State Circuits Conference (ISSCC), vol. 67, pp. 42-44, Feb. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Engineered Materials, Dielectrics and Plasmas
Photonics and Electrooptics
Robotics and Control Systems
Costs
AI accelerators
System-on-chip
Solid state circuits
Hardware acceleration
Language
English
ISSN
2376-8606
Abstract
The growing computational demands of AI inference have led to widespread use of hardware accelerators across platforms, spanning from the edge to the datacenter/cloud. Certain AI application areas, such as high-frequency trading (HFT) [1–2], impose a hard inference latency deadline for successful execution. We present a new AI accelerator that achieves high inference capability with outstanding single-stream responsiveness for demanding service-level objective (SLO)-based AI services and pipelined inference applications, including large language models (LLMs). Owing to its low thermal design power (TDP), the scale-out solution can effectively support multi-stream applications as well as total cost of ownership (TCO)-centric systems.
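The notion of a hard latency deadline described in the abstract can be illustrated with a small, hypothetical benchmark harness: it times repeated single-stream calls to an inference function and checks whether a chosen tail percentile stays under the SLO budget. The function names, run count, and deadline here are illustrative assumptions, not part of the paper.

```python
import time


def meets_slo(infer, n_runs=100, deadline_ms=50.0, percentile=0.99):
    """Time `infer` n_runs times and report whether the given latency
    percentile stays within the hard deadline (illustrative sketch only;
    a real harness would also control warm-up, pinning, and clock source)."""
    latencies_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    # Index of the requested percentile in the sorted latency list.
    idx = min(int(percentile * n_runs), n_runs - 1)
    tail_ms = latencies_ms[idx]
    return tail_ms <= deadline_ms, tail_ms


# Trivially fast stand-in for a model call, used only to exercise the harness:
ok, p99_ms = meets_slo(lambda: sum(range(100)), deadline_ms=50.0)
```

For SLO-based serving, tail latency (e.g. p99) rather than mean latency is the relevant metric, since a single deadline miss can invalidate the result in settings such as HFT.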