Academic Paper
2.4 ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications
Document Type
Conference
Author
Yu, Chang-Hyo; Kim, Hyo-Eun; Shin, Sungho; Bong, Kyeongryeol; Kim, Hyunsuk; Boo, Yoonho; Bae, Jaewan; Kwon, Minjae; Charfi, Karim; Kim, Jinseok; Kim, Hongyun; Shim, Myeongbo; Ha, Changsoo; Shin, Wongyu; Yoon, Jae-Sung; Chi, Miock; Lee, Byungjae; Choi, Sungpill; Kim, Donghan; Woo, Jeongseok; Yoon, Seokju; Jo, Hyunje; Kim, Hyunho; Heo, Hyungseok; Jin, Young-Jae; Yu, Jiun; Lee, Jaehwan; Kim, Hyunsung; Kang, Minhoo; Choi, Seokhyeon; Kim, Seung-Goo; Choi, Myunghoon; Oh, Jungju; Kim, Yunseong; Kim, Haejoon; Je, Sangeun; Ham, Junhee; Yoon, Juyeong; Lee, Jaedon; Park, Seonhyeok; Park, Youngseob; Lee, Jaebong; Hong, Boeui; Ryu, Jaehun; Ko, Hyunseok; Chung, Kwanghyun; Choi, Jongho; Jung, Sunwook; Arthanto, Yashael Faith; Kim, Jonghyeon; Cho, Heejin; Jeong, Hyebin; Choi, Sungmin; Han, Sujin; Park, Junkyu; Lee, Kwangbae; Bae, Sung-Il; Bang, Jaeho; Lee, Kyeong-Jae; Jang, Yeongsang; Park, Jungchul; Park, Sanggyu; Park, Jueon; Shin, Hyein; Park, Sunghyun; Oh, Jinwook
Source
2024 IEEE International Solid-State Circuits Conference (ISSCC), vol. 67, pp. 42-44, Feb. 2024
Subject
Language
English
ISSN
2376-8606
Abstract
The growing computational demands of AI inference have led to widespread use of hardware accelerators across platforms, spanning from the edge to the datacenter/cloud. Certain AI application areas, such as high-frequency trading (HFT) [1–2], impose a hard inference latency deadline for successful execution. We present our new AI accelerator, which achieves high inference capability with outstanding single-stream responsiveness for demanding service-level objective (SLO)-based AI services and pipelined inference applications, including large language models (LLMs). Owing to its low thermal design power (TDP), the scale-out solution can effectively support multi-stream applications as well as total cost of ownership (TCO)-centric systems.