Academic Paper

BOOST: Block Minifloat-Based On-Device CNN Training Accelerator with Transfer Learning
Document Type
Conference
Source
2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pp. 1-9, Oct. 2023
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
General Topics for Engineers
Signal Processing and Analysis
Training
Design automation
Transfer learning
Bandwidth
Throughput
System-on-chip
Task analysis
Language
English
ISSN
1558-2434
Abstract
Adapting CNNs to changing problems is challenging on resource-limited edge devices due to intensive computations, high precision requirements, large storage needs, and high bandwidth demands. This paper presents BOOST, a novel block minifloat (BM)-based parallel CNN training accelerator for transfer learning (TL) on memory- and computation-constrained FPGAs. By updating only a small number of layers online, BOOST enables adaptation to changing problems. Our approach utilizes a unified 8-bit BM datatype (bm(2,5)), i.e., with a sign bit, 2 exponent bits, and 5 mantissa bits, and proposes unified Conv and dilated Conv blocks that support non-unit stride and enable task-level parallelism during back-propagation to minimize latency. For ResNet20 and VGG-like training on the CIFAR-10 and SVHN datasets, BOOST achieves near 32-bit floating-point accuracy while reducing latency by 21%-43% and BRAM usage by 63%-66% compared to back-propagation training without TL. Notably, BOOST outperforms prior state-of-the-art works, achieving per-batch throughputs of 131 and 209 GOPs for ResNet20 and VGG-like, respectively.
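To make the bm(2,5) datatype concrete, the sketch below simulates block minifloat quantization in float32: each block of values shares one exponent bias derived from the block's largest magnitude, and every element is then rounded as a minifloat with a sign bit, 2 exponent bits, and 5 mantissa bits. This is an illustrative assumption-laden sketch, not the paper's hardware datapath; the function name, block size, and rounding/saturation details are all hypothetical, and the paper's actual BM format (bias handling, subnormals, hardware rounding) may differ.

```python
import numpy as np

def quantize_bm25(x, block_size=64):
    """Hypothetical simulation of bm(2,5) block-minifloat quantization.

    Each block of `block_size` values shares an exponent bias taken from
    the block's largest magnitude; elements are rounded as minifloats
    with 1 sign bit, 2 exponent bits, and 5 mantissa bits.
    """
    E_BITS, M_BITS = 2, 5
    e_max = 2 ** E_BITS - 1              # largest per-element exponent code (3)
    out = np.empty(x.shape, dtype=np.float32)
    flat = np.asarray(x, dtype=np.float32).ravel()
    res = out.reshape(-1)
    for start in range(0, flat.size, block_size):
        blk = flat[start:start + block_size]
        max_abs = float(np.abs(blk).max())
        if max_abs == 0.0:
            res[start:start + block_size] = 0.0
            continue
        # Shared bias aligns the block's largest value with the top exponent.
        bias = int(np.floor(np.log2(max_abs))) - e_max
        # Per-element exponent relative to the bias, clamped to 2 bits
        # (e == 0 acts as the subnormal range).
        mag = np.maximum(np.abs(blk), 2.0 ** bias)       # avoid log2(0)
        e = np.clip(np.floor(np.log2(mag)) - bias, 0, e_max)
        # Quantization step: 5 mantissa bits below each element's exponent.
        # Saturation of mantissa overflow is omitted for brevity.
        scale = 2.0 ** (e + bias - M_BITS)
        res[start:start + block_size] = np.round(blk / scale) * scale
    return out

# Example: small gradients quantized to bm(2,5) stay close to fp32 values.
g = np.random.randn(4, 64).astype(np.float32) * 1e-2
print(np.max(np.abs(g - quantize_bm25(g))))
```

The shared per-block bias is what lets an 8-bit element cover a wide dynamic range despite only 2 exponent bits, which is consistent with the abstract's claim of near-fp32 training accuracy at 8-bit storage cost.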