Academic Paper

Polyomino: A 3D-SRAM-Centric Accelerator for Randomly Pruned Matrix Multiplication With Simple Reordering Algorithm and Efficient Compression Format in 180-nm CMOS
Document Type
Periodical
Source
IEEE Transactions on Circuits and Systems I: Regular Papers (IEEE Trans. Circuits Syst. I), vol. 70, no. 9, pp. 3440-3450, Sep. 2023
Subject
Components, Circuits, Devices and Systems
Sparse matrices
Memory management
Hardware
Computer architecture
Random access memory
Neural networks
Transformers
3D integration
compressed sparse matrix format
deep neural networks (DNNs)
pruning
static random access memory (SRAM)
vision transformer
Language
ISSN
1549-8328
1558-0806
Abstract
We have developed a sparse matrix reordering algorithm together with a novel 3D-SRAM-centric accelerator, Polyomino, that efficiently processes the reordered matrices for parameter compression. Reordering randomly pruned, irregularly structured sparse matrices into regularly structured ones increases both the compression ratio of the data and the efficiency of the hardware processing. The reordering algorithm can be implemented simply by reducing it to the well-known k-sum problem. We also developed a compression format for storing the reordered matrices and show that the regular structure obtained by reordering reduces the required memory by 63% compared with the conventional method. The proposed Polyomino accelerator processes the reordered matrices efficiently by using 3D-stacked SRAM, an external memory that offers random access with low latency. Measurement results from a test chip fabricated in a 180-nm CMOS process demonstrate that the proposed accelerator achieves high area efficiency and high energy efficiency, and that it scales well with the pruning rate.
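The abstract does not describe how the reordering maps onto the k-sum problem, so the following is only an illustrative sketch under our own assumption: rows of a pruned 0/1 mask are grouped k at a time so that their nonzero counts add up exactly to a fixed block width, which would let each group pack into one dense, regular tile. The `k_sum` search (sorting, then recursion down to a two-pointer scan for k = 2) is the standard textbook approach to k-sum; the helper `group_rows` and the parameters `k` and `width` are hypothetical names and do not represent the paper's actual algorithm or compression format.

```python
from typing import List, Optional, Sequence, Tuple


def k_sum(values: Sequence[int], k: int, target: int) -> Optional[List[int]]:
    """Return indices of k distinct entries of `values` summing to `target`, or None.

    Standard approach: sort once, recurse on k, and use a two-pointer scan
    when k == 2. Assumes k >= 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    return _search(values, order, 0, k, target)


def _search(values, order, start, k, target):
    if k == 1:
        # Linear scan for a single matching value.
        for pos in range(start, len(order)):
            if values[order[pos]] == target:
                return [order[pos]]
        return None
    if k == 2:
        # Two-pointer scan over the sorted suffix.
        lo, hi = start, len(order) - 1
        while lo < hi:
            s = values[order[lo]] + values[order[hi]]
            if s == target:
                return [order[lo], order[hi]]
            if s < target:
                lo += 1
            else:
                hi -= 1
        return None
    # Fix the smallest remaining element and solve (k-1)-sum on the rest.
    for pos in range(start, len(order) - k + 1):
        rest = _search(values, order, pos + 1, k - 1, target - values[order[pos]])
        if rest is not None:
            return [order[pos]] + rest
    return None


def group_rows(mask: List[List[int]], k: int, width: int) -> Tuple[List[List[int]], List[int]]:
    """HYPOTHETICAL: greedily group k rows whose nonzero counts sum to `width`.

    Returns (groups of row indices, leftover row indices that could not be grouped).
    """
    counts = [sum(1 for v in row if v) for row in mask]
    remaining = list(range(len(mask)))
    groups: List[List[int]] = []
    while len(remaining) >= k:
        hit = k_sum([counts[r] for r in remaining], k, width)
        if hit is None:
            break
        chosen = [remaining[i] for i in hit]
        groups.append(sorted(chosen))
        remaining = [r for r in remaining if r not in chosen]
    return groups, remaining


if __name__ == "__main__":
    # Toy pruned mask; row nonzero counts are 3, 1, 2, 2, 3, 1.
    mask = [
        [1, 1, 1, 0],
        [0, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 1],
    ]
    groups, leftover = group_rows(mask, k=2, width=4)
    print(groups, leftover)
```

In this toy case every pair of grouped rows holds exactly four nonzeros, so each pair could fill one 4-wide tile with no padding; rows left over by the greedy search would need some fallback handling, which this sketch does not model.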