학술논문

Dedicated Hardware Accelerators for Processing of Sparse Matrices and Vectors: A Survey
Document Type
Academic Journal
Source
ACM Transactions on Architecture and Code Optimization.
Subject
High Performance Computing (HPC)
Accelerators
Sparse Matrices
Memory Hierarchy
Language
English
ISSN
1544-3566
1544-3973
Abstract
Performance in scientific and engineering applications such as computational physics, algebraic graph problems or Convolutional Neural Networks (CNN), is dominated by the manipulation of large sparse matrices – matrices with a large number of zero elements. Specialized software using data formats for sparse matrices has been optimized for the main kernels of interest: SpMV and SpMSpM matrix multiplications, but due to the indirect memory accesses, the performance is still limited by the memory hierarchy of conventional computers. Recent work shows that specific hardware accelerators can reduce memory traffic and improve the execution time of sparse matrix multiplication, compared to the best software implementations. The performance of these sparse hardware accelerators depends on the choice of the sparse format, COO, CSR, etc, the algorithm, inner-product, outer-product, Gustavson, and many hardware design choices. In this article, we propose a systematic survey which identifies the design choices of state-of-the-art accelerators for sparse matrix multiplication kernels. We introduce the necessary concepts and then present, compare and classify the main sparse accelerators in the literature, using consistent notations. Finally, we propose a taxonomy for these accelerators to help future designers make the best choices depending of their objectives.