Academic Paper
Row-Segmented Sparse-Dense Matrix Matrix Multiplication on GPUs
Document Type
Conference
Author
Source
2022 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta), pp. 376-383, Dec. 2022
Subject
Language
Abstract
SpMM (Sparse-Dense Matrix Matrix Multiplication) is a key computational kernel in many machine learning, big-data analysis, and neural network applications. Although SpMM has received increasing attention in recent years, effectively improving its performance remains a challenge; in particular, existing work has largely given little consideration to optimizing memory accesses. To alleviate these problems, we propose a matrix partition method and customize an SpMM algorithm to match the partition, which makes full use of the GPU architecture and readily improves parallelism, achieving the best performance by tuning parameters. Furthermore, we present two memory-access-optimized algorithms, named rsSpMM and rrSpMM, which not only combine the advantages of the above algorithm but also heavily exploit the different memory spaces of the GPU. Experimental results on an NVIDIA Tesla T4 demonstrate that the proposed algorithms significantly outperform the state-of-the-art cuSPARSE library, delivering up to 4.6x, 8.9x, and 10.1x improvements, respectively.
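For readers unfamiliar with the operation the abstract optimizes, the following is a minimal reference sketch of SpMM over the common CSR (Compressed Sparse Row) format. It is not the paper's rsSpMM/rrSpMM method; it only illustrates the row-wise structure (each sparse row produces one dense output row independently) that row-segmented GPU kernels parallelize. The function name `spmm_csr` and the toy matrices are illustrative assumptions.

```python
import numpy as np

def spmm_csr(indptr, indices, data, B):
    """Reference SpMM: multiply a CSR sparse matrix A (m x k) by a dense B (k x n).

    indptr[row] .. indptr[row+1] delimits row's nonzeros in `indices`/`data`.
    Each output row depends only on one sparse row, so rows are the natural
    independent units of work that a GPU kernel would assign to thread groups.
    """
    m = len(indptr) - 1
    C = np.zeros((m, B.shape[1]), dtype=B.dtype)
    for row in range(m):
        for idx in range(indptr[row], indptr[row + 1]):
            # Accumulate data[idx] * (the row of B selected by the column index).
            C[row] += data[idx] * B[indices[idx]]
    return C

# Toy example: A = [[1, 0, 2], [0, 3, 0]] in CSR form.
indptr  = np.array([0, 2, 3])
indices = np.array([0, 2, 1])
data    = np.array([1.0, 2.0, 3.0])
B = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
C = spmm_csr(indptr, indices, data, B)  # [[7, 7], [6, 6]]
```

In a CUDA implementation, the inner accumulation is the part whose memory-access pattern (coalesced reads of `B`, reuse via shared memory or registers) dominates performance, which is the aspect the abstract says the proposed algorithms optimize.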