Academic Paper
Row-Segmented Sparse-Dense Matrix Matrix Multiplication on GPUs
Document Type
Conference
Author
Source
2022 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta), pp. 376-383, Dec. 2022
Subject
Language
Abstract
SpMM (Sparse-Dense Matrix Matrix Multiplication) is a key computational kernel in many machine learning, big-data analysis, and neural network applications. Although SpMM has received increasing attention in recent years, effectively improving its performance remains a challenge; in particular, existing work has largely given little consideration to optimizing memory accesses. To alleviate these problems, we propose a matrix partition method and customize an SpMM algorithm to match the partition, which makes full use of the GPU architecture and readily improves parallelism, achieving the best performance by tuning parameters. Furthermore, we present two memory-access-optimized algorithms, named rsSpMM and rrSpMM, which not only combine the advantages of the above algorithm but also heavily exploit the different memory spaces of the GPU. Experimental results on an NVIDIA Tesla T4 demonstrate that the proposed algorithms significantly outperform the state-of-the-art cuSPARSE library, delivering up to 4.6x, 8.9x, and 10.1x improvements, respectively.
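For readers unfamiliar with the operation the abstract optimizes, the following is a minimal reference sketch of SpMM over the common CSR (Compressed Sparse Row) format. It is not the paper's rsSpMM/rrSpMM method; it only illustrates the row-wise structure (each sparse row produces one dense output row independently) that row-segmented GPU kernels parallelize. The function name `spmm_csr` and the toy matrices are illustrative assumptions.

```python
import numpy as np

def spmm_csr(indptr, indices, data, B):
    """Reference SpMM: multiply a CSR sparse matrix A (m x k) by a dense B (k x n).

    indptr[row] .. indptr[row+1] delimits row's nonzeros in `indices`/`data`.
    Each output row depends only on one sparse row, so rows are the natural
    independent units of work that a GPU kernel would assign to thread groups.
    """
    m = len(indptr) - 1
    C = np.zeros((m, B.shape[1]), dtype=B.dtype)
    for row in range(m):
        for idx in range(indptr[row], indptr[row + 1]):
            # Accumulate data[idx] * (the row of B selected by the column index).
            C[row] += data[idx] * B[indices[idx]]
    return C

# Toy example: A = [[1, 0, 2], [0, 3, 0]] in CSR form.
indptr  = np.array([0, 2, 3])
indices = np.array([0, 2, 1])
data    = np.array([1.0, 2.0, 3.0])
B = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
C = spmm_csr(indptr, indices, data, B)  # [[7, 7], [6, 6]]
```

In a CUDA implementation, the inner accumulation is the part whose memory-access pattern (coalesced reads of `B`, reuse via shared memory or registers) dominates performance, which is the aspect the abstract says the proposed algorithms optimize.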