Academic Paper

System Optimization of Data Analytics Platforms using Compute Express Link (CXL) Memory
Document Type
Conference
Source
2023 IEEE International Conference on Big Data and Smart Computing (BigComp). :9-12 Feb, 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Performance evaluation
Data analysis
Memory management
Prototypes
Physical layer
Software
Sparks
CXL
Memory Expansion
Memory Solution
Apache Spark
Shuffle
In-memory DBMS
Data Analytics Platform
Language
ISSN
2375-9356
Abstract
Compute Express Link (CXL) is attracting attention from the industry as an emerging technology. CXL is a new coherent interface that offers high-bandwidth, low-latency connectivity between a host processor and devices such as memory, smart NICs, and accelerators. Unlike the existing DDR interface, the CXL interconnect provides a distinct memory interface based on the PCIe 5.0 physical layer, and it is expected to allow system memory or a new memory tier to be expanded more flexibly than before. However, most studies of CXL memory focus on device performance validation and pay little attention to real-world use cases. This paper introduces two practical use cases on data analytics platforms using a CXL memory prototype and its supporting software: 1. Apache Spark: CXL memory takes over the role of storage as the space for Spark's shuffle intermediate data. As a result, performance improved over baseline vanilla Spark by 2.2x for the TeraSort workload and by an average of 4.9x for shuffle-heavy TPC-DS queries. 2. In-memory DBMS (Database Management System): host memory expansion using CXL memory improved query performance by an average of 6.0x for the NYC Taxi benchmark, and is thus expected to reduce total cost of ownership (TCO) by using fewer servers than scale-out.