Academic Paper

Big Data Oriented Graph Division and Storage
Document Type
Conference
Source
2023 IEEE 8th International Conference on Big Data Analytics (ICBDA), pp. 47-52, Mar. 2023
Subject
Computing and Processing
Memory management
Directed graphs
Big Data
Programming
Explosions
Libraries
Machinery
graph division
big data computing
big data storage
Language
Abstract
Parameters of machinery parts depend on one another. Given a part, our task is to compute all combinations of its parameters that satisfy the dependency relationships between them. A directed graph can represent these dependency relationships. However, simply using breadth-first extension to enumerate the combinations leads to combinatorial explosion, even under the constraints imposed by the dependency relationships. The consequences are excessive computing time, excessive memory requirements, and slow subsequent data querying. To solve this problem, we propose a graph division method that reduces the computation scale. Based on specific characteristics of our task, the method divides the directed graph into regions and further divides the complex regions into sub-graphs. Through this divide-and-conquer strategy, processing the entire graph is decomposed into processing simple sub-graphs. Even with graph division applied, the tables that store the results of the extension can still contain as many as one billion rows. Conventional DBMSs perform poorly at such a large data scale. To address this problem, we use Metall, a persistent memory programming tool, to process the big data. Using the tool yields high I/O throughput. Our experiments show that, of the 3,581 machinery parts processed, 97% can each be processed within 6 hours. Moreover, graph division improves the efficiency of subsequent data querying by 77%.
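The divide-and-conquer idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state how regions are chosen, so the sketch assumes a region is a weakly connected component of the dependency graph. The parameter names, value domains, allowed value pairs, and the helper functions `weakly_connected_regions` and `region_combinations` are all hypothetical. The sketch stores one small table per region, so the number of stored rows grows with the sum of the per-region counts rather than their product.

```python
from collections import defaultdict
from itertools import product
from math import prod


def weakly_connected_regions(nodes, edges):
    """Group parameters into regions that share no dependency edge.

    Assumption: a "region" is a weakly connected component.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, regions = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, region = [start], []
        while stack:
            node = stack.pop()
            region.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        regions.append(sorted(region))
    return regions


def region_combinations(region, domains, allowed):
    """Enumerate the assignments of one region that satisfy its edges."""
    rows = []
    for values in product(*(domains[p] for p in region)):
        row = dict(zip(region, values))
        # An edge (u, v) lies entirely inside one region, so checking
        # that u is present is enough to know v is present too.
        if all((row[u], row[v]) in pairs
               for (u, v), pairs in allowed.items() if u in row):
            rows.append(row)
    return rows


# Hypothetical part with four parameters and two dependency edges.
nodes = ["A", "B", "C", "D"]
domains = {"A": [1, 2], "B": [10, 20], "C": ["x", "y"], "D": ["p", "q"]}
allowed = {
    ("A", "B"): {(1, 10), (2, 20)},
    ("C", "D"): {("x", "p"), ("y", "p"), ("y", "q")},
}

regions = weakly_connected_regions(nodes, list(allowed))
tables = [region_combinations(r, domains, allowed) for r in regions]

stored_rows = sum(len(t) for t in tables)  # rows kept after division
full_rows = prod(len(t) for t in tables)   # rows of the undivided expansion
print(regions, stored_rows, full_rows)
```

In this toy case the two regions hold 2 and 3 rows, so 5 rows are stored instead of the 6 that a full expansion would produce; the gap widens quickly as the number of regions and the size of each region grow. A query for a complete combination would join one row from each region table on demand.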