Academic Paper

Python Implementation of the Dynamic Distributed Dimensional Data Model
Document Type
Conference
Source
2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-8, Sep. 2022
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Structured Query Language
Data analysis
Scientific computing
Databases
Machine learning
Big Data
Data models
Python
matrix
array
sparse linear algebra
data science
Language
ISSN
2643-1971
Abstract
Python has become a standard scientific computing language, with fast-growing support for machine learning and data analysis modules and increasing use of big data. The Dynamic Distributed Dimensional Data Model (D4M) offers a highly composable, unified data model with strong performance, built to handle big data quickly and efficiently. In this work we present an implementation of D4M in Python. D4M.py implements all foundational functionality of D4M and includes Accumulo and SQL database support via Graphulo. We describe the mathematical background and motivation, explain the approaches taken for its fundamental functions and building blocks, and present performance results that compare D4M.py with D4M-MATLAB and D4M.jl.