Academic Paper

UCAPF: A Unified Processing Platform for Large-scale Virtual Screening
Document Type
Conference
Source
2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 3292-3298, Dec 2023
Subject
Bioengineering
Computing and Processing
Engineering Profession
Robotics and Control Systems
Signal Processing and Analysis
Data integrity
Biological system modeling
Distributed databases
Parallel processing
Data processing
Data models
Parallel architectures
large-scale virtual screening
Parallel processing framework
heterogeneous system
Molecular data processing
Language
ISSN
2156-1133
Abstract
As data volumes for large-scale virtual screening grow, the associated data processing and management become increasingly challenging. We have developed UCAPF, a unified platform for large-scale virtual screening. The platform provides a parallel processing framework for large-scale virtual screening data. It also supports scheduling across heterogeneous parallel architectures and hierarchical storage of massive data. The processing framework improves data quality: on the CASF-2016 dataset, molecules standardized by UCAPF showed a 9.5% to 14.62% improvement in scoring performance and a 7.9% to 34.6% improvement in ranking performance compared to the raw molecules. For massive data processing, the framework achieves a parallel efficiency of 81.20% for molecule standardization and 79.51% for docking result processing on a 72-unit Hadoop cluster. In addition, the distributed database used for data management improves the retrieval of 10,094 molecules from seventy million docking results by a factor of 2.40 compared to the single-node storage model. Finally, we analyze how input/output (I/O) varies over time across the different phases of virtual screening to assess the effectiveness of the scheduling strategy and tiered storage for the heterogeneous parallel architecture.
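
To put the reported parallel efficiencies in perspective, the following minimal sketch converts them into implied speedups. It assumes the conventional definition of parallel efficiency, E = S / p, where S is the speedup over a single unit and p is the number of units; the abstract does not state which definition the authors used, and the function name `implied_speedup` is hypothetical.

```python
def implied_speedup(efficiency: float, units: int) -> float:
    """Return the speedup S implied by a parallel efficiency E on p units.

    Assumes the conventional definition E = S / p, so S = E * p.
    This definition is an assumption; the abstract does not spell it out.
    """
    return efficiency * units


# Figures taken from the abstract: a 72-unit Hadoop cluster.
UNITS = 72
standardization_speedup = implied_speedup(0.8120, UNITS)  # molecule standardization
docking_result_speedup = implied_speedup(0.7951, UNITS)   # docking result processing

print(f"Molecule standardization: about {standardization_speedup:.1f}x")
print(f"Docking result processing: about {docking_result_speedup:.1f}x")
```

Under that assumption, both workloads would run roughly 57 to 58 times faster on 72 units than on a single unit.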