Academic Journal Article

QuanDB: a quantum chemical property database towards enhancing 3D molecular representation learning.
Document Type
Article
Source
Journal of Cheminformatics. 4/29/2024, Vol. 16 Issue 1, p1-12. 12p.
Subject
*DATABASES
*CHEMICAL properties
*SCIENTIFIC literature
*SCIENCE databases
*ELECTRONIC structure
*MACHINE learning
*ARTIFICIAL membranes
Language
ISSN
1758-2946
Abstract
Previous studies have shown that the three-dimensional (3D) geometric and electronic structures of molecules play a crucial role in determining their key properties and intermolecular interactions. It is therefore necessary to establish a quantum chemical (QC) property database containing the most stable 3D geometric conformations and electronic structures of molecules. In this study, a high-quality QC property database, called QuanDB, was developed; it includes structurally diverse molecular entities and features a user-friendly interface. QuanDB currently contains 154,610 compounds sourced from public databases and the scientific literature, covering 10,125 scaffolds. The molecules are composed of nine elements: H, C, O, N, P, S, F, Cl, and Br. For each molecule, QuanDB provides 53 global and 5 local QC properties together with the most stable 3D conformation. These properties fall into three categories: geometric structure, electronic structure, and thermodynamics. To ensure highly accurate QC properties, geometry optimizations were performed at the B3LYP-D3(BJ)/6-311G(d)/SMD/water level of theory and single-point energies were calculated at the B3LYP-D3(BJ)/def2-TZVP/SMD/water level; the computational cost exceeded 10⁷ core-hours. QuanDB provides high-value geometric and electronic structure information for molecular representation models, which are critical for machine-learning-based molecular design, and thereby contributes to a comprehensive description of chemical compound space. As a new high-quality QC property dataset, QuanDB is expected to become a benchmark for training and optimizing machine learning models, further advancing the development of novel drugs and materials. QuanDB is freely available, without registration, at https://quandb.cmdrg.com/.
Scientific contribution: The QuanDB database contains comprehensive quantum chemical properties of diverse organic molecular entities, and all data have been rigorously pretreated and manually cleaned to ensure high accuracy. By utilizing the quantum chemical properties provided by QuanDB, relevant three-dimensional (3D) electronic structural information can be included in comprehensive molecular representation models to facilitate drug and material design. Compared to other similar databases, QuanDB covers a broader space of chemical compounds, adopts a higher level of theoretical calculations, and offers a user-friendly interface. [ABSTRACT FROM AUTHOR]