Academic Paper

Integrating Transformer-based Language Model for Drug Discovery
Document Type
Conference
Source
2024 11th International Conference on Computing for Sustainable Global Development (INDIACom), pp. 1096-1101, Feb. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
General Topics for Engineers
Geoscience
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Drugs
Training
Pandemics
Computational modeling
Predictive models
Transformers
Encoding
Drug Discovery
Generative AI
Language Models
Transformer
Deep Learning
MoleculeNet
Language
Abstract
The COVID-19 pandemic has made it crucial to accelerate the development of potent drugs against emerging diseases for the benefit of global society. Traditional drug discovery methods require extensive time, skilled personnel, and substantial financial resources to test molecules across the chemical space. Over the past decade, computational methods within Artificial Intelligence (AI) have raised the traditional drug discovery framework to new levels. The most recent developments in AI are dominated by Language Models (LMs), which are distinguished by their proficiency in generating novel candidates from knowledge assimilated during training. LMs can be employed effectively in many tasks, such as drug-target interaction prediction, molecular similarity prediction, compound property prediction, and generating new molecules from a known molecule for a given target in minimal time, which can greatly reduce overall project time and cost. This work presents an approach for the precise prediction of a molecular property, the ability to inhibit HIV replication, by fine-tuning the BERT (Bidirectional Encoder Representations from Transformers) language model on a standardized dataset of SMILES (Simplified Molecular Input Line Entry System) strings. Using the MoleculeNet-HIV dataset, the transformer-based architecture is fine-tuned for the experiments, demonstrating superior prediction accuracy and generalization capabilities. The proposed approach shows favorable outcomes and has the capacity to decrease both the expense and the duration of the drug discovery process.
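
Illustrative sketch (Python). The abstract describes fine-tuning a BERT encoder on SMILES strings for binary HIV-inhibition classification on MoleculeNet-HIV; the code below is a minimal sketch of that setup, not the authors' implementation. It assumes a local CSV export of the MoleculeNet HIV dataset with "smiles" and "HIV_active" columns (the file name HIV.csv is hypothetical) and uses a generic "bert-base-uncased" checkpoint, whereas the paper's exact model, tokenizer, and hyperparameters may differ.

import torch
import pandas as pd
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL = "bert-base-uncased"  # assumption; a chemistry-pretrained checkpoint may fit better

class SmilesDataset(Dataset):
    """Tokenized SMILES strings with binary activity labels."""
    def __init__(self, smiles, labels, tokenizer):
        # Encode every SMILES string up front to a fixed length.
        self.enc = tokenizer(smiles, truncation=True,
                             padding="max_length", max_length=128)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# Hypothetical path to the MoleculeNet HIV export (smiles, HIV_active columns).
df = pd.read_csv("HIV.csv")
tok = AutoTokenizer.from_pretrained(MODEL)
train_ds = SmilesDataset(df["smiles"].tolist(),
                         df["HIV_active"].astype(int).tolist(), tok)

# Binary sequence classification head on top of the BERT encoder.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds).train()

In practice a proper scaffold split of MoleculeNet-HIV and a held-out evaluation set would be used to measure the generalization the abstract refers to; the sketch omits these for brevity.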