학술논문

Extracting rate changes in transcriptional regulation from MEDLINE abstracts.
Document Type
Article
Source
BMC Bioinformatics. 2014 Supplement 2, Vol. 15, p1-12. 12p. 2 Diagrams, 6 Charts, 1 Graph.
Subject
*MEDLINE
*GENETIC transcription
*GENES
*LIBRARY information networks
*GENOMES
Language
ISSN
1471-2105
Abstract
Background: Time delays are important factors that are often neglected in gene regulatory network (GRN) inference models. Validating time delays from knowledge bases is a challenge since the vast majority of biological databases do not record temporal information of gene regulations. Biological knowledge and facts on gene regulations are typically extracted from bio-literature with specialized methods that depend on the regulation task. In this paper, we mine evidences for time delays related to the transcriptional regulation of yeast from the PubMed abstracts. Results: Since the vast majority of abstracts lack quantitative time information, we can only collect qualitative evidences of time delays. Specifically, the speed-up or delay in transcriptional regulation rate can provide evidences for time delays (shorter or longer) in GRN. Thus, we focus on deriving events related to rate changes in transcriptional regulation. A corpus of yeast regulation related abstracts was manually labeled with such events. In order to capture these events automatically, we create an ontology of sub-processes that are likely to result in transcription rate changes by combining textual patterns and biological knowledge. We also propose effective feature extraction methods based on the created ontology to identify the direct evidences with specific details of these events. Our ontologies outperform existing state-of-the-art gene regulation ontologies in the automatic rule learning method applied to our corpus. The proposed deterministic ontology rule-based method can achieve comparable performance to the automatic rule learning method based on decision trees. This demonstrates the effectiveness of our ontology in identifying rate-changing events. We also tested the effectiveness of the proposed feature mining methods on detecting direct evidence of events. Experimental results show that the machine learning method on these features achieves an F1-score of 71.43%. Conclusions: The manually labeled corpus of events relating to rate changes in transcriptional regulation for yeast is available in https://sites.google.com/site/wentingntu/data. The created ontologies summarized both biological causes of rate changes in transcriptional regulation and corresponding positive and negative textual patterns from the corpus. They are demonstrated to be effective in identifying rate-changing events, which shows the benefits of combining textual patterns and biological knowledge on extracting complex biological events. [ABSTRACT FROM AUTHOR]