Academic Paper

Comparison of Two Methods for Finding Biomedical Categories in Medline
Document Type
Conference
Source
2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA). 2:96-99 Dec, 2011
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Semantics
Statistical analysis
Unified modeling language
Vectors
Feature extraction
Ontologies
Diseases
Language
Abstract
In this paper we describe and compare two methods for automatically learning meaningful biomedical categories in Medline®. The first approach is a simple statistical method that uses part-of-speech and frequency information to extract a list of frequent headwords from noun phrases in Medline. The second method implements an alignment-based technique to learn frequent generic patterns that indicate a hyponymy/hypernymy relationship between a pair of noun phrases. We then apply these patterns to Medline to collect frequent hypernyms, which are potential biomedical categories. We study and compare these two alternative sets of terms to identify semantic categories in Medline. Our method is completely data-driven.
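
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy tagged noun phrases and sentences, the Penn Treebank-style tags, the rule that the headword is the rightmost noun, the `min_count` threshold, and the single hand-written "such as" pattern, which stands in for the patterns the paper learns by alignment. No lemmatization is applied, so "protein" and "proteins" are counted separately.

```python
import re
from collections import Counter

# Hypothetical input: noun phrases already POS-tagged as (token, tag) pairs.
TAGGED_NPS = [
    [("chronic", "JJ"), ("kidney", "NN"), ("disease", "NN")],
    [("heart", "NN"), ("disease", "NN")],
    [("tumor", "NN"), ("suppressor", "NN"), ("protein", "NN")],
    [("membrane", "NN"), ("proteins", "NNS")],
]

# Hypothetical sentences used to demonstrate pattern-based hypernym collection.
SENTENCES = [
    "Patients received drugs such as aspirin.",
    "Antiplatelet drugs, such as clopidogrel, were used.",
    "Enzymes such as trypsin cleave proteins.",
]


def headword(tagged_np):
    """Return the rightmost noun of a noun phrase (lowercased), or None."""
    for token, tag in reversed(tagged_np):
        if tag.startswith("NN"):
            return token.lower()
    return None


def frequent_headwords(tagged_nps, min_count=2):
    """Count headwords and keep those seen at least min_count times."""
    counts = Counter(h for h in map(headword, tagged_nps) if h)
    return [(w, c) for w, c in counts.most_common() if c >= min_count]


# Hand-written stand-in for a learned pattern: "<hypernym> such as <hyponym>".
SUCH_AS = re.compile(r"(\w+),? such as (\w+)")


def hypernym_counts(sentences):
    """Count how often each word fills the hypernym slot of the pattern."""
    counts = Counter()
    for sentence in sentences:
        for hyper, _hypo in SUCH_AS.findall(sentence.lower()):
            counts[hyper] += 1
    return counts


if __name__ == "__main__":
    print(frequent_headwords(TAGGED_NPS))
    print(hypernym_counts(SENTENCES).most_common())
```

Comparing the two resulting term lists (frequent headwords versus frequent hypernyms) corresponds, in a much reduced form, to the comparison the abstract describes.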