Academic Paper

Integration of heterogeneous time series gene expression data by clustering on time dimension
Document Type
Conference
Source
2017 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 332-335, Feb. 2017
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Time series analysis
Gene expression
Stress
Algorithm design and analysis
Clustering algorithms
Bars
Temperature distribution
Language
ISSN
2375-9356
Abstract
Recently developed, highly parallelized sequencing technologies now allow even small research groups to conduct multi-time-point analyses at affordable time and cost, so the number of available time-series gene expression data sets is rapidly increasing. However, when time-series data generated by different research groups are considered together, meta-properties such as the sampled time points and the age of the samples are heterogeneous across data sets. We therefore propose a novel three-step analysis algorithm to integrate heterogeneous time-series gene expression data sets. The key ideas of the algorithm are to convert incomparable heterogeneous multi-time-point data into comparable differentially expressed gene (DEG) vectors using time-point clustering, and to determine the consensus DEG vector that minimizes the sum of cosine distances to the input DEG vectors. When tested on 12 heterogeneous time-series gene expression data sets from low-temperature stress treatments, the integration algorithm was able to detect low-temperature-responsive genes.
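
The consensus step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions: each DEG vector is encoded per gene as +1 (up), -1 (down), or 0 (unchanged), the consensus is allowed to be any real-valued vector, and the names `consensus_direction` and `total_cosine_distance` are hypothetical. Under those assumptions, the vector that minimizes the sum of cosine distances is the sum of the unit-normalized input vectors, because minimizing the summed distances is the same as maximizing the summed cosine similarities.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def total_cosine_distance(candidate, deg_vectors):
    """Sum of cosine distances from one candidate to every input DEG vector."""
    return sum(cosine_distance(candidate, v) for v in deg_vectors)


def consensus_direction(deg_vectors):
    """Real-valued unit vector minimizing the summed cosine distance.

    Assumption: the consensus is unconstrained (not forced to be +1/0/-1).
    """
    vectors = np.asarray(deg_vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("every DEG vector needs at least one non-zero entry")
    summed = (vectors / norms).sum(axis=0)  # sum of unit-length inputs
    length = np.linalg.norm(summed)
    if length == 0:
        raise ValueError("inputs cancel out; no unique consensus direction")
    return summed / length


# Toy input: 3 data sets x 4 genes (hypothetical values, not from the paper).
toy_deg_vectors = [
    [1, 1, 0, -1],
    [1, 0, 0, -1],
    [1, 1, 1, 0],
]
consensus = consensus_direction(toy_deg_vectors)
print(np.sign(consensus))  # per-gene direction of the consensus
print(total_cosine_distance(consensus, toy_deg_vectors))
```

A ternary consensus could be derived from this direction by thresholding, but how the paper discretizes or constrains the consensus vector is not stated in the abstract, so that step is left out here.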