Academic Paper
Integration of heterogeneous time series gene expression data by clustering on time dimension
Document Type
Conference
Author
Source
2017 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 332-335, Feb. 2017
Subject
Language
ISSN
2375-9356
Abstract
Recently developed, highly parallelized sequencing technologies now allow even small research groups to conduct multi-time-point analyses at affordable time and cost, so the number of available time-series gene expression data sets is rapidly increasing. However, when time-series data generated by different research groups are considered together, meta-properties such as the sampled time points and the age of the samples are heterogeneous across the data sets. We therefore propose a novel three-step analysis algorithm to integrate heterogeneous time-series gene expression data sets. The key ideas of the algorithm are to convert incomparable heterogeneous multi-time-point data into comparable differentially expressed gene (DEG) vectors using time-point clustering, and to determine the consensus DEG vector that minimizes the sum of cosine distances to the input DEG vectors. Tested on 12 heterogeneous time-series gene expression data sets from low-temperature stress treatments, the integration algorithm was able to detect low-temperature-responsive genes.
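The consensus step in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each data set has already been reduced to one numeric DEG vector over the same ordered gene list (here +1 for up-regulated, -1 for down-regulated, 0 for unchanged, which is an illustrative encoding), and that the consensus may be any real-valued vector. Under those assumptions, the vector minimizing the sum of cosine distances is the normalized sum of the unit-length input vectors. The function names `consensus_vector` and `total_cosine_distance` are hypothetical, and the paper may restrict the consensus to discrete values.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def total_cosine_distance(candidate, deg_vectors):
    """Sum of cosine distances from one candidate to all input DEG vectors."""
    return sum(cosine_distance(candidate, v) for v in deg_vectors)


def consensus_vector(deg_vectors):
    """Real-valued consensus minimizing the sum of cosine distances.

    Assumption: the consensus is unconstrained, so minimizing
    sum(1 - cos(c, v_i)) equals maximizing c . sum(v_i / |v_i|),
    whose solution is the normalized sum of the unit input vectors.
    """
    vectors = np.asarray(deg_vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("each DEG vector needs at least one non-zero entry")
    summed = (vectors / norms).sum(axis=0)
    length = np.linalg.norm(summed)
    if length == 0:
        raise ValueError("input vectors cancel out; consensus is undefined")
    return summed / length


# Toy input: three data sets, four genes (hypothetical values).
toy_vectors = [
    [1, 0, -1, 1],
    [1, 1, -1, 0],
    [1, 0, 0, 1],
]
consensus = consensus_vector(toy_vectors)
print(consensus)
print(total_cosine_distance(consensus, toy_vectors))
```

Genes with large absolute values in the consensus are those whose direction of change agrees across most data sets. Turning those values back into discrete DEG calls would need a threshold, which is left out here because the abstract does not specify one.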