Academic Article

A comparative study of statistical methods used to identify dependencies between gene expression signals.
Document Type
Article
Source
Briefings in Bioinformatics. Nov2014, Vol. 15 Issue 6, p906-918. 13p.
Subject
*GENE regulatory networks
*GENE expression
*ENTROPY power inequality
*MOLECULAR structure
*STATISTICS
Language
ISSN
1467-5463
Abstract
One major task in molecular biology is to understand the dependency among genes to model gene regulatory networks. Pearson’s correlation is the most common method used to measure dependence between gene expression signals, but it works well only when data are linearly associated. For other types of association, such as non-linear or non-functional relationships, methods based on the concepts of rank correlation and information theory-based measures are more adequate than the Pearson’s correlation, but are less used in applications, most probably because of a lack of clear guidelines for their use. This work seeks to summarize the main methods (Pearson’s, Spearman’s and Kendall’s correlations; distance correlation; Hoeffding’s D measure; Heller–Heller–Gorfine measure; mutual information and maximal information coefficient) used to identify dependency between random variables, especially gene expression data, and also to evaluate the strengths and limitations of each method. Systematic Monte Carlo simulation analyses ranging from sample size, local dependence and linear/non-linear and also non-functional relationships are shown. Moreover, comparisons in actual gene expression data are carried out. Finally, we provide a suggestive list of methods that can be used for each type of data set. [ABSTRACT FROM PUBLISHER]
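Illustrative note (not part of the publisher's abstract): the contrast the abstract draws between Pearson's correlation and measures that detect non-linear dependence can be sketched as follows. This is a minimal, standard-library-only sketch under stated assumptions. The toy data (`x`, `y_lin`, `y_quad`) and all function names are hypothetical choices made for illustration and are not taken from the article. The distance correlation here is the plain sample estimator of Székely et al., written for clarity rather than speed.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # row means equal column means (symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation (O(n^2) time and memory)."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    idx = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in idx) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in idx) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in idx) / n ** 2
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))

# Hypothetical toy data: a symmetric grid on [-1, 1].
x = [i / 10 for i in range(-10, 11)]
y_lin = [2 * v + 1 for v in x]   # linear association
y_quad = [v ** 2 for v in x]     # non-linear, non-monotonic association

for name, y in (("linear", y_lin), ("quadratic", y_quad)):
    print(name,
          "pearson=%.3f" % pearson(x, y),
          "spearman=%.3f" % spearman(x, y),
          "dcor=%.3f" % distance_correlation(x, y))
```

On the quadratic toy data, Pearson's and Spearman's coefficients are essentially zero because the relationship is symmetric and non-monotonic, while the distance correlation is clearly positive. This matches the abstract's point only qualitatively and says nothing about the article's own simulation results.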