Academic paper

The impact of sample size and tissue type on the reproducibility of gene co-expression networks
Document Type
Conference
Source
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. :1-10
Subject
co-expression networks
gene expression
reproducibility
transcriptomics
Language
English
Abstract
Identifying relationships between genes facilitates the comparison of different cell types at the transcriptomic level. Gene expression data such as RNA-seq can be used to construct co-expression networks, one approach used in systems biology to describe coordinated expression patterns among genes across samples. Currently, there is no consensus on the number of samples required to construct a reproducible gene co-expression network. Indeed, irreproducibility of gene expression experiments is a major challenge, and small sample sizes tend to be one of its major causes. However, recommending a single sample size that applies to all scenarios may not be practical. We therefore use a systematic, quantitative approach to study the effect of sample size on the reproducibility of large, fully connected gene co-expression networks constructed using several correlation-based measures or mutual information. This approach neither requires synthetic datasets built on oversimplified assumptions nor depends on known functional annotations. Further, we describe two similarity measures for quantifying consistency and use them to determine whether the biological variance present within samples affects the rate at which the networks stabilize and how they compare to networks with randomly reassigned nodes. Our results show that the number of samples required to construct consistent co-expression networks could be influenced by the tissue type used to construct the networks as well as by the similarity measure used to assess consistency.
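
The kind of analysis the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the two similarity measures, so the score below (Pearson correlation between the edge weights of two networks) is an assumption, as are all function names, the use of disjoint subsamples, and the default parameters. The sketch builds fully connected Pearson-correlation networks from pairs of subsamples of a given size and reports how consistent they are on average.

```python
import numpy as np


def coexpression_network(expr):
    """Gene-by-gene Pearson correlation matrix.

    expr: array of shape (n_samples, n_genes). Genes with zero variance
    in the given samples would produce NaN entries.
    """
    return np.corrcoef(expr, rowvar=False)


def edge_weights(net):
    """Flatten the upper triangle (diagonal excluded) into an edge vector."""
    return net[np.triu_indices_from(net, k=1)]


def network_similarity(net_a, net_b):
    """Assumed consistency score: correlation of matching edge weights."""
    return np.corrcoef(edge_weights(net_a), edge_weights(net_b))[0, 1]


def shuffle_nodes(net, rng):
    """Baseline network with gene labels randomly reassigned."""
    perm = rng.permutation(net.shape[0])
    return net[np.ix_(perm, perm)]


def consistency_by_sample_size(expr, sizes, n_repeats=20, seed=0):
    """Mean similarity between networks built from two disjoint subsamples."""
    rng = np.random.default_rng(seed)
    n_samples = expr.shape[0]
    results = {}
    for size in sizes:
        if 2 * size > n_samples:
            raise ValueError("need at least 2 * size samples")
        scores = []
        for _ in range(n_repeats):
            # Draw two non-overlapping sets of samples of the same size.
            idx = rng.permutation(n_samples)
            net_a = coexpression_network(expr[idx[:size]])
            net_b = coexpression_network(expr[idx[size:2 * size]])
            scores.append(network_similarity(net_a, net_b))
        results[size] = float(np.mean(scores))
    return results
```

Calling `consistency_by_sample_size(expr, sizes=[10, 20, 40])` on a samples-by-genes expression matrix for one tissue returns a mean consistency score per sample size. Comparing each network against its `shuffle_nodes` counterpart gives a rough baseline for the randomly reassigned case mentioned in the abstract.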