Academic Paper

Tumor Copy Number Data Deconvolution Integrating Bulk and Single-cell Sequencing Data
Document Type
Conference
Source
2018 IEEE 8th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), pp. 1-1, Oct. 2018
Subject
Bioengineering
Computing and Processing
Signal Processing and Analysis
Deconvolution
Tumors
Sequential analysis
Cancer
Evolution (biology)
Data models
cancer
sequencing
single-cell
phylogenetics
linear programming
Language
ISSN
2473-4659
Abstract
Resolving tumor heterogeneity is a crucial step in understanding cancer development and evolution, but it is hampered by the limits of all available data sources. Bulk sequencing has become the most common technology for assessing tumor heterogeneity, but it mixes many genetically distinct cells in each sample, which must then be computationally deconvolved. This genomic deconvolution generally has low resolution and high error rates in reconstructing clonal population structure. Recent technological developments in single-cell sequencing (SCS) offer the potential for high-resolution, whole-genome reconstructions of clonal structure. However, the limitations of SCS (high noise, difficulty in scaling to large populations, various challenging technical artifacts, and the large data sets it produces) have so far made it impractical to apply to study cohorts of sufficient size to identify statistically robust features of tumor evolution. To address these problems, we have developed strategies to combine limited amounts of bulk and single-cell data to gain some of the advantages of single-cell resolution at much lower cost. We focus specifically on the problem of deconvolving copy number data from bulk samples, assisted by information from small numbers of SCS sequences. We developed a mixed membership model for clonal deconvolution via non-negative matrix factorization (NMF) that balances deconvolution quality on the bulk data against similarity to the single-cell samples, together with an associated efficient coordinate descent algorithm. We improve on that algorithm by integrating deconvolution with clonal phylogeny inference, using an integer linear programming (ILP) model to add a minimum evolution phylogenetic cost to the problem objective so as to bias deconvolution toward inferred clones that are plausibly related to the observed SCS data.
We demonstrate the effectiveness of these methods on semi-simulated data with known ground truth, showing significantly enhanced deconvolution accuracy relative to bulk data alone.
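The NMF formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the objective or the update rules, so everything below is an assumption made for illustration. The sketch assumes a bulk matrix B (loci x samples) approximated by C F, where C (loci x clones) holds clone copy-number profiles and F (clones x samples) holds mixture fractions whose columns lie on the probability simplex. It also assumes one single-cell profile per clone, stored as the columns of S, and a hypothetical weight `alpha` that trades off bulk fit against similarity to S. The objective ||B - CF||^2 + alpha ||C - S||^2 is minimized by block coordinate descent, alternating one projected-gradient step on F with one on C. The phylogenetic ILP term is omitted entirely.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a 1-D vector onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def objective(B, C, F, S, alpha):
    """Bulk reconstruction error plus a penalty for drifting away from S."""
    return np.sum((B - C @ F) ** 2) + alpha * np.sum((C - S) ** 2)

def deconvolve(B, S, alpha=0.5, n_iter=200):
    """Alternate projected-gradient steps on F and C (block coordinate descent).

    B: loci x samples bulk copy numbers; S: loci x clones single-cell profiles.
    alpha is a hypothetical trade-off weight, not a value from the paper.
    """
    k, n = S.shape[1], B.shape[1]
    C = S.astype(float).copy()                # start from the single-cell profiles
    F = np.full((k, n), 1.0 / k)              # start from uniform mixtures
    history = [objective(B, C, F, S, alpha)]
    for _ in range(n_iter):
        # F-step: step size 1/L with L the spectral norm of C^T C,
        # then project each sample's fractions back onto the simplex.
        L_F = np.linalg.norm(C.T @ C, 2) + 1e-12
        grad_F = C.T @ (C @ F - B)
        F = np.apply_along_axis(project_to_simplex, 0, F - grad_F / L_F)
        # C-step: step size 1/L with L = ||F F^T||_2 + alpha,
        # then clip to keep copy numbers non-negative.
        L_C = np.linalg.norm(F @ F.T, 2) + alpha
        grad_C = (C @ F - B) @ F.T + alpha * (C - S)
        C = np.maximum(C - grad_C / L_C, 0.0)
        history.append(objective(B, C, F, S, alpha))
    return C, F, history

# Toy data with known ground truth (synthetic, for illustration only).
rng = np.random.default_rng(0)
C_true = rng.integers(1, 5, size=(30, 3)).astype(float)   # 30 loci, 3 clones
F_true = rng.dirichlet(np.ones(3), size=8).T              # 3 clones x 8 samples
B = C_true @ F_true + 0.05 * rng.standard_normal((30, 8))
S = np.maximum(C_true + 0.3 * rng.standard_normal((30, 3)), 0.0)

C_hat, F_hat, history = deconvolve(B, S)
```

Each step uses the reciprocal of the block's Lipschitz constant as its step size, so the objective recorded in `history` should not increase from one iteration to the next. With `alpha` set to 0, the single-cell data would influence only the initialization of C; larger values pull the inferred clones closer to the observed single-cell profiles.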