Academic article

Combinatorial pooling enables selective sequencing of the barley gene space.
Document Type
article
Source
PLoS Computational Biology. 9(4)
Subject
Chromosomes, Artificial, Bacterial
Hordeum
Oryza sativa
Genetic Markers
Physical Chromosome Mapping
Contig Mapping
Cloning, Molecular
Sequence Analysis, DNA
Computational Biology
Genomics
Species Specificity
Genomic Library
Genes, Plant
Models, Genetic
Computer Simulation
Oryza
Mathematical Sciences
Biological Sciences
Information and Computing Sciences
Bioinformatics
Language
Abstract
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and are now in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate and that the resulting BAC assemblies are of high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and that the BAC assemblies are of good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.
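
The idea of pooling and deconvolution described in the abstract can be illustrated with a toy sketch. This is not the pooling design or the deconvolution algorithm used in the article, which the abstract does not specify. It assumes a simplified scheme in which each clone is placed in a distinct combination of pools (its signature), and a read is assigned to the clone whose signature exactly matches the set of pools in which the read was observed. The function names `design_pools` and `deconvolve` are hypothetical, and the sketch ignores overlapping clones, repeats, and sequencing errors, all of which a practical method would have to tolerate.

```python
from itertools import combinations


def design_pools(clones, n_pools, pools_per_clone):
    """Assign each clone a distinct set of pools (its signature).

    Toy design (an assumption for illustration): signatures are simply the
    first len(clones) combinations of `pools_per_clone` pools out of `n_pools`.
    """
    clones = list(clones)
    signatures = combinations(range(n_pools), pools_per_clone)
    design = {clone: frozenset(sig) for clone, sig in zip(clones, signatures)}
    if len(design) < len(clones):
        raise ValueError("not enough distinct pool combinations for all clones")
    return design


def deconvolve(read_pools, design):
    """Return the clone whose signature equals the pools a read was seen in.

    Returns None when no signature matches exactly.
    """
    by_signature = {sig: clone for clone, sig in design.items()}
    return by_signature.get(frozenset(read_pools))


# Six clones fit into four pools when each clone goes into two pools,
# so four pooled libraries stand in for six individually barcoded ones.
design = design_pools(["BAC1", "BAC2", "BAC3", "BAC4", "BAC5", "BAC6"], 4, 2)
assigned = deconvolve({1, 2}, design)
unassigned = deconvolve({0}, design)
```

In this toy design the number of clones that can be distinguished grows as the number of pool combinations, which is why far fewer pooled libraries than clones are needed; the exact-match rule is the simplest possible decoding and is only meant to show the principle.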