Academic Paper

A Metagenomic Analysis of Environmental and Clinical Samples Using a Secure Hybrid Cloud Solution.
Document Type
article
Source
Journal of Biomolecular Techniques. 30(Suppl)
Subject
Bioengineering
Networking and Information Technology R&D
Human Genome
Genetics
Infection
Generic Health Relevance
Biological Sciences
Technology
Medical and Health Sciences
Biological sciences
Language
Abstract
The number and types of studies about the human microbiome, metagenomics and personalized medicine, and clinical genomics are increasing at an unprecedented rate, leading to computational challenges. For example, the analysis of patient/clinical samples requires methods capable of (i) accurately detecting pathogenic organisms, (ii) running at high speed to allow short response times and rapid diagnosis, and (iii) scaling to ever-growing databases of reference genomes. While cloud computing has the potential to offer low-cost solutions to these needs, serious concerns exist regarding the protection of genomic data, owing to the lack of control and security in remote genomic databases. We present a novel metagenomic analysis system called "Virgile" that is capable of performing privacy-preserving queries on databases hosted on outsourced servers (e.g., public or cloud-based). The method takes as input the sequence data produced by any modern sequencing instrument (e.g., Illumina, PacBio, Oxford Nanopore) and outputs the microbial profile using a database of whole-genome sequences (e.g., the RefSeq database from NCBI). The profiling algorithm uses a genome-centric approach and aims to estimate, without bias, the abundance of the microorganisms present. Results: Using an extensive set of 65 simulated datasets, negative and positive controls, real clinical samples, and mock communities, we show that Virgile identifies and estimates the abundance of organisms present in environmental or clinical samples with high accuracy compared to popular state-of-the-art methods, including MetaPhlAn2 and KrakenUniq. Virgile runs at high speed and can be run on a standard laptop with 8 GB of RAM. Virgile is a novel privacy-preserving abundance estimation algorithm that can efficiently and rapidly determine the taxonomic identity and abundance of the organisms present in a metagenomic sample, including samples from medical environments.
To the best of our knowledge, Virgile is the only metagenome analysis system leveraging cloud computing in a secure manner.