Academic paper

Controlling technical variation amongst 6693 patient microarrays of the randomized MINDACT trial
Document Type
article
Source
Communications Biology. 3(1)
Subject
Biological Sciences
Bioinformatics and Computational Biology
Biomedical and Clinical Sciences
Oncology and Carcinogenesis
Human Genome
Clinical Trials and Supportive Activities
Cancer
Breast Cancer
Clinical Research
Biotechnology
Genetics
Adult
Aged
Biomarkers, Tumor
Breast Neoplasms
Female
Gene Expression Regulation, Neoplastic
Humans
Middle Aged
Neoplasm Proteins
Prognosis
Protein Array Analysis
Randomized Controlled Trials as Topic
Transcriptome
Biological sciences
Biomedical and clinical sciences
Language
Abstract
Gene expression data obtained in large studies hold great promise for discovering disease signatures or subtypes through data analysis. They are also prone to technical variation, which must be removed to avoid spurious discoveries. Because this variation is not always known and can be confounded with biological signals, removing it is challenging. Here we provide a step-wise procedure and a comprehensive analysis of the MINDACT microarray dataset. The MINDACT trial enrolled 6693 breast cancer patients and prospectively validated the gene expression signature MammaPrint for outcome prediction. The study also yielded a full-transcriptome microarray for each tumor. We show, for the first time in a dataset of this size, how technical variation can be removed while retaining expected biological signals. Because of its unprecedented size, we hope the resulting adjusted dataset will be an invaluable tool for discovering or testing gene expression signatures and for advancing our understanding of breast cancer.