Academic Paper

Inferring ancestral and biogeographic origin using genome-wide SNP data
Document Type
Electronic Thesis or Dissertation
Subject
611
Language
English
Abstract
Statistically predicting the origin of a human DNA sample has proven effective in forensic casework, and is of broad population genetic interest, but many facets of the problem remain unexplored. This thesis consolidates several pieces of work on the genetic prediction of origin, furthering its theory and practice. I begin by examining the performance of straightforward predictive methods when the objective is to correctly determine an individual's origin from one of several closely related genetic groups, e.g. countries within Europe. Of particular interest are the volume of data required to make useful predictions, and the negative impact of combining data from independently collected convenience samples. Depending on these factors, I show that it is possible to predict with good accuracy whether an individual originates from Great Britain or Ireland. The same approach was applied to a unique dataset, provided by Dr. Jim Wilson and colleagues, in which individuals were ascertained based on their self-reported village of origin from sets of neighbouring villages. Highly accurate predictions for village of origin were attained, demonstrating the detailed geographic resolution at which this branch of statistical methodology may succeed. Origin has innumerable aspects, many of which may be tractable to genetic prediction. Two in particular are considered throughout the remainder of the thesis. First, I develop models for predicting separate ancestral components in each parent of a genotyped 'target' individual (Crouch and Weale, 2012), providing a more detailed profile than models of personal ancestry alone. Accuracy is high when genome-wide data are available and the modelled populations are relatively genetically dissimilar, e.g. West Africa, Europe and East Asia.
Second, I develop a method for predicting geographic coordinates for target individuals, constituting an estimate of their biogeographic origin in continuous space, and compare performance within Europe against existing approaches. While one alternative (Hoggart et al., 2012) displayed greater accuracy, the merits of each method are discussed in full.
