The University of Dublin | Trinity College -- Ollscoil Átha Cliath | Coláiste na Tríonóide
Trinity's Access to Research Archive

Please use this identifier to cite or link to this item: http://hdl.handle.net/2262/56190

Title: Genetic Classification of Populations using Supervised Learning.
Author: MORRIS, DEREK
HERON, ELIZABETH ANN
PINTO, CARLOS
GILL, MICHAEL
CORVIN, AIDEN PETER
Sponsor: Science Foundation Ireland
Wellcome Trust
Author's Homepage: http://people.tcd.ie/mgill
http://people.tcd.ie/morrisdw
http://people.tcd.ie/eaheron
http://people.tcd.ie/capinto
http://people.tcd.ie/acorvin
Keywords: Genetics
SAMPLE COVARIANCE MATRICES
Issue Date: 2011
Citation: Bridges M, Heron E, O'Dushlaine, Segurado R, The International Schizophrenia Consortium (ISC), Morris DW, Corvin A, Gill M, Pinto C. Genetic Classification of Populations using Supervised Learning. PLoS ONE, 6, 5, 2011, e14802
Series/Report no.: PLoS ONE
6
5
Abstract: There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This last application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both of these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into predefined populations, particularly in quality control for large-scale genome-wide association studies.
Description: PUBLISHED
URI: http://hdl.handle.net/2262/56190
Related links: http://dx.doi.org/10.1371/journal.pone.0014802
Appears in Collections:Psychiatry (Scholarly Publications)

Files in This Item:

File | Description | Size | Format
Genetic Classification of Populations Using Supervised Learning.pdf | Published (publisher's copy) - Peer Reviewed | 680.35 kB | Adobe PDF


This item is protected by original copyright


Items in TARA are protected by copyright, with all rights reserved, unless otherwise indicated.