Large trees, supertrees and the grass phylogeny
Citation: Nicolas Salamin, 'Large trees, supertrees and the grass phylogeny', [thesis], Trinity College (Dublin, Ireland). Department of Botany, 2003, pp 223
During the last decade, advances in molecular techniques have profoundly changed the way scientists build and use phylogenetic trees. Fields of research as different as ecology, evolution of development, genomics, and systematics have been influenced by the growth of phylogenetics, and the possibilities offered by new techniques of tree reconstruction are likely to further anchor the discipline as a core component of evolutionary biology. Despite this, phylogenetic inference remains a particularly difficult task because no polynomial-time algorithm is available to reconstruct optimal trees from a given data set, and the problem becomes more difficult as the number of taxa increases. The last decade has witnessed the development of powerful computer architectures and software that have alleviated this burden. However, the reconstruction of comprehensive phylogenetic trees still has to rely on heuristic searches, and sound statistical methods are often impractical for large data sets because of the associated computational difficulties.
In this thesis, I explored the problem of reconstructing large phylogenetic trees. One aspect was to investigate how well current methods of tree reconstruction perform when faced with matrices containing hundreds or thousands of taxa. In chapter 2, computer simulations based on four large angiosperm trees were performed to assess how successfully maximum parsimony and neighbour-joining infer trees. The results indicated that the size of the matrix was not a problem in itself, and that the distribution of changes along the tree could be a more important factor. For instance, when conditions were favourable, more than 80% of the nodes of a tree containing 13,000 taxa could be correctly inferred from simulated data sets of 10,000 bp. With real data sets, however, it is impossible to know how far the trees obtained are from the ‘true’ underlying evolutionary hypothesis.
Resampling techniques, such as the bootstrap or the jackknife, have been developed to estimate how much confidence can be placed in a particular node of a phylogenetic tree. With large numbers of taxa, these procedures become computationally intensive, especially if thorough heuristic searches are used. It is therefore important to understand the effects of different heuristic strategies on the support obtained for large phylogenetic trees, and whether faster tree search options could be used to reduce the time of the analyses without biasing the support obtained. In chapter 3, the levels of support obtained by bootstrapping and jackknifing a 357-taxon molecular matrix for the angiosperms under four different heuristic search options were compared. Heuristic searches that performed rearrangements on the original tree obtained by stepwise addition of the taxa yielded comparable values of support for bootstrap and jackknife. However, the fastest technique could reduce the time of the analyses 30-fold.
These classical phylogenetic analyses are based on biological characters, such as morphological traits or DNA sequences, but supertree reconstruction methods have also been developed to build large phylogenetic trees by gathering the information directly from existing ‘source’ trees. An overlap of taxa between the source trees is sufficient for the methods to be applied, and the process allows very large trees to be created quickly. Several methods have been proposed to build supertrees, and chapter 4 examined the ‘matrix representation using parsimony’ method. An empirical assessment was made by comparing several modifications of this method on several different data sets from the grass family. The data sets were analysed separately and the resulting topologies were used as source trees in the supertree reconstructions. Modifications that took into account the level of support present in the source trees produced supertrees that were closer to a classical analysis combining the different DNA sequences. Supertrees were also built from 55 published topologies for the grass family to create the largest grass phylogenetic trees, containing 401 genera.
The supertrees obtained highlighted interesting questions concerning the evolutionary history of the grass family, and the relationships among the clades comprising maize, wheat, and rice were further investigated in chapter 5. In this chapter, extensive simulations were performed to investigate whether the discrepancies between topologies obtained from different molecular data sets could be due to random or systematic errors. The results indicated that several DNA sequences have a strong bias towards a particular placement of wheat. However, the general result suggested that the level of taxon and character sampling in studies of grass phylogenetics has not been sufficient to avoid high rates of error, and that these errors have impaired the ability of methods to correctly reconstruct grass evolutionary history.
Finally, in response to the previous results, a large phylogenetic analysis of the trnLF and rbcL plastid regions is presented in chapter 6. The rbcL data set placed wheat as sister to maize, whereas trnLF yielded this topology only under Bayesian analysis; with trnLF, maximum parsimony analysis placed wheat within the BEP clade. The main subfamilies were supported, but the relationships between these groups could not be clearly defined. Divergence times were estimated by calibrating these phylogenetic trees with four grass fossils, suggesting a rapid diversification of the grasses between 40 and 30 Mya. The calibrated dates also allowed the appearance of the C4 photosynthetic pathway in the grasses to be estimated at 20 to 10 Mya, an origin that corresponded to a period of low CO2 concentrations. Therefore, CO2 levels could have been a factor in the origin of C4 photosynthesis in grasses, an adaptation that could have helped the huge diversification of this important angiosperm family.
Author: Salamin, Nicolas
Advisor: Hodkinson, Trevor R.
Qualification name: Doctor of Philosophy (Ph.D.)
Publisher: Trinity College (Dublin, Ireland). Department of Botany
Type of material: thesis
Availability: Full text available