Seeing the trees for the forest: a comparison of methods for inferring the tree of life
A central goal of biology is to uncover the tree of life: determining how all species, extinct and extant, are ultimately related. Here the student will investigate how different approaches to inferring this tree can be compared. Pre-existing databases of mainly animal trees, with a strong focus on arthropods (e.g., insects, crustaceans) and tetrapods (e.g., birds, dinosaurs, ichthyosaurs, lizards, plesiosaurs, pterosaurs, mammals), will be expanded and developed, allowing the broadest possible set of comparisons to be made. The student will also have the option to build tree(s) of their own. Comparisons will be made using methods that compare the branching order of trees with the order in which taxa appear in the fossil record, so only groups with a fossil record will be used.
Because the branching order of a phylogeny and the order in which taxa appear in the fossil record are independent estimates of the same thing, their mutual agreement suggests accuracy (and, by extension, disagreement suggests inaccuracy). A suite of metrics has been produced to quantify this agreement, usually termed stratigraphic congruence (summarised in Bell and Lloyd 2015). In principle these metrics allow competing phylogenies to be compared, although so far limited headway has been made in this area (but see Brochu and Norell 2001). Instead, they have mostly been applied across different groups and time periods to see whether any major patterns emerge (for a recent summary see O’Connor and Wills 2016).
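To illustrate what such metrics measure, the Python sketch below computes two of them: the stratigraphic consistency index (SCI) and the gap excess ratio (GER). It is a simplified illustration, not the strap implementation. It assumes fully bifurcating trees written as nested tuples of taxon names and a single first appearance age per taxon in millions of years (larger values are older), and it ignores uncertainty in those ages. A node counts as consistent when its clade first appears no earlier than its sister group. GER rescales the minimum implied gap (total ghost-lineage duration) between the best and worst values possible for the same taxa.

```python
# Minimal sketch of two stratigraphic congruence metrics (SCI and GER).
# Assumptions (illustrative only, not strap's implementation): trees are fully
# bifurcating nested tuples of taxon names; each taxon has one first appearance
# age in Ma (larger = older); uncertainty in those ages is ignored.
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def tips(tree: Tree) -> List[str]:
    """Taxon names at the tips of `tree`."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in tips(child)]


def oldest_fad(tree: Tree, fad: Dict[str, float]) -> float:
    """Age of the oldest first appearance among the tips of `tree`."""
    return max(fad[name] for name in tips(tree))


def sci(tree: Tree, fad: Dict[str, float]) -> float:
    """Stratigraphic consistency index: the fraction of non-root internal nodes
    whose clade first appears no earlier than its sister group."""
    consistent = total = 0
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        left, right = node
        for clade, sister in ((left, right), (right, left)):
            if not isinstance(clade, str):  # score clades, not single tips
                total += 1
                if oldest_fad(clade, fad) <= oldest_fad(sister, fad):
                    consistent += 1
        stack.extend(node)
    return consistent / total if total else float("nan")


def mig(tree: Tree, fad: Dict[str, float]) -> float:
    """Minimum implied gap: ghost-lineage duration summed over all branches,
    dating each node to the oldest first appearance among its descendants."""
    if isinstance(tree, str):
        return 0.0
    node_age = oldest_fad(tree, fad)
    return sum(node_age - oldest_fad(child, fad) + mig(child, fad) for child in tree)


def ger(tree: Tree, fad: Dict[str, float]) -> float:
    """Gap excess ratio: 1 for the best possible MIG, 0 for the worst."""
    ages = [fad[name] for name in tips(tree)]
    oldest = max(ages)
    g_min = oldest - min(ages)  # MIG of a tree in perfect stratigraphic order
    g_max = sum(oldest - age for age in ages)  # every lineage traced back to the oldest age
    if g_max == g_min:
        return float("nan")
    return 1 - (mig(tree, fad) - g_min) / (g_max - g_min)
```

For example, with hypothetical first appearances `fad = {"A": 300, "B": 200, "C": 400}`, the tree `(("A", "B"), "C")` matches the fossil record exactly and both `sci` and `ger` return 1.0, whereas `(("A", "C"), "B")` groups the oldest taxon with a younger one and both return 0.0. Real analyses would rely on strap itself rather than this sketch.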
Comparing competing phylogenies is important because there is disagreement over the best methods for inferring them. Palaeontologists have traditionally relied on parsimony: algorithms that attempt to select the trees that minimise the number of evolutionary “steps” (character-state changes). More recently, however, model-based approaches that make explicit assumptions about evolutionary change have been employed. Different statistical paradigms are also in use; for example, Bayesian approaches require prior assumptions (“priors”) about the tree and model to be specified before analysis proceeds. Yet other approaches combine the outputs of individual analyses into larger “supertrees” (Davis and Page 2014; Hill and Davis 2014; Lloyd et al 2016). Being able to compare trees generated by such disparate approaches should help establish “best practice” in the field, and hence the project has the potential to be highly influential.
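The parsimony criterion can be made concrete with Fitch’s algorithm, which counts the minimum number of state changes a discrete character requires on a given tree; summing over all characters gives the tree length that parsimony seeks to minimise. The Python sketch below is illustrative only. It assumes fully bifurcating trees written as nested tuples, one observed state per taxon, no missing data and unordered characters. It only scores a fixed tree and does not search among trees, which is the job of dedicated parsimony software.

```python
# Minimal sketch of Fitch parsimony for scoring a fixed tree.
# Assumptions (illustrative only): fully bifurcating nested-tuple trees, one
# observed state per taxon, no missing data, unordered characters.
from typing import Dict, FrozenSet, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes one character needs on `tree`."""

    def visit(node: Tree) -> Tuple[FrozenSet[str], int]:
        if isinstance(node, str):
            return frozenset({states[node]}), 0
        (left_set, left_cost), (right_set, right_cost) = (visit(child) for child in node)
        shared = left_set & right_set
        if shared:  # children can share a state, so no extra change is needed here
            return shared, left_cost + right_cost
        # children cannot share a state: one change must occur on one of the two
        # branches leading down from this node
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Total steps over all characters; `matrix` maps each taxon to a string of
    single-character states (assumed non-empty and of equal lengths)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )
```

With the hypothetical matrix `{"A": "00", "B": "11", "C": "00", "D": "11"}`, `tree_length` returns 4 for `(("A", "B"), ("C", "D"))` but 2 for `(("A", "C"), ("B", "D"))`, so parsimony would prefer the second tree. Model-based and Bayesian methods instead score trees by their probability under an explicit model of character change.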
The student will primarily use the R package “strap” (Bell and Lloyd 2015), of which the main supervisor is a co-author, to make their comparisons. Other software produced by, or otherwise familiar to, the supervisors will also be employed (e.g., The Supertree Toolkit 2; Hill and Davis 2014), giving the student access to cutting-edge techniques.
As part of the project the student will be able to test a number of hypotheses, each of which could form both a thesis chapter and a publication. Specific possibilities include:
- Simulation studies suggest that model-based phylogenetic methods outperform methods that attempt to minimise the number of evolutionary steps on a tree (Wright and Hillis 2014; O’Reilly et al 2016). However, no equivalent comparison has yet been made using empirical data sets. Here the student will re-analyse a series of data sets, inferring trees with both approaches and comparing their stratigraphic fit, to establish whether the same holds for empirical data.
- Empirical studies suggest that morphological characters based on soft parts are more congruent with molecular trees than those based on hard parts (Sansom and Wills 2013). Here the student will compare the stratigraphic fit of trees produced from different partitions of the same data set. This could encompass several distinct projects, each forming its own chapter or paper, for example: morphology versus molecules (the latter thought to be superior), hard parts versus soft parts (the latter thought to be superior), or head versus body characters (the former thought to be superior).
- Very large trees can be problematic to generate directly, so they are often built by combining smaller trees into “supertrees” (e.g., Hill and Davis 2014; Lloyd et al 2016). However, it is not clear whether this is superior to combining the original data instead (“supermatrices”), nor whether the final supertree has a poorer stratigraphic fit than its smaller source trees. Here the student will explore whether the extra effort of building a supermatrix yields trees with a better stratigraphic fit. Alternatively, or additionally, they will compare the fit of supertrees with that of their source trees to determine whether supertree construction causes a loss of stratigraphic “fidelity”.
- Stratigraphic congruence has seen very limited use in comparing competing phylogenies for controversial areas of the tree of life (but see Brochu and Norell 2001). Here the student will use stratigraphic congruence to evaluate major controversies, for example the relationships of squamates (lizards and snakes) and the position of turtles in the tetrapod tree.
- Recently developed methods allow palaeontologists to use time when inferring phylogenies (e.g., Bapst et al 2016), which in theory biases the resulting trees towards greater stratigraphic congruence and so weakens it as an independent test. Even more recently, it has been suggested that geography could also be incorporated into inference (De Baets et al 2016; Landis 2017). Here the student will have the opportunity to develop new method(s) that exploit the parallels between time and space, comparing trees by their implied number of biogeographic dispersal events as an alternative to stratigraphic congruence.
Potential for high-impact outcomes
This project will allow the student to address three major areas: 1) comparing methods for inferring phylogenetic trees, 2) comparing different data types for inferring phylogenetic trees, and 3) comparing conflicting and controversial phylogenetic trees. All three offer excellent opportunities for high-impact results.
Over the course of the project we expect the student to gain a wide range of practical skills, including database management, phylogenetics and programming (mainly in the R language). These can be taught directly by the supervisory team, although attending formal courses will also be strongly encouraged. By the end of the PhD the student should have a strong set of transferable skills and, consequently, broad employment opportunities.
The project will suit a student with a first degree in either geology or biology (a Master’s degree is desirable but not essential). Computational and numerical skills will be helpful, but these can also be taught by the supervisory team. Strong organisational skills will also be important.
- Bapst, D. W., Wright, A. M., Matzke, N. J. and Lloyd, G. T., 2016. Topology, divergence dates and macroevolutionary inferences vary between different tip-dating approaches applied to fossil theropods (Dinosauria). Biology Letters, 12, 20160237.
- Bell, M. A. and Lloyd, G. T., 2015. strap: an R package for plotting phylogenies against stratigraphy and assessing their stratigraphic congruence. Palaeontology, 58, 379-389.
- Brochu, C. A. and Norell, M. A., 2001. Time and trees: A quantitative assessment of temporal congruence in the bird origins debate. pp. 511-535 in J. A. Gauthier and L. F. Gall (eds.), New Perspectives on the Origin and Early Evolution of Birds, Peabody Museum of Natural History, New Haven, CT.
- Davis, K. E. and Page, R. D. M., 2014. Reweaving the tapestry: a supertree of birds. PLOS Currents Tree of Life, 1.
- De Baets, K., Antonelli, A. and Donoghue, P. C. J., 2016. Tectonic blocks and molecular clocks. Philosophical Transactions of the Royal Society B, 371, 20160098.
- Hill, J. E. and Davis, K. E., 2014. The Supertree Toolkit 2: a new and improved software package with a Graphical User Interface for supertree construction. Biodiversity Data Journal, 2, e1053.
- Landis, M. J., 2017. Biogeographic dating of speciation times using paleogeographically informed processes. Systematic Biology, 66, 128-144.
- Lloyd, G. T., Bapst, D. W., Friedman, M. and Davis, K. E., 2016. Probabilistic divergence time estimation without branch lengths: dating the origins of dinosaurs, avian flight, and crown birds. Biology Letters, 12, 20160609.
- O’Connor, A. and Wills, M. A., 2016. Measuring stratigraphic congruence across trees, higher taxa and time. Systematic Biology, 65, 792-811.
- O’Reilly, J. E., Puttick, M. N., Parry, L., Tanner, A. R., Tarver, J. E., Fleming, J., Pisani, D. and Donoghue, P. C. J., 2016. Bayesian methods outperform parsimony but at the expense of precision in the estimation of phylogeny from discrete morphological data. Biology Letters, 12, 20160081.
- Sansom, R. S. and Wills, M. A., 2013. Fossilization causes organisms to appear erroneously primitive by distorting evolutionary trees. Scientific Reports, 3, 2545.
- Wright, A. M. and Hillis, D. M., 2014. Bayesian analysis using a simple likelihood model outperforms parsimony for estimation of phylogeny from discrete morphological data. PLoS ONE, 9, e109210.
Related undergraduate subjects:
- Applied mathematics
- Biodiversity conservation
- Computer science
- Earth science
- Environmental biology
- Environmental conservation
- Geological science
- Natural sciences