Practical sessions

PROTEIN PHYLOGENETICS

Exercise 2.

Bootstrapping with PROTDIST and FITCH.

To assess how much support the alignment provides for these relationships (how many characters actually support them), you will perform a bootstrap analysis. This is done by going through the following steps:

Use SEQBOOT to bootstrap the alignment: type seqboot, then type y for yes. 100 bootstrap replicates will be performed. Copy the outfile as infile for the next step: cp outfile infile. You can look at the bootstrapped data with the more command (more outfile). You will see that some positions are present more than once.
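The resampling behind this step can be illustrated with a minimal Python sketch. It is not SEQBOOT's code; the function name, the toy alignment and the taxon names are hypothetical. It only shows why some positions appear more than once in a replicate while others are absent: columns are drawn with replacement.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement.

    alignment: dict mapping a taxon name to its aligned sequence
               (all sequences must have the same length).
    Returns the replicate and the list of sampled column indices.
    """
    length = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns;
    # drawing with replacement means some repeat and some are left out.
    columns = [rng.randrange(length) for _ in range(length)]
    replicate = {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }
    return replicate, columns

# Hypothetical toy alignment (names and sequences are made up).
toy_alignment = {
    "taxon1": "MKTAYIAKQR",
    "taxon2": "MKTAHIAKQR",
    "taxon3": "MRTAYLAKQK",
}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
replicate, columns = bootstrap_alignment(toy_alignment, rng)
print(sorted(columns))  # repeated indices are positions sampled more than once
for name, seq in replicate.items():
    print(name, seq)
```

Calling the function 100 times would give the equivalent of the 100 replicates written by SEQBOOT.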

Use PROTDIST to calculate the distances for all 100 bootstrapped replicates. Type p to select the Kimura formula (this speeds up the process, which matters for today's practicals since many of you are using the same computer) and "m" to inform the program that you have 100 replicates to be analysed. Type y to start the distance calculations. Once finished, copy the outfile (containing the distances from every bootstrap replicate) as infile for the next step: cp outfile infile.
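The Kimura formula is fast because it only needs the fraction p of differing amino acids between two sequences: D = -ln(1 - p - 0.2 p^2). The Python sketch below illustrates the formula on a pair of made-up sequences. The function name is hypothetical, and skipping positions that contain a gap is an assumption of this sketch, not a statement about how PROTDIST treats gaps.

```python
import math

def kimura_protein_distance(seq_a, seq_b):
    """Kimura (1983) protein distance: D = -ln(1 - p - 0.2 * p**2)."""
    # Assumption of this sketch: ignore positions where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    # p is the observed fraction of positions that differ.
    p = sum(a != b for a, b in pairs) / len(pairs)
    inside = 1.0 - p - 0.2 * p * p
    if inside <= 0.0:
        # The formula is undefined for very divergent sequences.
        raise ValueError("sequences too divergent for the Kimura formula")
    return -math.log(inside)

# Made-up sequences differing at 1 of 10 positions, so p = 0.1.
d = kimura_protein_distance("MKTAYIAKQR", "MKTAHIAKQR")
print(round(d, 4))
```

The distance is slightly larger than p itself because the formula corrects for multiple substitutions at the same position.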

Use FITCH to estimate the 100 trees from the 100 bootstrapped replicates. Type "m" to inform the program that you have 100 distance matrices to be analysed, type "j" for random addition of taxa with 1 replicate, type "o" (the letter O) to select taxon 6 as the outgroup, and type "y" to start the analysis. Copy the treefile as infile for the next step.

Use CONSENSE to calculate the majority rule consensus tree. Type "R" to inform the program that your trees were previously rooted and type y to start the calculation. Look at the results by typing "more outfile". Give the outfile a new name so that you can compare these results with the next analyses.
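How a majority rule consensus yields bootstrap values can be sketched in Python. This is a simplified illustration, not CONSENSE itself: each rooted tree is reduced to the set of clades it contains, the taxa A to D and the four toy trees are made up, and the function names are hypothetical. A clade's bootstrap value is the percentage of replicate trees that contain it.

```python
from collections import Counter

def clade_support(trees):
    """Return, for each clade, the percentage of trees that contain it."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: 100.0 * n / len(trees) for clade, n in counts.items()}

def majority_rule_clades(trees):
    """Keep only the clades found in more than 50% of the trees."""
    support = clade_support(trees)
    return {clade: pct for clade, pct in support.items() if pct > 50.0}

# Hypothetical clades over made-up taxa A, B, C, D.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
ABC = frozenset({"A", "B", "C"})
ABD = frozenset({"A", "B", "D"})

# Four toy "bootstrap trees", each written as the set of its clades.
toy_trees = [
    {AB, ABC},
    {AB, ABC},
    {AB, ABD},
    {AC, ABC},
]

for clade, pct in majority_rule_clades(toy_trees).items():
    print(sorted(clade), pct)
```

With 100 bootstrap trees, the percentages are read directly as the bootstrap values shown on the consensus tree.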

What is the support for the phylogenetic position of the Microsporidia and how does it compare with the other bootstrap values in the tree? Is the tree topology well supported for the method used?

2) PUZZLE.
PUZZLE4.0 can be run on several platforms including UNIX and Macintosh. The program can be downloaded from the web. It was written by Korbinian Strimmer and Arndt von Haeseler.

Because maximum likelihood analyses are computationally intensive, often only a limited number of taxa can be analyzed at a time. To reduce this limitation, the authors proposed a quartet approach that allows faster maximum likelihood analyses of large datasets (numerous taxa).

Instead of searching trees with the full range of taxa, they proposed a method in which a tree is estimated through a two-step process. The first step involves the calculation of the best tree with maximum likelihood for all possible combinations of four taxa (quartets), a simple task since there are only three possible topologies for a quartet.
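To get a feel for the size of this first step, the short Python sketch below enumerates the quartets for a small set of taxa and the three unrooted topologies of one quartet. The taxon names and the function name are hypothetical, and the sketch only counts and lists; it does not compute any likelihoods. It needs Python 3.8 or later for math.comb.

```python
from itertools import combinations
from math import comb

def quartet_topologies(a, b, c, d):
    """Return the three possible unrooted topologies for four taxa.

    Each topology is written as the two pairs separated by the
    internal branch, e.g. ((a, b), (c, d)) means ab | cd.
    """
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

# Hypothetical taxon names for a six-taxon dataset.
taxa = ["T1", "T2", "T3", "T4", "T5", "T6"]

# Every combination of four taxa is one quartet.
quartets = list(combinations(taxa, 4))
print(len(quartets), "quartets for", len(taxa), "taxa")

# The three candidate topologies compared for the first quartet.
for topology in quartet_topologies(*quartets[0]):
    print(topology)

# The number of quartets grows quickly with the number of taxa.
for n in (10, 20, 50):
    print(n, "taxa:", comb(n, 4), "quartets")
```

Each quartet requires only three likelihood evaluations, which is what keeps the first step tractable even though the number of quartets grows quickly.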

All quartets are then combined into a single tree for all n taxa using a consensus method. If all quartets are compatible, a unique tree will always be found. This is very unlikely with real data, where different trees can be obtained, typically depending on the order in which the quartets are combined.

To avoid quartet sampling order effects, the last step is repeated numerous times with a different quartet order each time; this is the so-called puzzling step. After numerous puzzling steps (typically 1,000-10,000), a majority consensus tree is calculated from all reconstructed n-taxa trees.

The final tree summarizes the result by suggesting an n-taxa tree. Support for the tree topology is indicated by the resolved branching pattern and by the PUZZLE support values (maximum 100%, meaning that all n-taxa trees recovered the specific clade). Really poorly supported branching patterns are shown as polytomies. These values are not bootstrap values but can apparently correspond well to them in some situations.
