Maximum Likelihood 1
Maximum Likelihood 2
1 CGAGAC
2 AGCGAC
3 AGATTA
4 GGATAG
What is the probability that unrooted Tree A (rather than another tree) could have generated the data shown under our chosen model?
Maximum likelihood tree reconstruction 1
[Figure: unrooted Tree A with tips 1, 2, 3 and 4]
note rooting is arbitrary
1 CGAGA C
2 AGCGA C
3 AGATT A
4 GGATA G
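[Figure: site j is the final alignment column; the possible ancestral states at each internal node are A, C, G and T]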
The likelihood for a particular site j is the sum of the probabilities of every possible reconstruction of ancestral states under a chosen model
Maximum likelihood tree reconstruction 2
4 x 4 = 16 possible reconstructions (four states at each of the two internal nodes)
Tree A
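The sum over ancestral reconstructions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topology ((1,2),(3,4)) stands in for Tree A (the figure is not reproduced here), the chosen model is taken to be Jukes-Cantor, and every branch length is set to an arbitrary 0.1 substitutions per site. The function names are hypothetical. Because the model is reversible, the root can be placed at either internal node without changing the result.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a is replaced by base b on a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay

def site_likelihood(tips, t_tip=0.1, t_internal=0.1):
    """Likelihood of one site on the assumed tree ((1,2),(3,4)).

    x is the internal node joining taxa 1 and 2, y the one joining 3 and 4.
    Branch lengths are illustrative assumptions, not estimates.
    """
    s1, s2, s3, s4 = tips
    total = 0.0
    for x, y in product(BASES, repeat=2):        # the 4 x 4 reconstructions
        total += (0.25                            # equal base frequencies at x
                  * jc_prob(x, s1, t_tip) * jc_prob(x, s2, t_tip)
                  * jc_prob(x, y, t_internal)
                  * jc_prob(y, s3, t_tip) * jc_prob(y, s4, t_tip))
    return total

alignment = ["CGAGAC", "AGCGAC", "AGATTA", "GGATAG"]
site_j = tuple(seq[-1] for seq in alignment)      # last column of the alignment
likelihood_j = site_likelihood(site_j)

# The tree's log likelihood is the sum of the per-site log likelihoods.
log_likelihood = sum(
    math.log(site_likelihood(tuple(seq[k] for seq in alignment)))
    for k in range(len(alignment[0]))
)
print(site_j, likelihood_j, log_likelihood)
```

A full ML search would also optimise the branch lengths and repeat the calculation for each competing topology.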
Maximum likelihood tree reconstruction 3
Maximum likelihood tree reconstruction 4
Typical assumptions of ML substitution models
Maximum likelihood models 1
Maximum likelihood models 2
A case study in phylogenetic analysis:
Deinococcus and Thermus
BUT:
A four taxon problem for Deinococcus and Thermus (Thermus, Deinococcus, Bacillus, Aquifex)
Thermus
Deinococcus
Aquifex
Bacillus
“The true tree”
The Jukes and Cantor model is the simplest model
The JC model is a one parameter model
1) it assumes that all substitutions are equally probable and that the four bases occur at equal frequency (0.25 each)
2) unless modified it assumes all sites can change and that they do so at the same rate
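As a rough illustration of what "one parameter" means, the whole Jukes-Cantor transition matrix can be written as a function of a single quantity, the branch length t in expected substitutions per site. The sketch below uses the standard JC formulas; the function name is hypothetical.

```python
import math

BASES = "ACGT"

def jc_matrix(t):
    """4 x 4 Jukes-Cantor matrix P(t); t is the branch length in
    expected substitutions per site (the model's only parameter)."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay     # probability the base is unchanged
    diff = 0.25 - 0.25 * decay     # probability of each specific change
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

for t in (0.0, 0.1, 1.0, 100.0):
    row_a = jc_matrix(t)[0]        # probabilities of A ending up as A, C, G, T
    print(t, [round(p, 4) for p in row_a])
```

On very long branches every entry tends to 0.25, which is why base composition that departs strongly from 0.25 is a warning sign for this model.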
Output of JC ML analysis for (Thermus, Deinococcus, Bacillus, Aquifex)
Tree 1 -log likelihood = 4090 (best tree)
Tree 2 -log likelihood = 4101 (true tree)
Tree 3 -log likelihood = 4132
The Jukes and Cantor model in ML is unable to recover the true tree for this data set
The 16S rRNA genes of Aquifex, Bacillus, Deinococcus and Thermus
Exclude characters command in PAUP - exclude constant sites:
Character-exclusion status changed:
859 of 1273 characters excluded
Total number of characters now excluded = 859
Number of included characters = 414
Base frequencies command in PAUP:
Taxon            A         C         G         T     # sites
--------------------------------------------------------------
Aquifex       0.12319   0.38164   0.38164   0.11353   414
Deinococc     0.23188   0.22222   0.27295   0.27295   414
Thermus       0.13317   0.35835   0.37530   0.13317   413
Bacillus      0.23188   0.22705   0.26570   0.27536   414
--------------------------------------------------------------
Mean          0.18006   0.29728   0.32387   0.19879   413.75
Does the JC model fit these data?
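One rough way to look at this question is to compare each taxon's base frequencies with the equal frequencies (0.25 each) that JC assumes. The sketch below computes a simple chi-square goodness-of-fit statistic from the table above. It is an illustration, not PAUP*'s own test: the function name is hypothetical, the counts are reconstructed as frequency times number of sites, and treating the variable sites as independent observations is an assumption. The value 7.815 is the standard 5% critical value for 3 degrees of freedom.

```python
# Base frequencies (A, C, G, T) and number of sites, copied from the PAUP output.
base_freqs = {
    "Aquifex":   ((0.12319, 0.38164, 0.38164, 0.11353), 414),
    "Deinococc": ((0.23188, 0.22222, 0.27295, 0.27295), 414),
    "Thermus":   ((0.13317, 0.35835, 0.37530, 0.13317), 413),
    "Bacillus":  ((0.23188, 0.22705, 0.26570, 0.27536), 414),
}

CRITICAL_3DF_5PCT = 7.815   # chi-square critical value, 3 d.f., 5% level

def chi_square_vs_equal(freqs, n_sites):
    """Chi-square statistic against the JC expectation of 0.25 per base."""
    expected = 0.25 * n_sites
    return sum((f * n_sites - expected) ** 2 / expected for f in freqs)

chi_square = {
    taxon: chi_square_vs_equal(freqs, n) for taxon, (freqs, n) in base_freqs.items()
}
for taxon, value in chi_square.items():
    verdict = "departs from 0.25" if value > CRITICAL_3DF_5PCT else "close to 0.25"
    print(f"{taxon:10s} {value:8.2f}  {verdict}")
```

On these numbers the two thermophiles (Aquifex and Thermus) are far from equal base frequencies, while Deinococcus and Bacillus are not.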
Models can be made more parameter rich to increase their realism 1
A gamma distribution can be used to model site rate heterogeneity
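To give a feel for what the gamma shape parameter does, the sketch below draws relative site rates from a gamma distribution with mean 1. It is an illustration only: the function names are hypothetical, and phylogenetics programs normally use a discrete approximation with a few rate categories rather than random sampling. A shape of 0.61 is close to the values PAUP* estimates for the 16S data elsewhere in these slides; 5.0 is an arbitrary contrast.

```python
import random

def sample_site_rates(alpha, n_sites, seed=0):
    """Draw relative rates from a gamma distribution with shape alpha
    and scale 1/alpha, so that the mean rate is 1."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

def fraction_below(rates, threshold):
    """Fraction of sites evolving more slowly than the threshold."""
    return sum(r < threshold for r in rates) / len(rates)

skewed = sample_site_rates(0.61, 20000)   # strong rate heterogeneity
even = sample_site_rates(5.0, 20000)      # rates clustered around 1

for label, rates in (("alpha = 0.61", skewed), ("alpha = 5.0", even)):
    mean_rate = sum(rates) / len(rates)
    print(label, round(mean_rate, 3), round(fraction_below(rates, 0.1), 3))
```

With a small shape value most sites change very slowly and a few change fast; with a large shape value the rates are nearly equal, which approaches the unmodified JC assumption.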
Models can be made more parameter rich to increase their realism 3
JC ML tree: ln likelihood = -4090
JC-inv (invariable sites): ln likelihood = -4030
JC-inv + gamma correction for variable sites: ln likelihood = -4029
GTR-inv + gamma correction for variable sites: ln likelihood = -3985
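A common way to judge whether each extra parameter is worth having is a likelihood ratio test between nested models. The slides do not say that this test was used, so the sketch below is an assumption-laden illustration: the model labels and function name are hypothetical, the rounded log likelihoods are those listed above, the degrees of freedom assume GTR adds five relative rates and three free base frequencies over JC, and the chi-square approximation is only rough when a parameter such as the proportion of invariable sites lies near its boundary. The critical values are standard 5% table values.

```python
# Rounded log likelihoods from the slide (labels are shorthand assumptions).
log_likelihoods = {
    "JC": -4090.0,
    "JC+inv": -4030.0,
    "JC+inv+gamma": -4029.0,
    "GTR+inv+gamma": -3985.0,
}

# Standard 5% chi-square critical values, keyed by degrees of freedom.
CHI2_CRITICAL_5PCT = {1: 3.841, 8: 15.507}

def lrt_statistic(lnl_simple, lnl_complex):
    """Twice the gain in log likelihood from the more complex model."""
    return 2.0 * (lnl_complex - lnl_simple)

# (simpler model, more complex model, extra free parameters)
comparisons = [
    ("JC", "JC+inv", 1),
    ("JC+inv", "JC+inv+gamma", 1),
    ("JC+inv+gamma", "GTR+inv+gamma", 8),
]

results = {}
for simple, complex_, df in comparisons:
    stat = lrt_statistic(log_likelihoods[simple], log_likelihoods[complex_])
    results[(simple, complex_)] = (stat, stat > CHI2_CRITICAL_5PCT[df])
    print(f"{simple} vs {complex_}: 2*delta lnL = {stat:.1f}, d.f. = {df}")
```

On these rounded values, adding invariable sites and moving from JC to GTR both give large improvements, while adding the gamma correction on top of invariable sites changes the likelihood very little.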
The GTR model of sequence evolution:
The general time reversible (GTR) model is the most general of the time-reversible substitution models: it assigns a separate rate to each of the six types of substitution. For example, for the 16S ribosomal RNA data for Deinococcus, Thermus, Aquifex and Bacillus:
Tree number 1:
-Ln likelihood = 3985.30400
Estimated R-matrix:
  -2.7325625    0.4419956    1.42028      0.87028688
   0.4419956   -5.2448524    1.2621698    3.540687
   1.42028      1.2621698   -3.6824498    1
   0.87028688   3.540687     1           -5.4109739
Estimated value of proportion of invariable sites = 0.228318
Estimated value of gamma shape parameter = 0.610459
The 16S rRNA genes of Aquifex, Bacillus, Deinococcus, Thermus and Thermus ruber
Exclude characters command in PAUP - exclude constant sites:
Character-exclusion status changed:
837 characters excluded
Total number of characters now excluded = 837
Number of included characters = 436
Base frequencies command in PAUP:
Taxon            A         C         G         T     # sites
--------------------------------------------------------------
ruber         0.19725   0.27294   0.29587   0.23394   436
Aquifex       0.12156   0.38073   0.38532   0.11239   436
Deinococc     0.22477   0.22936   0.28211   0.26376   436
Thermus       0.13103   0.35862   0.37931   0.13103   435
Bacillus      0.22477   0.23394   0.27523   0.26606   436
--------------------------------------------------------------
Mean          0.17990   0.29509   0.32354   0.20147   435.80
Output of GTR-inv sites ML analysis for (Deinococcus, Bacillus, Aquifex, Thermus and Thermus ruber)
Tree 1 -log likelihood = 4439
Tree 2 -log likelihood = 4447
Tree 3 -log likelihood = 4437 (best tree = true tree)
With the addition of Thermus ruber, which has a base composition intermediate between those of the thermophiles and the mesophiles, GTR-inv sites ML recovers the Thermus + Deinococcus relationship
Estimation of ML substitution model parameters:
Parameter estimates using the “tree scores” command in PAUP*
Use the PAUP* tree scores command to obtain ML estimates of the following over this tree:
1) Proportion of invariant sites
2) Gamma shape parameter for variable sites
3) Substitution parameters for all types of change
Maximum parsimony tree
ML Parameter estimates over a parsimony tree using tree scores in PAUP*
Tree number 1:
-Ln likelihood = 4432.16903
Estimated R-matrix:
  -2.992539     0.53399075   1.6835489    0.77499941
   0.53399075  -6.0877637    1.0048052    4.5489678
   1.6835489    1.0048052   -3.6883541    1
   0.77499941   4.5489678    1           -6.3239672
Corresponding Q-matrix:
  -0.77509276   0.12637319   0.52569668   0.12302289
   0.11475088  -1.1506065    0.31375553   0.72210013
   0.36178289   0.2377952   -0.75831742   0.15873934
   0.16654196   1.0765496    0.31225508  -1.5553467
Estimated value of proportion of invariable sites = 0.302946
Estimated value of gamma shape parameter = 0.629797
These values can then be used as the starting parameters for a full likelihood search
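The relationship between the R-matrix and the Q-matrix can be checked numerically. In the usual GTR parameterisation each off-diagonal entry of Q is the relative rate R[i][j] multiplied by the equilibrium frequency of the target base j, then scaled so that the average substitution rate is 1. The sketch below assumes that this is the convention behind the output above (rows and columns in the order A, C, G, T) and recovers the implied base frequencies from the two printed matrices. The function name is hypothetical.

```python
# Matrices copied from the tree scores output (order A, C, G, T assumed).
R = [
    [-2.992539,   0.53399075,  1.6835489,   0.77499941],
    [ 0.53399075, -6.0877637,  1.0048052,   4.5489678],
    [ 1.6835489,  1.0048052,  -3.6883541,   1.0],
    [ 0.77499941, 4.5489678,   1.0,        -6.3239672],
]
Q = [
    [-0.77509276, 0.12637319,  0.52569668,  0.12302289],
    [ 0.11475088, -1.1506065,  0.31375553,  0.72210013],
    [ 0.36178289, 0.2377952,  -0.75831742,  0.15873934],
    [ 0.16654196, 1.0765496,   0.31225508, -1.5553467],
]

def implied_frequencies(r_matrix, q_matrix):
    """Recover equilibrium frequencies, assuming Q[i][j] is proportional
    to R[i][j] * pi[j] for every off-diagonal entry."""
    n = len(r_matrix)
    raw = []
    for j in range(n):
        ratios = [q_matrix[i][j] / r_matrix[i][j] for i in range(n) if i != j]
        raw.append(sum(ratios) / len(ratios))   # average over the column
    total = sum(raw)
    return [x / total for x in raw]

pi = implied_frequencies(R, Q)
# Average rate of change at equilibrium; 1 if Q is normalised as assumed.
mean_rate = -sum(pi[i] * Q[i][i] for i in range(4))
print([round(p, 4) for p in pi], round(mean_rate, 4))
```

If the assumption holds, each row of Q sums to zero and the mean rate comes out at 1, so branch lengths can be read as expected substitutions per site.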
Maximum Likelihood Tree
Maximum Likelihood - advantages
Maximum Likelihood - disadvantages