Phylogenetic methods - computer practicals
Department of Botany, Charles University, Prague (2021-2024)
Bayesian analysis
Basic comments on MrBayes are here.
Comments on jModelTest 2 are here.
Notes on the Newick tree format and DNA evolutionary models are here.
The training dataset is here. It consists of alignments of eight samples from the ginger family (Zingiberaceae) published by de Boer et al. 2018.
There are ITS, matK and concatenated alignments in NEXUS format.
A ZIP file with the necessary software is here. Download it and unzip it.
MrBayes - program for Bayesian inference and model choice of phylogenetic and evolutionary models
jModelTest 2 - program for selection of best-fit models of DNA evolution
FigTree - phylogenetic tree viewer
Other software recommended to install
Tasks (to work with the Amomum dataset)
After you are done, submit the answers using this Google form.
If you need advice, contact Tomáš Fér: tomas.fer_AT_natur.cuni.cz
1. Select the optimal evolutionary models for the ITS and matK datasets using jModelTest 2 and the AICc criterion
Question 1: What is the best model for the ITS dataset?
Question 2: What is the best model for the matK dataset?
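To make the AICc criterion used in this task more concrete, here is a minimal Python sketch of the standard formula, AICc = -2 lnL + 2k + 2k(k+1)/(n-k-1). The log-likelihoods below are hypothetical and are not results from the Amomum data. The sketch assumes that the sample size n is the number of alignment sites (593 for ITS, taken from the charset definition in Task 5) and that k counts the 13 branch lengths of an unrooted eight-taxon tree plus the free parameters of the substitution model. jModelTest 2 reports AICc itself, so this is only meant to show what the score is made of.

```python
def aicc(log_likelihood: float, n_params: int, sample_size: int) -> float:
    """Corrected Akaike information criterion; lower values are better."""
    k, n = n_params, sample_size
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when sample_size <= n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    # Small-sample correction term added on top of plain AIC.
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


# HYPOTHETICAL log-likelihoods, for illustration only.
# Parameter counts: 13 branch lengths (2 * 8 - 3) plus model parameters.
candidates = {
    "JC": (-1520.0, 13),      # no free substitution parameters
    "K2P+I": (-1495.0, 15),   # kappa + proportion of invariable sites
}
n_sites = 593  # assumed sample size: number of ITS alignment columns

scores = {name: aicc(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)  # model with the lowest AICc
print(best, round(scores[best], 2))
```

With these made-up numbers the extra two parameters of K2P+I are easily justified by the gain in likelihood; with real data the ranking is whatever jModelTest 2 reports.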
2. Create Bayesian tree based on ITS dataset
put the matrix file into the folder with the MrBayes binary (exe file)
double-click the MrBayes binary (i.e., mb.3.2.7-win64.exe); a black window appears
type the execute command followed by the name of the file in NEXUS format, e.g. execute Amomum_ITS.trimmed.nex
set the appropriate model (here K2P+I, i.e., two rate parameters and equal base frequencies, plus a proportion of invariable sites - but USE THE BEST MODEL ESTIMATED BY YOU; see the document with MrBayes commands for how to create the prset and lset commands)
lset nst=2 rates=propinv
prset statefreqpr=fixed(equal)
check the actual model by typing
showmodel
Question 3: How many different parameters are there?
set the run parameters (2 independent runs with 4 chains each, 100 thousand generations, sampling every 100th generation)
mcmcp nruns=2 nchains=4 ngen=100000 samplefreq=100
run the analysis
mcmc
summarize the analysis (discard the first 25% of the samples, i.e. 250, as burn-in; in total 100,000 / 100 = 1,000 samples were taken)
sump burnin=250
summarize the trees (again, discard burn-in fraction,
then create a consensus tree)
sumt burnin=250 contype=Allcompat
check the screen results and the resulting files
Question 4: What is the average standard deviation of split frequencies?
Question 5: What is the lowest ESS value?
Question 6: What is the harmonic mean of the fit? (check the 'ALL' line in the *.lstat file)
open the resulting tree file in, e.g., FigTree, root
with 'Siphonochilus' and display support values (prob)
Question 7: How many branches are not fully
supported?
repeat the analysis of the ITS dataset with the simple JC model (i.e., one rate parameter and equal base frequencies)
Question 8: What is the harmonic mean of the fit?
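The burn-in value passed to sump and sumt in this task follows directly from the run settings. As a minimal Python sketch (the function name is illustrative and not part of MrBayes), the arithmetic is:

```python
def burnin_samples(ngen: int, samplefreq: int, burnin_fraction: float = 0.25):
    """Return (number of samples taken, number of samples to discard)."""
    n_samples = ngen // samplefreq            # one sample every samplefreq generations
    n_burnin = int(n_samples * burnin_fraction)
    return n_samples, n_burnin


# Settings from the mcmcp command in this task.
n_samples, n_burnin = burnin_samples(ngen=100_000, samplefreq=100)
print(n_samples, n_burnin)  # 1000 250
```

If you change ngen or samplefreq for a longer run, recompute the burn-in the same way so that it still corresponds to 25% of the samples.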
3. Create Bayesian tree based on matK dataset
repeat the steps from Task 2, setting the appropriate model for the matK dataset (estimated by jModelTest 2)
open the resulting tree in FigTree
Question 9: What is the most basal taxon after the outgroup?
Question 10: Is it different from the ITS-based tree?
4. Create Bayesian tree based on concatenated dataset (single partition)
5. Create Bayesian tree based on concatenated partitioned dataset
set characters per partition
charset ITS = 1-593
charset matK = 594-3154
define partitions
partition twoparts = 2:ITS,matK
apply partition settings
set partition = twoparts
set models for each partition (here K2P+I for ITS, HKY+G for matK - but USE THE BEST MODELS ESTIMATED BY YOU; see the document with MrBayes commands for how to create the prset and lset commands)
lset applyto=(1) nst=2 rates=propinv
prset applyto=(1) statefreqpr=fixed(equal)
lset applyto=(2) nst=2 rates=gamma
set variable rates among partitions
prset ratepr=variable
unlink the parameters for each partition (but
keep 'topology' and 'brlens' linked - why?)
unlink statefreq=(all) revmat=(all) tratio=(all) shape=(all) pinvar=(all)
check the model
showmodel
Question 12: How many parameters are there?
set the mcmc parameters as in Task 2 and run the analysis, then summarize it
open the resulting tree
Question 13: How many branches are not fully
supported (i.e., have lower support than 1.00)?
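Before running a partitioned analysis it can help to confirm that the charset ranges fit together. The Python sketch below is an illustration only (helper names are made up): it computes the length of each partition from the charset definitions in this task and checks that the ranges are contiguous. The total alignment length of 3154 sites is an assumption taken from the last position in the matK charset.

```python
# Charset ranges as defined in this task (1-based, inclusive).
charsets = {"ITS": (1, 593), "matK": (594, 3154)}


def charset_lengths(sets):
    """Number of alignment columns in each charset."""
    return {name: end - start + 1 for name, (start, end) in sets.items()}


def is_contiguous(sets, n_sites):
    """True if the ranges tile positions 1..n_sites with no gaps or overlaps."""
    expected_start = 1
    for start, end in sorted(sets.values()):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == n_sites + 1


lengths = charset_lengths(charsets)
print(lengths)                        # {'ITS': 593, 'matK': 2561}
print(is_contiguous(charsets, 3154))  # True
```

A gap or overlap between the ranges would mean that some columns are left out of the partition scheme or assigned to both partitions.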
6. Compare the model fits using a Bayes factor
Check also this overview of all analyses to be done with MrBayes within this task.
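One way to carry out the comparison in Task 6 is to use the harmonic means reported in the *.lstat files (Questions 6 and 8) as rough estimates of the log marginal likelihoods. The Python sketch below uses hypothetical values, not results from the Amomum data, and the interpretation thresholds follow the commonly used scale of Kass and Raftery (1995). The harmonic mean is generally regarded as a crude estimator, so treat the outcome as indicative only.

```python
def two_ln_bayes_factor(ln_ml_model1: float, ln_ml_model0: float) -> float:
    """2 * ln(BF10) computed from two log marginal likelihoods."""
    return 2.0 * (ln_ml_model1 - ln_ml_model0)


def interpret(two_ln_bf: float) -> str:
    """Evidence for model 1 over model 0 (Kass and Raftery 1995 scale)."""
    if two_ln_bf < 0:
        return "favours model 0"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


# HYPOTHETICAL harmonic means, for illustration only.
ln_ml_k2p_i = -1510.2  # e.g. the 'ALL' line of the *.lstat file for K2P+I
ln_ml_jc = -1525.7     # e.g. the 'ALL' line of the *.lstat file for JC

value = two_ln_bayes_factor(ln_ml_k2p_i, ln_ml_jc)
print(round(value, 1), interpret(value))
```

Replace the two hypothetical values with the harmonic means from your own runs; the same function can be used to compare the partitioned and single-partition analyses of the concatenated dataset.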
Thank you for participating in MrBayes practicals...