Molecular Evolution - Final Exam
The data set you will analyze in this micro-project consists of genes encoding subunit 1 of cytochrome c oxidase (COX1) from a range of eukaryotic species (file: "Cytochrome c dataset").
Specifically, the data set contains the following sequences (I have indicated the taxonomic class of each sequence):
- Mammals: Human, Bovine (cattle), Mouse, Rat, Seal, Whale
- Ray-finned fishes: Carp
- Birds: Chicken
- Amphibians: Xenopus
Cytochrome c oxidase is the last enzyme in the respiratory electron transport chain of mitochondria (or bacteria) and is located in the inner mitochondrial (or bacterial) membrane. It receives one electron from each of four cytochrome c molecules and transfers them to one oxygen molecule, reducing molecular oxygen to two molecules of water. In the process, it takes up four protons from the inner aqueous phase to form the water. It also translocates four further protons across the membrane. This helps establish the transmembrane electrochemical proton gradient that ATP synthase then uses to synthesize ATP.
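The overall reaction described above can be written as a single balanced equation. This uses the standard textbook stoichiometry. Here "in" denotes the inner aqueous phase (the mitochondrial matrix) and "out" denotes the opposite side of the membrane.

```latex
4\,\mathrm{cyt}\,c^{2+} + \mathrm{O_2} + 8\,\mathrm{H^{+}_{in}} \longrightarrow 4\,\mathrm{cyt}\,c^{3+} + 2\,\mathrm{H_2O} + 4\,\mathrm{H^{+}_{out}}
```

Eight protons are taken up from the inner side. Four are consumed in forming water, and four are pumped across the membrane.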
Using the methods and tools you learned in class, complete the tasks below. Report all results (including plots of trees) in a micro-report (Word/RTF/PDF document or similar), which you will hand in electronically on CampusNet. Please include your data set at the end of the micro-report in FASTA format. For each step, describe how you solved the problem, including the exact commands used, and justify why you did it the way you did.
- Align the sequences. Explain why you chose the particular method you did.
- Convert the alignment to the file format(s) required by the programs you use (e.g., PHYLIP or NEXUS).
- Select an outgroup. Argue for your choice.
- Construct phylogenetic trees, rooted on your chosen outgroup, using each of the following methods. If a method yields more than one optimal tree, construct a consensus tree. Include a plot of each rooted tree in your report.
  - Distance-based method with an optimality criterion (e.g., least squares or minimum evolution), using the Jukes-Cantor (JC) correction
  - Distance-based clustering method (e.g., UPGMA or neighbor joining), using the JC correction
  - Maximum likelihood (determine the best-fitting substitution model before constructing the tree)
  - Bayesian inference (bonus points if you can figure out how to use the same model as for maximum likelihood)
- Analyze selection in the data set. Which model fits the data better? (Explain how you compute this.) According to the chosen model, what fraction of sites is under positive selection? What fraction is under negative (purifying) selection? What fraction evolves neutrally?
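For the format-conversion step, here is a minimal sketch of converting a FASTA alignment to sequential, relaxed PHYLIP using only the Python standard library. The function names are hypothetical. Relaxed PHYLIP (a name, whitespace, then the sequence) is an assumption. Many current programs accept it, but the classic strict PHYLIP format requires names of exactly 10 characters, so check what your programs expect.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into an ordered {name: sequence} dict."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is used as the name
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequences may span several lines
    return records


def to_relaxed_phylip(records: dict[str, str]) -> str:
    """Format an alignment as sequential, relaxed PHYLIP."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    (ncol,) = lengths
    lines = [f"{len(records)} {ncol}"]  # header: number of taxa, number of columns
    width = max(len(name) for name in records) + 2
    for name, seq in records.items():
        lines.append(name.ljust(width) + seq)
    return "\n".join(lines) + "\n"
```

For example, `to_relaxed_phylip(parse_fasta(open("aln.fasta").read()))` would return the PHYLIP text for an aligned file. The file name here is hypothetical.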
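Both distance-based tasks call for the JC correction. Under the Jukes-Cantor model, the corrected distance is d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The following is a minimal sketch for two aligned DNA sequences; it is an illustration, not the output of any particular program. Skipping columns with a gap in either sequence (pairwise deletion) is an assumption, because programs differ in how they treat gaps.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # pairwise deletion of gapped columns (an assumption)
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jc_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # correction undefined: the sequences are saturated
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With one difference in ten sites (p = 0.1), the corrected distance is about 0.107. The correction always slightly exceeds p, because it accounts for multiple substitutions at the same site.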
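UPGMA is one example of a distance-based clustering method. It is used here as an illustration; neighbor joining is the other common choice. The sketch below repeatedly merges the closest pair of clusters and places the new node at half their distance. It then replaces the two clusters with one, whose distance to every other cluster is the size-weighted average. The function name and the Newick output format are choices made for this sketch. In practice the input matrix would hold JC-corrected distances.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Cluster taxa with UPGMA and return the resulting ultrametric tree in Newick format."""
    # Active clusters: id -> (Newick subtree, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by an unordered pair of ids
    d = {frozenset((i, j)): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of active clusters
        i, j = sorted(pair)
        height = d[pair] / 2.0  # the new node sits halfway between the two clusters
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        newick = f"({tree_i}:{height - h_i:.4f},{tree_j}:{height - h_j:.4f})"
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the distances from its two parts
        for k in clusters:
            d[frozenset((next_id, k))] = (
                size_i * d[frozenset((i, k))] + size_j * d[frozenset((j, k))]
            ) / (size_i + size_j)
        # Forget every distance involving the two clusters that were merged
        d = {key: val for key, val in d.items() if i not in key and j not in key}
        clusters[next_id] = (newick, size_i + size_j, height)
        next_id += 1
    [(tree, _, _)] = clusters.values()
    return tree + ";"
```

UPGMA roots the tree by construction under a molecular-clock assumption. Check whether that root actually separates your chosen outgroup from the rest, and re-root on the outgroup if it does not.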
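One common way to determine the best model before a maximum-likelihood analysis is to fit several substitution models and compare them with the Akaike information criterion, AIC = 2k - 2 lnL. Here k is the number of free parameters, and lower AIC is better. Model-testing programs such as jModelTest automate this comparison. The sketch below only shows the arithmetic. The log-likelihoods in the usage example are hypothetical, not results for this data set.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: dict[str, tuple[float, int]]) -> str:
    """Return the model with the lowest AIC from {model name: (lnL, free parameters)}."""
    return min(fits, key=lambda name: aic(*fits[name]))
```

For example, `best_model({"JC": (-1000.0, 0), "HKY": (-950.0, 4), "GTR": (-948.0, 8)})` returns `"HKY"`. The parameter counts in this example are the substitution-model parameters only; +G and +I would each add one. If every model is fitted on the same tree, the branch lengths add the same amount to every AIC, so leaving them out does not change the ranking.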
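For the selection analysis, one standard approach (assumed here, not prescribed by the assignment) is to fit nested codon site models on a codon alignment, for example M1a versus M2a in PAML's codeml. The two models are compared with a likelihood ratio test: 2(lnL_alt - lnL_null) is compared to a chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(-x/2). Under M2a, the site-class proportions p0 (omega0 < 1), p1 (omega1 = 1) and p2 (omega2 >= 1) give the fractions of sites under negative selection, neutral evolution and positive selection. The function names are hypothetical.

```python
import math


def lrt_df2(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """LRT statistic 2*(lnL_alt - lnL_null) and its p-value under chi-square with 2 df."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # clamp small negative values from optimizer noise
    p_value = math.exp(-stat / 2.0)  # exact chi-square survival function for 2 df
    return stat, p_value


def m2a_site_fractions(p0: float, p1: float) -> dict[str, float]:
    """Fractions of sites per M2a class; p2 is implied as 1 - p0 - p1."""
    return {
        "negative": p0,            # class with omega0 < 1
        "neutral": p1,             # class with omega1 = 1
        "positive": 1.0 - p0 - p1, # class with omega2 >= 1
    }
```

If the test is not significant, or omega2 is estimated at 1, the simpler model (M1a) is preferred. In that case the data give no evidence for a positively selected class.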
When you are done, please hand in your answer electronically on CampusNet: on the course page, go to "Assignments" and, under the header "Assessments", choose "Final exam". Click "hand in" and select the file(s) you want to submit.