Module 2: Evolutionary Implications of Protein and DNA Evidence

Activity 1: Mitochondrial Evolution and Endosymbiosis


According to the mitochondrial endosymbiosis hypothesis, mitochondria were once free-living bacteria that invaded other cells as parasites. One lineage of cells survived the initial infection by these pre-mitochondria, and the two began to co-evolve. Eventually, the parasitic interaction developed into a tight symbiosis, such that now neither can exist without the other. (A similar hypothesis has been developed for chloroplasts.)


The goal of this project is to test the endosymbiosis hypothesis by seeing whether the distribution of mitochondria in protozoa is concordant with the protozoan phylogeny. It should also give the student some appreciation of the evolutionary debate surrounding the origins of some eukaryotic organelles.


Import rRNA sequence data for species from the following six groups: amoeboflagellates, diplomonads, euglenoids, kinetoplasts, microsporans, and parabasalians. Use Clustal_W to align the sequences and to construct a phylogeny.

Do the phylogenetic relationships match the distribution of mitochondria over these groups?


Group               Mitochondria present?
Amoeboflagellates
Diplomonads
Euglenoids
Kinetoplasts
Microsporans
Parabasalians

What other sources of molecular data might you examine to support or reject the endosymbiotic theory of organellar development in eukaryotes?

Activity 2: Relationship between HIV evolution and host progression to AIDS


As discussed in Module I, HIV evolves extensively within a human host. This evolution is thought to relate closely to how the virus actually causes AIDS. According to one theory, the antigenic diversity model, the key to understanding AIDS lies in the virus's generation of "escape mutants," mutants that are sufficiently different from the parental virus to go temporarily unrecognized by the host's immune system. The host may soon develop a response to this new invader, but then the virus mutates again... and again... and again. Eventually, the immune system collapses under the diverse virus population, and the patient develops full-blown AIDS.

If this theory is correct, the rate at which a host progresses to AIDS will depend on the amount of viral evolution and diversity within that host. We would expect patients with diverse, rapidly evolving viral populations to progress more rapidly than others whose viral populations remain relatively constant.


The goal of this project is to evaluate the antigenic diversity model by testing one of its predictions, a positive correlation between HIV evolution and host progression rate.


As in Module I, access the HIV nucleotide sequence database. For this study, however, we will be using viral sequences from only one subject at a time. Import the first subject's sequences into the Workbench, align them, and perform phylogenetic reconstruction using Clustal_W. Print the resulting trees.

We will now need to measure the amount of viral evolution in this subject. One such measure consists of the maximum pairwise genetic distance between sequences from that subject. To determine this, return to the phylogenetic reconstruction menu and again select Clustal_W. On the following page, under the menu "Output format," select "Distance matrix." The output should consist of a large matrix, each entry of which represents the genetic distance between two of the host's sequences. Search the matrix for its largest entry and record this value. Then delete the sequences from the Workbench, import the next subject's sequences, and repeat the procedure until you have a measure of the amount of evolution for each of the 15 subjects.
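
Searching a large matrix by eye is error-prone, so the largest entry can also be found with a short script. The following is a minimal sketch, assuming the distance matrix has been saved as plain text in a PHYLIP-style square layout (a count line, then one row per sequence consisting of a name followed by its distances). The actual Workbench output may be laid out differently, and the function name `max_pairwise_distance` and the sample sequence names are hypothetical.

```python
def max_pairwise_distance(matrix_text):
    """Return (dmax, name_a, name_b) for a square distance matrix.

    Assumed layout (PHYLIP-style; not confirmed for the Workbench output):
    the first line holds the number of sequences, and each later line holds
    a sequence name followed by its distances to every sequence, in order.
    """
    lines = [ln.split() for ln in matrix_text.strip().splitlines() if ln.strip()]
    n = int(lines[0][0])
    rows = lines[1:n + 1]
    names = [row[0] for row in rows]

    best = (-1.0, None, None)
    for i, row in enumerate(rows):
        dists = [float(x) for x in row[1:]]
        # Look at the upper triangle only; this skips the zero diagonal
        # and avoids counting each pair twice.
        for j in range(i + 1, n):
            if dists[j] > best[0]:
                best = (dists[j], names[i], names[j])
    return best


# Hypothetical toy matrix for three sequences (illustrative values only).
example = """
3
seqA 0.000 0.021 0.048
seqB 0.021 0.000 0.035
seqC 0.048 0.035 0.000
"""

dmax, a, b = max_pairwise_distance(example)
print(f"dmax = {dmax:.3f} between {a} and {b}")
```

Returning the two sequence names along with the distance makes it easy to see whether a single unusual sequence is responsible for the maximum.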

Subjects are often divided into three progression categories: rapid progressors, who develop AIDS within two years of initial infection; moderate progressors, who develop AIDS within approximately ten years; and nonprogressors, whose immune systems remain functional for ten years or more. Based on your findings and on your understanding of the antigenic diversity model, which of the 15 subjects do you think were rapid progressors? Which do you think were nonprogressors?

After you have made these predictions, your instructor will provide the actual progression data. How accurate were your predictions? To what extent do the data support the antigenic diversity model? Can you think of ways to improve this analysis?

For instructor only:

The actual progression categories are:

Rapid Progressors   Moderate Progressors   Nonprogressors
Subject 1           Subject 5              Subject 2
Subject 3           Subject 6              Subject 12
Subject 4           Subject 7              Subject 13
Subject 10          Subject 8
Subject 11          Subject 9
Subject 15          Subject 14

A one-way ANOVA reveals no significant difference in dmax among the three progression categories (F(2,12) = 1.46, P = 0.27). This procedure therefore does not support the antigenic diversity model.

Possible ways to improve the analysis

  1. Use protein rather than nucleotide sequences. This will eliminate from consideration synonymous substitutions undetectable by the immune system.

  2. Use a better measure of the amount of evolution within a host. The present measure, dmax, is biased by several factors. First, it omits any consideration of the time interval over which a subject's data are gathered. Second, it may be confounded with N, the number of sequences sampled from a given subject. On average, a subject with many sequences sampled will yield a larger dmax than will a subject with fewer sequences. Finally, and most importantly, this measure fails to reflect the distribution of genetic distances. Subject 12, for example, has very low overall diversity, but one sequence () is very distant from all the others. The dmax measure detects this large distance but doesn't account for all the small distances between the other sequences.
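
The possible confounding of dmax with N can be checked directly. The sketch below computes the Pearson correlation between N and dmax using the 15 subjects' values from the data tables in this activity. The helper name `pearson_r` is illustrative, and a correlation over 15 subjects is only a rough indication, not a formal test.

```python
from math import sqrt

# N and dmax for subjects 1-15, in order, from the activity's data table.
n_seqs = [42, 24, 39, 47, 43, 54, 43, 49, 64, 49, 32, 37, 26, 77, 40]
dmax = [0.131, 0.051, 0.051, 0.083, 0.059, 0.066, 0.090, 0.074,
        0.079, 0.075, 0.032, 0.066, 0.032, 0.062, 0.120]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(n_seqs, dmax)
print(f"Correlation between N and dmax: r = {r:.2f}")
```

A clearly positive r would suggest that sample size inflates dmax and that a measure less sensitive to N, such as the mean pairwise distance, might be preferable.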

Student distance matrix:

Subject   Number of sequences (N)   Maximum pairwise genetic distance (dmax)   Progression category
1         42                        0.131
2         24                        0.051
3         39                        0.051
4         47                        0.083
5         43                        0.059
6         54                        0.066
7         43                        0.090
8         49                        0.074
9         64                        0.079
10        49                        0.075
11        32                        0.032
12        37                        0.066
13        26                        0.032
14        77                        0.062
15        40                        0.120
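
One simple way to turn the table above into predictions is to rank the subjects by dmax. The sketch below does this with the tabulated values; the variable names are illustrative, and where to draw the line between categories is left as a judgment call.

```python
# (subject, N, dmax) copied from the table above.
data = [
    (1, 42, 0.131), (2, 24, 0.051), (3, 39, 0.051), (4, 47, 0.083),
    (5, 43, 0.059), (6, 54, 0.066), (7, 43, 0.090), (8, 49, 0.074),
    (9, 64, 0.079), (10, 49, 0.075), (11, 32, 0.032), (12, 37, 0.066),
    (13, 26, 0.032), (14, 77, 0.062), (15, 40, 0.120),
]

# Sort from most to least diverse viral population.
ranked = sorted(data, key=lambda row: row[2], reverse=True)

for subject, n, dmax in ranked:
    print(f"Subject {subject:2d}  N = {n:2d}  dmax = {dmax:.3f}")
```

Under the antigenic diversity model, subjects near the top of this list would be predicted to progress most rapidly, and those near the bottom would be the candidate nonprogressors.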

For the instructor:

Subject   Number of sequences (N)   Maximum pairwise genetic distance (dmax)   Progression category
1         42                        0.131                                      Rapid
2         24                        0.051                                      Nonprogressor
3         39                        0.051                                      Rapid
4         47                        0.083                                      Rapid
5         43                        0.059                                      Moderate
6         54                        0.066                                      Moderate
7         43                        0.090                                      Moderate
8         49                        0.074                                      Moderate
9         64                        0.079                                      Moderate
10        49                        0.075                                      Rapid
11        32                        0.032                                      Rapid
12        37                        0.066                                      Nonprogressor
13        26                        0.032                                      Nonprogressor
14        77                        0.062                                      Moderate
15        40                        0.120                                      Rapid
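
The ANOVA result quoted in the instructor notes can be reproduced from this table. The sketch below computes a one-way ANOVA on dmax by hand, grouping subjects by progression category. The function name `one_way_anova` is illustrative. The P-value uses a closed form of the F distribution that holds only when the numerator has 2 degrees of freedom, as it does here with three categories; for other designs a statistics library (for example `scipy.stats.f_oneway`) would be the usual choice.

```python
# dmax values grouped by progression category, from the table above.
groups = {
    "Rapid": [0.131, 0.051, 0.083, 0.075, 0.032, 0.120],
    "Moderate": [0.059, 0.066, 0.090, 0.074, 0.079, 0.062],
    "Nonprogressor": [0.051, 0.066, 0.032],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of sample lists."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = one_way_anova(groups)

# Upper-tail probability of F(2, df2); this closed form is valid only for df1 == 2.
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

print(f"F({df1},{df2}) = {f_stat:.2f}, P = {p_value:.2f}")
# prints "F(2,12) = 1.46, P = 0.27"
```

The category means (about 0.082 for rapid, 0.072 for moderate, and 0.050 for nonprogressors) trend in the direction the model predicts, but the spread within categories is too large for the difference to be significant with 15 subjects.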

Activity 3: Sodium channel proteins: a revealing story about the evolution of tissues, organs and taxa

This activity will stimulate the student to grapple with the question of which came first, the organs and tissues, or the taxa as we now know them.

This page created and maintained by Kristian N. Engelsen.