Module 1: An introduction to the Biology WorkBench—searches, alignments and trees

Activity 1:


Until recently, all organisms were assigned to one of two domains: (1) the eukaryotes, organisms whose cells contain a well-formed nucleus; and (2) the prokaryotes, unicellular organisms that lack a nucleus. In recent years there has been a fundamental revision of this picture. Among the bacteria there seems to be a distinct third domain. Phenotypically these organisms look like ordinary bacteria, but they seem to have a distinct phylogenetic history. This new domain of organisms is named Archaebacteria. The name reflects an untested conjecture about their evolutionary status. The phylogenetic evidence suggests that the Archaebacteria are at least as old as the other major domains; hence, it now seems possible that the most recently recognized group of organisms is actually the oldest. It is important to note that not all scientists agree with the three-domain scheme.

The new technology of sequencing nucleic acids and proteins enables biologists to uncover the pattern of organic evolution through geologic time, since these molecules can be treated as molecular fossils. Differences between sequences, and patterns in the degree of difference, indicate how long ago any two organisms may have shared a common ancestor.
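The "degree of difference" between two aligned sequences can be made concrete with a short calculation. The following is a minimal sketch, not the method Biology Workbench itself uses: it computes the proportion of differing positions (often called the p-distance) for every pair of sequences, skipping gapped positions. The function names and the toy sequences are invented for illustration and are not real enolase data.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_a, name_b): distance} for every pair of sequences."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Toy aligned fragments, invented for illustration (not real enolase data).
toy_alignment = {
    "org_A": "MSKIVKIIGR",
    "org_B": "MSKIVKVIGR",
    "org_C": "MAKI-DVLAR",
}

matrix = distance_matrix(toy_alignment)
for pair, dist in matrix.items():
    print(pair, round(dist, 3))
```

A smaller value means fewer observed differences, which is read as a more recent common ancestor. Real analyses usually correct these raw proportions for multiple substitutions at the same site.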


Investigate whether the phylogenetic trees and the distance matrix table support the two or three domain concept.


In this activity, students will examine and compare the amino acid sequences of an enzyme called enolase. Enolase acts in the last stage of glycolysis, during which 3-phosphoglycerate is converted into pyruvate and a second molecule of ATP is formed. The students will use the Biology Workbench Web-based software (NCSA, 1999) for this analysis. They will search the SwissProt database for the enolase amino acid sequence from each of the following six organisms and compare the results.

Species name

Methanococcus jannaschii
Escherichia coli (Gram-negative bacterium)
Bacillus subtilis (Gram-positive bacterium)
Drosophila melanogaster
Homo sapiens
Saccharomyces cerevisiae (yeast)

Activity 2: Patterns of HIV Relatedness between Hosts


HIV, the virus that causes AIDS, currently infects over 30 million people worldwide, including over 800,000 in the United States. This virus evolves very quickly within its human host, at a rate approximately one million times that of most vertebrates. As a result, we observe large amounts of genetic variation in the population of viruses both within and between hosts. In several investigations, this variation was used to identify the probable source of a new infection.


The long-term goal of this project is to introduce students to phylogenetic problem solving and to familiarize them with the Biology Workbench software. The short-term aim is to study relationships within and between a small population of hosts.


HIV nucleotide data from 15 subjects are available through the SRS database search in Biology Workbench. Under "Nucleotide Tools," choose the GenBank viral database and enter the search parameters "HIV and env". The relevant sequences are numbers AF016760-1 through AF016825-1 and AF089109-1 through AF089708-1. Biology Workbench cannot handle this many sequences simultaneously, so you will need to choose a subsample from each subject. A subsample size of five, for an overall sample size of 75 sequences, is recommended. Choose these sequences randomly from each subject and import them into the Workbench.
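If you would rather not pick the sequences by hand, the random subsampling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject labels and sequence IDs below are hypothetical placeholders (the real mapping of accession numbers to subjects must come from the GenBank records), and the `subsample` function is not part of Biology Workbench.

```python
import random


def subsample(sequences_by_subject, per_subject=5, seed=None):
    """Pick `per_subject` sequence IDs at random, without replacement, per subject."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    chosen = {}
    for subject, accessions in sequences_by_subject.items():
        if len(accessions) < per_subject:
            raise ValueError(f"{subject} has fewer than {per_subject} sequences")
        chosen[subject] = rng.sample(accessions, per_subject)
    return chosen


# Hypothetical placeholder IDs: 15 subjects with 20 sequences each.
# Replace these with the accession numbers actually listed for each subject.
sequences_by_subject = {
    f"subject_{s:02d}": [f"subject_{s:02d}_seq_{i:02d}" for i in range(1, 21)]
    for s in range(1, 16)
}

picked = subsample(sequences_by_subject, per_subject=5, seed=1)
total = sum(len(ids) for ids in picked.values())
print(total)  # 15 subjects x 5 sequences each
```

Sampling without replacement ensures that no sequence is imported twice, and recording the seed lets another student reproduce the same subsample.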

You will now need to align the viral sequences. Use Clustal_W to do this (the default options are fine). Then use the Clustal_W phylogenetic reconstruction procedure to determine patterns of relatedness both within and between subjects. What patterns do you see? How might you explain them?

For the instructor only:

Most subjects’ viral sequences form a monophyletic clade; that is, a grouping in which each sequence from that subject is more closely related to every other sequence from the same subject than it is to any sequence from another subject. The exceptions are Subjects 1 and 2, whose viral sequences are intermingled. (One of Subject 1’s sequences actually groups with Subject 11; this pattern does not appear when more powerful phylogenetic programs such as PAUP are used, and is probably anomalous.)
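The monophyly check described above can also be done mechanically. The sketch below assumes the tree has been rooted and written out by hand as nested Python tuples, with leaf names as strings; it uses the common working definition that a group is monophyletic when some subtree contains exactly that group's sequences. The tree, the leaf names, and the function names are toy examples invented for illustration and do not reproduce the actual HIV results.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree node."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some subtree contains exactly the leaves in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Toy rooted tree: sequences from subjects 1 and 2 are intermingled,
# while subject 3's sequences sit together in their own subtree.
toy_tree = (
    (("s1_a", "s2_a"), ("s1_b", "s2_b")),
    (("s3_a", "s3_b"), "s3_c"),
)

subjects = {
    "subject_1": ["s1_a", "s1_b"],
    "subject_2": ["s2_a", "s2_b"],
    "subject_3": ["s3_a", "s3_b", "s3_c"],
}

results = {name: is_monophyletic(toy_tree, seqs) for name, seqs in subjects.items()}
print(results)
```

In this toy tree only subject 3 passes the check on its own, while subjects 1 and 2 pass only when their sequences are pooled into a single group, mirroring the intermingled pattern described for the linked pair.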

The logical conclusion from this tree is that subjects’ viral populations form diverse but non-overlapping arrays, suggesting that each subject was initially infected with a small and homogeneous viral sample. Subjects 1 and 2, the exceptions, are a romantic couple, and are therefore known to be epidemiologically linked. Presumably, viral sequences have been transferred between them in one or both directions.

This page created and maintained by Kristian N. Engelsen.