You are currently viewing an archived version of this material. 
For the current Biology Student Workbench Project please visit 
BioQUEST/bioinformatics or Biology Student Workbench

This exercise is also available from the author at the GenWeb site.

Alignment

by Scott Cooper (University of Wisconsin-La Crosse, cooper@mail.uwlax.edu)

An alignment program compares two protein or DNA sequences to show how similar they are. These programs find the best match between the two sequences. Occasionally gaps need to be introduced so that the matching regions line up.

       seq1 > 1 ggcctctgcctaatcacacagat-ctaacaggattatttc
                ||||||||||| || ||||| ||  |||||||| ||||||
       seq2 > 1 ggcctctgccttattacacaaatcttaacaggactatttc

This type of analysis is useful for detecting evolutionary differences between species and for finding mutations in genes.
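The row of vertical bars in the example above can be reproduced with a few lines of Python. This is only a sketch of how such a match line is drawn from two sequences that are already aligned; the function name match_line is made up for illustration and is not part of any of the programs described here.

```python
def match_line(top: str, bottom: str) -> str:
    """Return '|' under identical bases and a space under mismatches or gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(top.lower(), bottom.lower())
    )


# The two aligned sequences from the example above.
seq1 = "ggcctctgcctaatcacacagat-ctaacaggattatttc"
seq2 = "ggcctctgccttattacacaaatcttaacaggactatttc"

print(seq1)
print(match_line(seq1, seq2))
print(seq2)
```

Note that this does not perform the alignment itself; it only marks up sequences in which the gaps have already been placed.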

We can align sequences in three ways: with the Biology WorkBench, with the program BLAST directly (for two sequences), or with a Multiple Alignment Program.
 
 

Using Biology WorkBench to align two or more sequences.

Log on to the Biology WorkBench (also see Scott Cooper's information about the Biology WorkBench) and either create a new session or resume an existing session. Select Nucleic Tools or Protein Tools.

Select the sequences that you wish to align and then scroll down to CLUSTALW.

You will now be offered some parameters that you can change for your alignment. You can simply keep the default values and select Submit at the bottom of the page.

The results will show the aligned sequences with colored letters representing a consensus. Black letters indicate a mismatch and dashes represent gaps.


If you wish to use this alignment to create a phylogenetic tree, or just want to save the alignment, select Import Alignment(s).
 
 
 
 

ALIGNMENT OF TWO SEQUENCES

         1.  The following program allows you to align two sequences with each other.

                        http://www.ncbi.nlm.nih.gov/gorf/bl2.html

          2.  Either copy and paste or type your two sequences into the boxes labeled Sequence 1 and
               Sequence 2. Be sure you do not have any text or numbers mixed in with your sequence.

                    To give each sequence a title, type a ">" followed by the title and then press Enter.
                    The information on the line with the ">" will not be considered in the alignment.

              >sequence 1
              CCTTGGCCTCTGCCTAATCACACAGATT

          3.  If you are aligning DNA sequences select Program: blastn, if you are aligning protein
               sequences select Program: blastp.

          4.  Press the Align button and wait for your results.
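The input rules in step 2 can be checked before pasting. The sketch below separates the optional ">" title line from the sequence and rejects stray digits or punctuation; the function name read_titled_sequence is hypothetical and the check is an illustration, not something the BLAST page does for you.

```python
def read_titled_sequence(text: str) -> tuple[str, str]:
    """Split pasted text into (title, sequence); title is '' if there is no '>' line."""
    title = ""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Everything on the '>' line is the title, not sequence data.
            title = line[1:].strip()
        else:
            # Drop any spaces inside the sequence line.
            parts.append("".join(line.split()))
    sequence = "".join(parts)
    bad = sorted({ch for ch in sequence if not ch.isalpha()})
    if bad:
        raise ValueError(f"unexpected characters in sequence: {bad}")
    return title, sequence


title, sequence = read_titled_sequence(">sequence 1\nCCTTGGCCTCTGCCTAATCACACAGATT\n")
print(title)
print(sequence)
```

A sequence copied together with position numbers, such as "1 ggcctctgcc 10", raises a ValueError here, which is the situation step 2 warns about.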
 

RESULTS

If you align DNA sequences, vertical lines will indicate identical bases and "-" will indicate gaps in the alignment.

               seq1 > 1 ggcctctgcctaatcacacagat-ctaacaggattatttc
                        ||||||||||| || ||||| ||  |||||||| ||||||
               seq2 > 1 ggcctctgccttattacacaaatcttaacaggactatttc
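To summarize a DNA result like this one, you can count the identical positions and the gaps. The following is a minimal sketch, assuming the two aligned lines have been copied out as plain strings; alignment_stats is an illustrative name, not a BLAST function.

```python
def alignment_stats(top: str, bottom: str) -> tuple[int, int, int]:
    """Return (identical positions, gap positions, alignment length)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(top.lower(), bottom.lower()))
    identical = sum(1 for a, b in pairs if a == b and a != "-")
    gaps = sum(1 for a, b in pairs if a == "-" or b == "-")
    return identical, gaps, len(pairs)


seq1 = "ggcctctgcctaatcacacagat-ctaacaggattatttc"
seq2 = "ggcctctgccttattacacaaatcttaacaggactatttc"

identical, gaps, length = alignment_stats(seq1, seq2)
print(f"identities: {identical}/{length} ({100 * identical / length:.0f}%)")
print(f"gaps: {gaps}/{length}")
```

For the example above this counts 34 identical bases and 1 gap over 40 aligned positions.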

If you align protein sequences, the line between the two sequences will show each identical amino acid. A blank will appear at non-conservative substitutions and a "+" will appear at conservative substitutions. A "-" will indicate any gaps in the alignment.

               seq1 1 KKLYPATTA-VSSQQVV 16
                      KKLYPA+TA VSS QVV
               seq2 1 KKLYPASTAVVSSNQVV 17
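The middle line of the protein output can be sketched in the same way. BLAST decides what counts as a conservative substitution from the positive scores in its substitution matrix; the small CONSERVATIVE_PAIRS set below is only an illustrative assumption standing in for that matrix, and protein_midline is a made-up name.

```python
# Illustrative subset only (an assumption); BLAST uses a full scoring matrix.
CONSERVATIVE_PAIRS = {
    frozenset("ST"),
    frozenset("IV"),
    frozenset("KR"),
    frozenset("DE"),
}


def protein_midline(top: str, bottom: str, conservative=CONSERVATIVE_PAIRS) -> str:
    """Letter for identities, '+' for conservative pairs, blank otherwise."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    out = []
    for a, b in zip(top.upper(), bottom.upper()):
        if "-" in (a, b):
            out.append(" ")   # gap in either sequence
        elif a == b:
            out.append(a)     # identical amino acid
        elif frozenset((a, b)) in conservative:
            out.append("+")   # conservative substitution
        else:
            out.append(" ")   # non-conservative substitution
    return "".join(out)


seq1 = "KKLYPATTA-VSSQQVV"
seq2 = "KKLYPASTAVVSSNQVV"
print(seq1)
print(protein_midline(seq1, seq2))
print(seq2)
```

With these inputs the T/S position is marked "+", while the gap and the Q/N position are left blank, as in the example above.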
 
 

MULTIPLE ALIGNMENTS

1.  To align several sequences, the following Multiple Sequence Alignment program is useful. All of the sequences are entered in the same data box, with a title line separating each sequence.

                     http://dot.imgen.bcm.tmc.edu:9331/multi-align/Options/map.html

2.  Your data should be formatted as follows:

               >seq1
               ggcctctgcctaatcacacagatctaacaggattatttc
               >seq2
               ggcctctgccttattacacaaatcttaacaggactatttc
               >seq3
               ggcctctgccttattttctttacaggactatatc

3.  Perform Search
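If your sequences are already stored in a script or spreadsheet export, the input format from step 2 can be generated rather than typed. This is a small sketch; the to_fasta helper is hypothetical and simply writes a ">" title line before each sequence.

```python
def to_fasta(records: dict[str, str]) -> str:
    """Format {title: sequence} as text with a '>' title line before each sequence."""
    lines = []
    for title, sequence in records.items():
        lines.append(f">{title}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"


# The three sequences from step 2.
records = {
    "seq1": "ggcctctgcctaatcacacagatctaacaggattatttc",
    "seq2": "ggcctctgccttattacacaaatcttaacaggactatttc",
    "seq3": "ggcctctgccttattttctttacaggactatatc",
}

print(to_fasta(records), end="")
```

The printed text can be pasted into the data box as a single block.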
 
 

RESULTS

The results from the multiple alignment will be given in two formats. The first is a numbered block layout; the second is FASTA format, which can be pasted into some evolutionary programs. A "-" indicates a gap in the alignment.

                1            15 16           30 31           45
         1 seq3 GGCCTCTGCCTTATT T------TCTTTACA GGACTATATC   34
         2 seq1 GGCCTCTGCCTAATC ACACAGATCT-AACA GGATTATTTC   39
         3 seq2 GGCCTCTGCCTTATT ACACAAATCTTAACA GGACTATTTC   40
 

          >seq3
          GGCCTCTGCCTTATTT------TCTTTACAGGACTATATC
          >seq1
          GGCCTCTGCCTAATCACACAGATCT-AACAGGATTATTTC
          >seq2
          GGCCTCTGCCTTATTACACAAATCTTAACAGGACTATTTC
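Before passing the FASTA output to another program, it can be worth checking that every aligned sequence has the same length and seeing how many columns are identical in all sequences. The sketch below assumes the FASTA block has been copied into a string; parse_fasta and conserved_columns are illustrative names, not part of the alignment program.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Read FASTA text into {title: sequence}, keeping the order of the titles."""
    records: dict[str, str] = {}
    title = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            title = line[1:].strip()
            records[title] = ""
        elif title is not None:
            records[title] += line
    return records


def conserved_columns(records: dict[str, str]) -> int:
    """Count gap-free columns in which every sequence has the same letter."""
    rows = list(records.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return sum(1 for col in zip(*rows) if "-" not in col and len(set(col)) == 1)


aligned = """\
>seq3
GGCCTCTGCCTTATTT------TCTTTACAGGACTATATC
>seq1
GGCCTCTGCCTAATCACACAGATCT-AACAGGATTATTTC
>seq2
GGCCTCTGCCTTATTACACAAATCTTAACAGGACTATTTC
"""

records = parse_fasta(aligned)
for title, sequence in records.items():
    print(title, len(sequence), len(sequence.replace("-", "")))
print("fully conserved columns:", conserved_columns(records))
```

Removing the "-" characters from each aligned sequence should give back the sequence you originally entered, which is a quick way to confirm nothing was lost in copying.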