For the current Biology Student Workbench Project, please visit BioQUEST/bioinformatics or Biology Student Workbench. Off-site links (those that require an Internet connection) are highlighted (e.g., BioQUEST.org). This exercise is also available from the author at the GenWeb site.
Alignment
by Scott Cooper (University of Wisconsin-La Crosse, cooper@mail.uwlax.edu)
An alignment program compares two protein or DNA sequences to show how similar they are. It finds the best match between the two sequences, and occasionally gaps need to be introduced to make the two sequences align.
seq1 > 1 ggcctctgcctaatcacacagat-ctaacaggattatttc
         ||||||||||| || ||||| ||  |||||||| ||||||
seq2 > 1 ggcctctgccttattacacaaatcttaacaggactatttc
This type of analysis is useful for detecting evolutionary differences between species and for finding mutations in genes.
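To make the idea of "best match with gaps" concrete, here is a minimal sketch of a classic global alignment method (Needleman-Wunsch). It is not the algorithm used by BLAST or CLUSTALW, and the scoring values (match, mismatch, gap) are assumptions chosen only for illustration. With other scores, or other tie-breaking, the gap may land in a different position than in the example above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings.

    The scoring values are illustrative assumptions, not the defaults
    of any particular alignment program.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        """Score for aligning a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two letters
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


seq1 = "ggcctctgcctaatcacacagatctaacaggattatttc"
seq2 = "ggcctctgccttattacacaaatcttaacaggactatttc"
aligned1, aligned2 = needleman_wunsch(seq1, seq2)
print(aligned1)
print(aligned2)
```

The two returned strings always have the same length, and removing the dashes gives back the original sequences.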
We can use the program BLAST directly to align two sequences, use a Multiple Alignment Program, or use the Biology WorkBench.
Using Biology WorkBench to align two or more sequences.
Log onto the Biology WorkBench (also see Scott Cooper's information about the Biology WorkBench) and either create a new session or resume an existing session. Select Nucleic Tools or Protein Tools.
Select the sequences that you wish to align and then scroll down to CLUSTALW.
You will now be given some options on parameters you can change in your alignment. You can just use the default values and select Submit at the bottom of the page.
The results will show the aligned sequences with colored letters representing a consensus. Black letters mark mismatches and dashes represent gaps.
If you wish to use this alignment to create a phylogenetic tree, or just want to save the alignment, select Import Alignment(s).
1. The following program aligns two sequences with each other.
http://www.ncbi.nlm.nih.gov/gorf/bl2.html
2. Either copy and paste or type your two sequences into the boxes labeled Sequence 1 and Sequence 2. Be sure you do not have any text or numbers mixed in with your sequence.
To give each sequence a title, type a ">" followed by the title and then press Enter.
The information on the line with the ">" will not be considered in the alignment.
>sequence 1
CCTTGGCCTCTGCCTAATCACACAGATT
3. If you are aligning DNA sequences, select Program: blastn; if you are aligning protein sequences, select Program: blastp.
4. Press the Align button and wait for your results.
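Sequences copied from a database record often carry position numbers and spaces, which step 2 warns against. The sketch below shows one way to strip them and add a title line before pasting. The function names `clean_dna` and `to_fasta` are hypothetical helpers written for this example, and the check accepts only the letters A, C, G, T and N, which is an assumption that may be too strict for sequences containing other ambiguity codes.

```python
import re


def clean_dna(raw):
    """Remove digits and whitespace, then check that only DNA letters remain."""
    seq = re.sub(r"[\s\d]", "", raw).upper()
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        # Leftover text (e.g. a pasted label) cannot be fixed automatically.
        raise ValueError("unexpected characters: " + "".join(sorted(unexpected)))
    return seq


def to_fasta(title, seq, width=60):
    """Return a title line starting with '>' followed by the sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + title + "\n" + "\n".join(lines)


raw = "1 ccttggcctc tgcctaatca cacagatt"
print(to_fasta("sequence 1", clean_dna(raw)))
```

Raising an error on stray letters, rather than silently deleting them, avoids turning pasted text into bases that were never in the sequence.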
RESULTS
If you align DNA sequences, vertical lines will indicate identical bases and "-" will indicate gaps in the alignment.
seq1 > 1 ggcctctgcctaatcacacagat-ctaacaggattatttc
         ||||||||||| || ||||| ||  |||||||| ||||||
seq2 > 1 ggcctctgccttattacacaaatcttaacaggactatttc
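The line of vertical bars can be rebuilt from the two aligned sequences, which is a handy check when an alignment has been copied by hand. The sketch below is an illustration, not BLAST's own code. The percent identity is computed over all alignment columns, including the gap column, which is only one of several conventions in use.

```python
def match_line(a, b):
    """Return '|' where two aligned sequences share a base, else a space."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))


def percent_identity(a, b):
    """Identical columns as a percentage of all alignment columns."""
    return 100.0 * match_line(a, b).count("|") / len(a)


aligned1 = "ggcctctgcctaatcacacagat-ctaacaggattatttc"
aligned2 = "ggcctctgccttattacacaaatcttaacaggactatttc"
print(aligned1)
print(match_line(aligned1, aligned2))
print(aligned2)
print(percent_identity(aligned1, aligned2))  # 34 of 40 columns match: 85.0
```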
If you align protein sequences, the output will show the identical amino acids on a line between the two sequences. A blank will appear at non-conservative substitutions and a "+" will appear at conservative substitutions. A "-" will indicate any gaps in the alignment.
seq1 1 KKLYPATTA-VSSQQVV 16
       KKLYPA+TA VSS QVV
seq2 1 KKLYPASTAVVSSNQVV 17
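The middle line of the protein example can be reproduced with a few rules: repeat the letter where the residues are identical, write "+" for a conservative substitution, and leave a blank otherwise. BLAST decides what counts as conservative from its scoring matrix. The sketch below does not include such a matrix. Instead, `CONSERVATIVE` is a hand-picked, hypothetical stand-in that lists only the serine/threonine pair needed for this example.

```python
# Assumption for illustration only: a real program would look up each pair
# in a full substitution matrix instead of this one-entry set.
CONSERVATIVE = {frozenset("ST")}


def protein_midline(a, b, conservative=CONSERVATIVE):
    """Build the line shown between two aligned protein sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    out = []
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            out.append(" ")   # gap in either sequence
        elif x == y:
            out.append(x)     # identical residues
        elif frozenset((x, y)) in conservative:
            out.append("+")   # conservative substitution
        else:
            out.append(" ")   # non-conservative substitution
    return "".join(out)


print("KKLYPATTA-VSSQQVV")
print(protein_midline("KKLYPATTA-VSSQQVV", "KKLYPASTAVVSSNQVV"))
print("KKLYPASTAVVSSNQVV")
```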
1. To align several sequences, the following Multiple Sequence Alignment program is useful. All of the sequences are entered in the same data box, with titles separating each sequence.
http://dot.imgen.bcm.tmc.edu:9331/multi-align/Options/map.html
2. Your data should be formatted as follows:
>seq1
ggcctctgcctaatcacacagatctaacaggattatttc
>seq2
ggcctctgccttattacacaaatcttaacaggactatttc
>seq3
ggcctctgccttattttctttacaggactatatc
3. Select Perform Search.
RESULTS
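The results from the multiple alignment will be given in two formats. The first shows the alignment in numbered blocks of 15 positions, with the length of each sequence (not counting gaps) at the end of its row. The second is a FASTA format that can be pasted into some evolutionary programs. A "-" indicates a gap in the alignment of the sequences.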
       1            15 16           30 31           45
1 seq3 GGCCTCTGCCTTATT T------TCTTTACA GGACTATATC 34
2 seq1 GGCCTCTGCCTAATC ACACAGATCT-AACA GGATTATTTC 39
3 seq2 GGCCTCTGCCTTATT ACACAAATCTTAACA GGACTATTTC 40
>seq3
GGCCTCTGCCTTATTT------TCTTTACAGGACTATATC
>seq1
GGCCTCTGCCTAATCACACAGATCT-AACAGGATTATTTC
>seq2
GGCCTCTGCCTTATTACACAAATCTTAACAGGACTATTTC
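The two result formats carry the same information, and the sketch below shows how the FASTA output can be turned back into the numbered block layout. It is an illustration only, not the program's own code. The names `parse_fasta` and `format_row` are hypothetical, and the block size of 15 is taken from the output shown above.

```python
def parse_fasta(text):
    """Return a list of (title, sequence) pairs from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first title line")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def format_row(number, name, seq, size=15):
    """One row of the block layout: number, title, blocks, ungapped length."""
    blocks = " ".join(seq[i:i + size] for i in range(0, len(seq), size))
    ungapped = len(seq.replace("-", ""))
    return "{} {} {} {}".format(number, name, blocks, ungapped)


aligned_fasta = """>seq3
GGCCTCTGCCTTATTT------TCTTTACAGGACTATATC
>seq1
GGCCTCTGCCTAATCACACAGATCT-AACAGGATTATTTC
>seq2
GGCCTCTGCCTTATTACACAAATCTTAACAGGACTATTTC
"""

rows = [format_row(n, name, seq)
        for n, (name, seq) in enumerate(parse_fasta(aligned_fasta), start=1)]
for row in rows:
    print(row)
```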