Exploring the relationship between sequence, structure, and evolution

Robin Dowell & Debra Goldberg

Objective: Build a framework (skeleton) for a mini-project that can be used in related but distinct classes. The basic framework we outlined:

Protein sequences fold into 3D structures within the cell (see PDB). The structure a sequence folds into constrains the evolutionary changes it can tolerate. This exercise explores the relationship between 3D structure and evolutionary conservation using ConSurf.

This general workflow can all be done within ConSurf:

sequence => generate an MSA => score conservation => visualization

Basic Analysis Steps:

  1. Begin with a sequence of interest.
    • This sequence should have a known 3D structure (e.g. in PDB) so that the subsequent visualization works.
  2. Identify related sequences:
    • Fundamentally, this is a question of database searching (e.g. Swiss-Prot or NR).
    • Your sequence is used as the query to find similar sequences based on pairwise similarity (e.g. BLAST).
    • ConSurf uses PSI-BLAST, which applies BLAST iteratively.
  3. Align the set of sequences using a multiple sequence alignment (MSA) algorithm.
    • ConSurf's alignment options include MAFFT, ClustalW, etc.
    • Ultimately, an MSA depends on an underlying scoring scheme and some inference of a phylogeny (the relatedness between the sequences).
  4. Score the multiple sequence alignment for conservation.
    • Scoring methods at ConSurf include Bayesian and Maximum Likelihood.
  5. Visualize various relationships:
    • The primary sequences utilized
    • The phylogenetic relationship between the sequences
    • The multiple sequence alignment, colored by conservation
    • The 3D structure, colored by conservation
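
To make the "score conservation" step concrete for a class, here is a minimal Python sketch that scores each column of a toy alignment. It is not ConSurf's Bayesian or Maximum Likelihood method; it swaps in a much simpler Shannon-entropy measure that ignores the phylogeny entirely. The function name `column_conservation` and the toy sequences are made up for illustration.

```python
import math
from collections import Counter

def column_conservation(msa):
    """Score each alignment column from 0.0 (variable) to 1.0 (fully conserved).

    Uses 1 - H/log2(20), where H is the Shannon entropy of the residues in
    the column. Gaps ('-') are ignored. This is a simple stand-in and NOT
    the phylogeny-aware scoring that ConSurf performs.
    """
    if not msa:
        raise ValueError("alignment is empty")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in msa if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column: treat as uninformative
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores

# Hypothetical toy alignment, not real enolase data.
toy_msa = ["MKVLA", "MKILA", "MRVLA", "MKV-G"]
scores = column_conservation(toy_msa)
for position, score in enumerate(scores, start=1):
    print(position, round(score, 3))
```

Comparing these scores with ConSurf's output for the same alignment is one way to show students what a phylogeny-aware method adds.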

An Example:

YEAST APO-ENOLASE AT 2.25 ANGSTROMS RESOLUTION


So how do we adapt this for different classes? A graduate-level algorithms class and a senior undergraduate biology course might emphasize different aspects of this exercise.

Variations and Extensions:

  • Modify the number of sequences considered in the multiple sequence alignment.
  • Consider different levels of conservation given the same number of sequences.
  • Compare different multiple sequence alignment methods:
    • Select different ConSurf options
    • Use/Write your own sequence alignment method.
    • If considering a protein domain, use the manually curated Pfam alignment.
  • Compare Bayesian to Maximum Likelihood scoring schemes for conservation.
  • Contemplate why some aspects of the molecule are more conserved than others.
  • Consider the problem of highlighting differences between individuals within the same species: use PyMOL to visualize key differences in an alignment of very closely related sequences.
  • Prepare an entry for Proteopedia (http://www.proteopedia.org) using this information.
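
For the "write your own sequence alignment method" variation, an algorithms class might start from pairwise global alignment before tackling multiple alignment. The sketch below is a minimal Needleman-Wunsch implementation with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration; a real protein alignment would typically use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Illustrative scoring only: the default match/mismatch/gap values are
    assumptions, not parameters taken from ConSurf or any real aligner.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical toy sequences.
nw_score, aligned_a, aligned_b = needleman_wunsch("MKVLA", "MKLA")
print(nw_score)
print(aligned_a)
print(aligned_b)
```

Students can then change the scoring parameters and observe how the alignment, and any conservation scores derived from it, shift.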

One way to present this in class is to lead with an example (an excellent example by Team Volunteers) and then give the above as a handout with details specific to the class objectives.
