Ramping Up to Biology Student Workbench:

A Multi-Stage Approach to Bioinformatics Education


Educational possibilities abound in the rapidly expanding field of bioinformatics. The recent explosion of publicly available molecular sequence and structure data, and of on-line tools to analyze those data, provides new opportunities to engage students in realistic biological problem solving. Although accessing the molecular databases and analysis tools requires little more than an internet connection, meaningful use of these resources depends on a set of conceptual, procedural and technological understandings.


We have been involved in bioinformatics curriculum development projects aimed at bringing bioinformatics to several audiences, including undergraduate biology majors, pre-service biology teachers, and those who prepare them. We have operated with the belief that meaningful learning can be promoted within carefully structured, deliberately sequenced problem spaces that give students the opportunity to pursue research without becoming awash in the technical details of this dynamic and emerging field.


In this paper we present both a general approach to introducing students to bioinformatics problem solving and a specific instantiation of the general approach. We present our notion of progressive "problem spaces" with a three-step approach to engaging students (and faculty) with realistic research and problem solving using bioinformatics data and tools. This approach is exemplified with a set of activities that go from a basic orientation to bioinformatics to open-ended investigations of HIV evolution.

Context Setting

Seven Scenarios

An example of a context-setting activity is Parents, Police, Patents, Privacy, Patients, Profit, and Peanuts, a set of seven scenarios that is distributed and discussed within small groups, and then with a larger group or class. All seven are related to bioinformatics and biotechnology.


One scenario, Police, deals with forensics. This scenario is described on cards each containing one of the following statements:

Another of the scenarios, Patients, concerns a medical situation.

Yet another is Peanuts, about genetically modified organisms finding their way into the human diet.

Once the small group has assembled, each person reads his or her card to the group, and then the group discusses the scenario, considering these three questions:

  1. Is there any information about the scenario that you wish you had or that you felt was missing? In other words, was there enough information to consider?
  2. What issues (philosophical, historical, political, scientific, ethical) arise in discussion of this scenario?
  3. What kind of research or investigation would you consider doing based on this scenario?

Finally, each group reads its sentences to the large group, and then one member of the group reports on the small group discussion, usually referring to the three questions. The discussions are lively, with people sharing relevant experiences, knowledge and information. By the time the activity is over, the instructors have tapped into and become aware of the students' background knowledge, and the context for studying bioinformatics and for engaging in bioinformatics activities has been well set.

Concept and Skill Establishment

Is He Guilty?: An Introduction to Working with Sequence Data and Analysis

This activity is designed to build familiarity with the types of data, techniques, and representations that are used in sequence analysis. Working through a small problem in a series of steps is a fruitful way to introduce students and faculty to the ideas behind sequence analysis, and it makes it possible for them to apply these ideas to new contexts and problems. In this example we examine the use of HIV sequence data to establish links between a Florida dentist and his patients. Generally, the activity is structured to have groups of students work with a series of printouts of data and analyses to determine whether they think the dentist was the source of the HIV infection for the patients. After we introduce the overarching question (about the role of the dentist in infecting his patients), each set of handouts generates a set of more specific questions related to understanding the information contained in the sequence data and analyses and drawing inferences from them.

Part 1: Taking a look at sequence data

We begin with raw amino acid sequence data from HIV collected from six patients, the dentist, a local control and an outgroup (see fig). Often, we also distribute a printout of a paper reporting on the analysis and one of the GenBank records. Groups are given time to assess the evidence as to whether the dentist was the source of his patients' HIV.


We also give them a few more specific questions to consider:

They are encouraged to keep a list of questions that arise during their discussion.


When we come back together for a discussion of what the groups have learned, there is a wealth of good ideas to discuss. Some questions are about how to interpret the data (e.g., What do the letters mean? What is a local control?), while others focus on making comparisons of the sequences. Many groups identify patterns, and it is interesting to watch a shared language develop to describe what they have seen.

Part 2: Interpreting a multiple sequence alignment (MSA)

For the next round of group work, we mention that one of the techniques for comparing sequences involves doing a multiple sequence alignment. We hand out an alignment of the sequences that they have been working with (see fig). We also provide them with information about the pairwise comparisons between sequences (see fig). Again they work in their groups, and we prime their discussions with the following questions:

The ensuing discussions provide great teachable moments. Groups that we have worked with have brought up the common origins of sequences as a source of their similarity, the notion of mutation "hot spots", conservation of sequence for conservation of structure and function, and the similarities and differences between groups of amino acids. Still, even with the MSA and pairwise comparisons, it is difficult for students to argue effectively for the role of the dentist in transmitting HIV to certain patients.
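The pairwise comparisons discussed in Part 2 can be illustrated with a short sketch. It is not the procedure used to produce our handouts; it simply computes percent identity between every pair of sequences in an alignment. The sequence names and the ten-residue strings are made-up toy data, not real HIV sequences, and the function names `percent_identity` and `pairwise_table` are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which two sequences carry the same residue.

    Columns where both sequences have a gap ("-") are skipped; a gap opposite
    a residue counts as a mismatch. This is one of several possible conventions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pairwise_table(alignment: dict) -> dict:
    """Map each pair of sequence names to the percent identity of that pair."""
    return {
        (name_1, name_2): percent_identity(alignment[name_1], alignment[name_2])
        for name_1, name_2 in combinations(alignment, 2)
    }


# Hypothetical toy alignment for illustration only (assumed, not real data).
alignment = {
    "dentist": "CTRPNNNTRK",
    "patientA": "CTRPNNNTRR",
    "control": "CTRPGNNTKK",
}

if __name__ == "__main__":
    for (name_1, name_2), identity in pairwise_table(alignment).items():
        print(f"{name_1} vs {name_2}: {identity:.1f}% identical")
```

A table like this makes the limits of pairwise comparison concrete: it shows which sequences are most alike, but not how the similarities are distributed along the sequence or how the sequences are related by descent.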

Part 3: Reading trees

In the final round of this activity we provide each group with an unrooted tree generated by .... built to represent the data that they have been working with. They have another opportunity to work with their group and address the following questions:

After this final discussion we find that students have developed many of the following competencies:

Competency Development

Exploring HIV Evolution: An Opportunity to Do Your Own Research

This third phase builds from the previous activity to engage students in investigations of their own questions using molecular data. This exercise is very open, in that it provides students opportunities to make decisions and develop their own research strategies, but it is not unstructured. In this situation, we work from a published data set. This simplifies certain things (students don't need to search for sequences or decide if particular sequences are appropriate to compare) and limits the range of questions that can be addressed (the data set will lend itself to certain types of analyses and not be appropriate for others).


This activity involves orienting students to the biology, the analysis tools and the data set. The previous activity is used to orient students to the tools. We talk briefly about the biology of HIV, emphasizing information that will become pertinent when students work with the available data set. In this case we use a collection of 666 sequences from 15 HIV+ patients. We get the groups started by handing out a summary table of the data that are available and briefly discussing how the data were collected (see table). The groups are asked to look over the data table (no sequence data yet), to look for interesting patterns, and to think about possible research questions. As a whole class, we then brainstorm some possible research ideas, which accomplishes several objectives. We get some concrete ideas in the air, further orient students to the data available, link the data to what we know about HIV biology, and illustrate the range of potentially fruitful investigations that one could undertake.


As a next step we select one or two questions and model what the students will be asked to do in their own investigations. This involves narrowing from a general question such as, "Is there a particular change in the HIV sequence that causes the T-cell count to drop?" We generate some specific ideas that could begin to answer the question, for example, comparing sequences from individuals who did not have T-cell count drops to those from individuals who did, or comparing early sequences to later ones. We also discuss how ideas can be operationalized, e.g., when will we consider two sequences to be similar? It is clear from this discussion that a variety of decisions need to be made in order to make progress on any of the proposed questions.
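One way a group might operationalize the comparison between individuals whose T-cell counts dropped and those whose counts did not is sketched below. This is an illustrative assumption, not a step prescribed by the activity: it lists the alignment columns at which the two groups share no residue. The group labels, the toy sequences, and the function name `discriminating_columns` are all hypothetical.

```python
def discriminating_columns(group_a: list, group_b: list) -> list:
    """Return 0-based alignment columns where the two groups share no residue.

    Every sequence must come from the same alignment, so all have equal length.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sequence")
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("aligned sequences must have equal length")

    columns = []
    for i in range(length):
        residues_a = {seq[i] for seq in group_a}
        residues_b = {seq[i] for seq in group_b}
        # Keep the column only if no residue appears in both groups.
        if residues_a.isdisjoint(residues_b):
            columns.append(i)
    return columns


# Hypothetical toy sequences for illustration only (assumed, not real data).
t_cell_drop = ["CTRPNNNTRK", "CTRPNNNTRR"]
no_t_cell_drop = ["CTRPGNNTKK", "CTRPGNNTKR"]

if __name__ == "__main__":
    print(discriminating_columns(t_cell_drop, no_t_cell_drop))
```

Columns flagged this way are only candidates for further study. With few sequences, differences between groups can arise by chance or through shared ancestry, which is exactly the kind of decision point the class discussion is meant to surface.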


Next, we ask the groups to work together at their tables to begin defining their research question and methods. Depending on the setting, during this time we may introduce students to the Biology WorkBench to show them the mechanics of choosing sequences and running analyses. We work to build in lots of opportunities for feedback and peer review by getting groups to share their preliminary results with one another. We encourage groups to print their findings and bring them back to the conference room where they have plenty of table space to lay them out and consider the results in light of their research questions.


We ask students to prepare posters and hold a research meeting at the end of this unit. This enables us to see how engaged students become with each other's research. Because they are all working on different parts of the same data set and they have all struggled with the same conceptual issues, they have become a real research community.


Conceptual, procedural and technological understandings are overlapping, dynamic and fluid categories. Let us use our example to describe these categories. By context, we mean understanding that biology, bioinformatics and Biology WorkBench exist in the "real world", and understanding how, to whom, and why they matter.


Concepts include what is most often considered science knowledge: large biological ideas such as inheritance, evolution, genetics and mutation, and somewhat more specific biological notions such as DNA, transcription, translation, replication, amino acids, and protein synthesis. More specific bioinformatics concepts include knowledge of specific molecular databases and of sequencing and other analysis tools.


Procedural knowledge includes general scientific procedures, such as those associated with collaborative inquiry (Bruce) and problem solving (Jungck and Stewart), but also specific procedures, such as multiple sequence alignment and analysis, and gel electrophoresis.


Technological understandings likewise extend from the general to the specific, including the use of computers and basic (software) tools for a variety of tasks, such as word processing and internet searching. Specific technological understandings include knowledge and skills associated with bioinformatics technology, both those using computers as a central tool (for example, molecular database searching, sequence selection and retrieval, and subsequent analysis), and other tools such as wet lab apparatus.


As we prepare and teach bioinformatics curriculum, we include all of the above components, through a constellation of activities and experiences that highlight different components at different times in a variety of ways. In the first stage, we begin with an activity that foregrounds contextual understanding, as it draws on the others. There is opportunity for communication about and assessment of students' existing conceptual knowledge. There is procedural practice, as students work in groups to think and talk through possible problems to pose and pursue. Also through discussion, students develop an awareness of technological possibilities.


In the second stage, students confront various representations of data, practice asking biological and procedural questions, and experience and develop fundamental technical knowledge of the basic processes on which bioinformatics is based, such as sequence comparison and analysis and tree building (and their interpretation).


In the third stage of our model, the students are introduced to, use, and develop skill with powerful and specific technological bioinformatics tools, most notably Biology WorkBench. They use these tools within an authentic research and content context, they use and build their conceptual and biology content knowledge, and they engage in real biological inquiry. In other words, they are getting "training" in technology, but it is as researchers who can direct the technology to meet needs that they identify, rather than as technicians, who perform tasks to meet needs that others identify. There are powerful differences between the two.


In the process of designing and field testing curriculum materials, we have developed a three-step approach to bioinformatics literacy: (1) context setting, (2) introduction of concepts, processes, and goals, and (3) development of competent use of tools. We have presented examples from and discussion of this curricular and pedagogical model here. In the process, we have considered aspects of the model in some detail.


We work with real data and scenarios; this, however, does not mean that the problem space has not been carefully constructed. We work hard to foreground the types of discussions that we think are important to get everyone oriented to the field of bioinformatics, including: