Freshman Statistics Seminar

Week 11: Mathematical Models for Prediction

Ray Dybzinski


  • Give students tools and insights so that they can start to appreciate the strengths, weaknesses, and assumptions of the mathematically- and statistically-based predictions that they will encounter in their college courses and future careers.

Article Summary:

  • Brown 2006 Wash Post “World Death Toll Of A Flu Pandemic Would Be 62 Million”

Researchers used detailed demographic records from the 1919 Spanish flu pandemic to predict the consequences of such a pandemic today. In a nutshell, they determined how mortality rates depended on income and then used contemporary income and population data to predict contemporary consequences. In 1919, lower income was associated with greater flu mortality. Since developing countries have the lowest incomes today, they are expected to experience the greatest flu mortality from a contemporary pandemic. However, since overall income has increased, the mortality from a contemporary pandemic is not expected to be as severe as that of the 1919 Spanish flu when expressed as a fraction of world population (i.e., the absolute number of deaths may be greater today, but only because the world population is so much greater than it was in 1919).

  • Neergaard 2006 Boston Globe “9/11 Air Travel Dip Linked To Flu’s Delay”

Researchers analyzed air travel volume and flu’s onset, duration, and severity. The big news was that the dip in air travel caused by 9/11 was correlated with a 2-week lag in that year’s flu season. However, the lag was not correlated with any reduction in severity, and it may have caused the flu season to be prolonged. Thus, the bulk of the story warns against jumping to the conclusion that because decreased air traffic delays flu’s onset, we should restrict travel in the event of a pandemic. At best, restricting air travel might buy time to prepare for a pandemic, but that decision would have to be weighed against the economic chaos it would cause.

Suggested Lesson Structure:

  • Students should have read both articles before arriving at class.
  • Introduce students to the active learning exercise by walking through the “model model” described below.
  • Ask students to imagine how they would go about making the same sort of prediction found in “World Death Toll…” (see below).
  • Discuss their ideas and the article “World Death Toll…”
  • Ask students to imagine how they would go about making the same sort of prediction found in “9/11 Air Travel Dip…” (see below).
  • Discuss their ideas and the article “9/11 Air Travel Dip…”

Discussion Points:

  • For both papers, one may reasonably ask, “Why bother to make such predictions?” If the students don’t ask the question, perhaps the instructor should! At one level, people are simply curious – but that seldom justifies the expense, effort, and time that go into a complicated prediction. Predictions such as those made in the articles are of interest from a policy standpoint. As the “World Death Toll…” article alludes to, the World Health Organization pays disproportionate attention to developing countries and, based on the findings reported in the article, will likely continue to do so in preparation for a flu pandemic. The policy implications of the study reported in “9/11 Air Travel Dip…” are obvious: should the world restrict air traffic in the event of a pandemic? The study provides important information regarding the effects of such a policy decision (it would delay the flu’s onset, but would not lessen its severity or shorten its duration).
  • Students should consider the bias introduced into a prediction by the data, math, and statistics that underlie it. For instance, in the “World Death Toll…” study, the researchers did not use 1919 mortality data from every country in the world. Obvious omissions include China and all the nations of Africa, among many others. Does this bias the results? Of course, this omission was very purposeful – they only wanted to use countries that rigorously kept data. What would have been the benefit of using all the data they could have gotten their hands on? (Different regional or socioeconomic trends might have emerged and changed their prediction, but possibly at the cost of messy data that could have blurred any trend at all!)

Active Learning Modules:

  1. A model model – the active learning module below asks students to envision the data they would need to collect and what they would do with it to make the predictions found in the articles. This very simple example may serve to guide that exercise. Imagine you want to predict the fraction of students in this class who will become doctors. How might you proceed? There are probably many ways, but here is one:
    1. What data should be collected/found?
      • (i) Ideally, one would obtain data from school records on past students, containing information regarding each student’s (1) desire to become a doctor, (2) first-year GPA, (3) ACT or SAT score, and (4) high school GPA.
      • (ii) We would also need follow-up data on those students. Did they become doctors? Ideally, the school would keep track of this. Otherwise, we might be forced to look at professional registries to match up doctors with past students. Unfortunately, we would probably miss many past students who became doctors in areas that are not covered by the registries we check!
      • (iii) We would need all of the information collected in (i) for the students in this class.
    2. What should be done with the data?
      • Using statistics, we can create a model that predicts the probability that someone will become a doctor given the data collected in (i) and (ii). Then we apply the same model to the data collected from this class (iii). Summing the predicted probabilities for each student gives the expected number of future doctors from this class; dividing that number by the class size gives the predicted fraction.
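
For instructors who want to make the “model model” concrete, the steps above can be sketched in a few lines of Python. This is only a minimal illustration: every record below is invented, and the “model” is simply the observed fraction of doctors within four coarse groups (stated intent and a GPA band), which is a stand-in chosen for readability rather than the kind of statistical model a real study would use. The names past_students, group_of, fit_group_rates, and this_class are hypothetical.

```python
from collections import defaultdict

# (i) + (ii): invented records of past students:
# (wanted to be a doctor, first-year GPA, actually became a doctor)
past_students = [
    (True, 3.8, True), (True, 3.6, True), (True, 3.9, True), (True, 3.5, False),
    (True, 3.1, True), (True, 2.9, False), (True, 3.3, False), (True, 3.0, False),
    (False, 3.7, True), (False, 3.9, False), (False, 3.5, False), (False, 3.6, False),
    (False, 3.2, False), (False, 2.8, False), (False, 3.4, False), (False, 3.0, False),
]

def group_of(wants_doctor, gpa):
    """Place a student in a coarse group: stated intent plus a GPA band."""
    return (wants_doctor, gpa >= 3.5)

def fit_group_rates(records):
    """Estimate the probability of becoming a doctor within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [doctors, students]
    for wants, gpa, became in records:
        tally = counts[group_of(wants, gpa)]
        tally[0] += int(became)
        tally[1] += 1
    return {group: doctors / total for group, (doctors, total) in counts.items()}

rates = fit_group_rates(past_students)

# (iii): the same information for this (invented) class of eight students
this_class = [
    (True, 3.7), (True, 3.9), (True, 3.2), (False, 3.8),
    (False, 3.1), (False, 2.9), (False, 3.3), (False, 3.0),
]

# Each student's predicted probability comes from his or her group's rate.
probabilities = [rates[group_of(wants, gpa)] for wants, gpa in this_class]

# Summing probabilities gives the expected number of future doctors;
# dividing by class size gives the predicted fraction.
expected_doctors = sum(probabilities)
predicted_fraction = expected_doctors / len(this_class)

print(f"Expected doctors: {expected_doctors:.2f}")
print(f"Predicted fraction: {predicted_fraction:.2f}")
```

A useful discussion prompt: the sketch silently assumes that this class resembles past classes and that the follow-up data in (ii) missed no one, which are exactly the kinds of assumptions students should learn to look for.
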
  2. Imagining the models described in the articles – before discussing each model, ask students to imagine what they would need to do as researchers to make the same prediction. Some thoughts and answers are provided, but we suggest that these only be given to students who are stuck. Also, students will likely come up with other, equally valid answers!
    1. What data should be collected/found?
      • Death Toll:
        • Ideally, one would obtain mortality rates from the 1919 flu based on access to food, clean water, health care, and money – this might be most easily collected at the level of nation-state, but since access to these resources differs among inhabitants of a given country, finer resolution would be better. Official registers might provide ready access to such data (that’s what the authors used), but one might also consider other, more creative ways of obtaining such data.
        • One would also need contemporary data regarding the same access to food, clean water, health care, and money from all the world’s nations. This might be easier to obtain than the 1919 data – the UN or the World Bank may have relevant data.
      • 9/11 Air Travel Dip:
        • Ideally, one would obtain data on flight volume in a given year (either sheer number of person-hours flown or some other metric that takes into account the average distance flown as well). Airports, airlines, or governmental travel bureaus (such as the FAA) would likely keep such records – or at least the raw data from which one could calculate the metrics of interest.
        • One would also need data on flu season onset, severity, and duration. Such data would likely be kept by governmental agencies (such as the CDC).
    2. What should be done with the data?
      • Death Toll: Basically, the 1919 data would allow one to create a (very complicated, statistically derived) model that, given the inputs of access to food, water, health care, and money, would predict flu mortality rates. The same model would then be applied to contemporary data on access to food, water, health care, and money. The predicted flu mortality rate for each country could then be multiplied by its population size, and the results summed, to generate a total mortality prediction. This is hard to describe and would be even harder to actually do (kudos to the researchers!).
      • 9/11 Air Travel Dip: With flight data and flu data in hand, one would look for correlations between them. Finding the relevant window of flight data (e.g. should you consider flight volume over the entire year prior to the flu season? One month before flu season? Something else?) would be very challenging. However, once the relevant window was found, a model could be created that predicted flu season onset, severity, and duration given flight volume. To predict what would happen if flights were shut down (which is the policy decision under consideration), the same model could be used with flight volume “set” at zero.
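
The Death Toll procedure (fit a model to historical data, apply it to contemporary inputs, multiply by population) can be sketched as follows. This is a toy illustration under loud assumptions: the numbers are invented, income is the only input, and the straight-line fit on log-transformed values is a convenient choice for the sketch, not the researchers’ actual method. The names historical, contemporary, and fit_line are hypothetical.

```python
import math

# Invented historical records: (per-capita income, flu deaths per 1,000 people)
historical = [(500, 40.0), (1000, 20.0), (2000, 10.0), (4000, 5.0)]

# Invented contemporary data: (region, per-capita income, population)
contemporary = [
    ("Region A", 1000, 50_000_000),
    ("Region B", 8000, 100_000_000),
]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Step 1: build the model from historical data (assumed log-log relationship).
log_income = [math.log(income) for income, _ in historical]
log_rate = [math.log(rate) for _, rate in historical]
slope, intercept = fit_line(log_income, log_rate)

def predicted_rate_per_1000(income):
    """Predicted flu deaths per 1,000 people at a given income."""
    return math.exp(intercept + slope * math.log(income))

# Step 2: apply the same model to contemporary incomes.
# Step 3: multiply each predicted rate by population and sum.
# Note: Region B's income lies outside the historical range, so its
# prediction is an extrapolation and deserves extra skepticism.
total_deaths = 0.0
for region, income, population in contemporary:
    deaths = predicted_rate_per_1000(income) / 1000 * population
    total_deaths += deaths
    print(f"{region}: about {deaths:,.0f} predicted deaths")

print(f"Total: about {total_deaths:,.0f} predicted deaths")
```

The negative slope reproduces the pattern described in the article summary: lower income goes with higher predicted mortality, so the poorer region has the higher death rate even though the richer region has twice the population.
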

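The 9/11 Air Travel Dip procedure (search for the relevant window, model onset from flight volume, then set volume to zero) can be sketched in the same spirit. Everything here is an assumption made for illustration: the yearly records are invented, only two candidate windows are compared, and a straight-line fit stands in for whatever model the researchers actually used. The names records, pearson, fit_line, and best_window are hypothetical.

```python
import math

# Invented yearly records: flight volume in two candidate windows
# (arbitrary units) and flu season onset (weeks after October 1).
records = [
    {"September": 50, "November": 55, "onset_week": 10.0},
    {"September": 52, "November": 50, "onset_week": 9.6},
    {"September": 40, "November": 52, "onset_week": 12.0},
    {"September": 51, "November": 57, "onset_week": 9.8},
    {"September": 53, "November": 51, "onset_week": 9.4},
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

onsets = [r["onset_week"] for r in records]
windows = ["September", "November"]

# Step 1: find the window whose flight volume tracks flu onset most closely.
correlations = {w: pearson([r[w] for r in records], onsets) for w in windows}
best_window = max(windows, key=lambda w: abs(correlations[w]))

# Step 2: model onset as a function of flight volume in that window.
slope, intercept = fit_line([r[best_window] for r in records], onsets)

# Step 3: "set" flight volume at zero. Observed volumes run from 40 to 53,
# so this is a long extrapolation beyond anything in the data.
predicted_onset_at_zero = intercept + slope * 0

for window in windows:
    print(f"{window}: correlation with onset = {correlations[window]:.2f}")
print(f"Best window: {best_window}")
print(f"Predicted onset with no flights: week {predicted_onset_at_zero:.1f}")
```

Two points are worth raising with students. First, trying many windows and keeping the best one tends to overstate how strong the relationship really is. Second, no year in the data had anything close to zero flights, so the zero-flight prediction rests entirely on the assumption that the straight line keeps holding.
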
Additional Links:

Empirical Evidence for the Effect of Airline Travel on the Inter-Regional Influenza Spread in the US

Estimation of potential global pandemic influenza mortality on the basis of vital registry data from the 1918–20 pandemic: a quantitative analysis

World Death Toll Of A Flu Pandemic Would Be 62 Million

9/11 air travel dip linked to flu’s delay