Freshman Statistics Seminar
Week 12: Sampling the Sub-Populations
This section introduces the basic principles of sub-sampling larger populations, the ways in which these subsamples might be taken, and the reasons why sub-sampling is important in medical research.
Kent, D and R Hayward. 2007. When Averages Hide Individual Differences in Clinical Trials. Amer Sci. 95:60-68.
This article gives a nice overview of how averages can misrepresent the effects for any given group of patients.
Rathore, SS and HM Krumholz. 2003. Race, ethnic group, and clinical research. BMJ 327:763-764.
This is a very brief editorial detailing the ethical and the scientific issues with incorporating race or ethnic groups into analyses of clinical trials.
Tate, SK, C Depondt, SM Sisodiya, GL Cavalleri, S Schorge, N Soranzo, M Thom, A Sen, SD Shorvon, JW Sander, NW Wood, & DB Goldstein. 2005. Genetic predictors of the maximum doses patients receive during clinical use of the anti-epileptic drugs carbamazepine and phenytoin. Proc. Natl. Acad. Sci. 102:5507-5512.
This is a short article illustrating how genetic differences can affect the efficacy of different drugs for a given condition. This article sets up the active learning module.
Potential replacement for Paper 3:
Hampton, T. 2005. Gene variants explain patient differences in antiepileptic drug responses. JAMA 293: 2199.
This is a one-page news article that summarizes the findings of Tate et al.
Suggested Lesson Structure:
Before class, students will read three papers addressing how diseases and treatments can affect smaller subsets of the population differently than they affect the population as a whole.
Part I. Discuss and Critique the Papers
Part II. Simulation of Sampling Sub-Populations for Drug Efficacy.
Students can do this activity alone or in small groups of two to three students. See the active learning module for details.
Part III. Follow-up Discussion
Discuss the active learning module: the students can share their findings and their recommendations.
You could choose any number of ways to subsample your data: sex, age, birth month, etc. How might you go about choosing which way is more relevant?
If you can slice your data in hundreds of ways, how might you be able to tell whether the effect you are sub-sampling for is a real phenomenon or a chance artifact of the number of ways you’ve split the data?
What sorts of ethical problems might you face as a researcher looking for subtle effects in diverse patient populations?
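To make the question about slicing data in hundreds of ways concrete, instructors might show a small simulation like the one below. It is only a sketch under stated assumptions: the patients, the efficacy values, the 200 arbitrary splits, and the function name count_spurious_splits are all hypothetical and are not taken from the Excel file. Every patient is drawn from the same distribution, so any subgroup difference that gets flagged is a chance artifact by construction.

```python
import random
from math import sqrt
from statistics import mean, stdev


def count_spurious_splits(n_patients=150, n_splits=200, seed=1):
    """Count arbitrary two-way splits that look 'significant' by chance alone."""
    rng = random.Random(seed)
    # Assumption: there is NO real subgroup effect. Every patient's efficacy
    # comes from the same normal distribution (mean 0, sd 5; made-up values).
    efficacy = [rng.gauss(0.0, 5.0) for _ in range(n_patients)]

    flagged = 0
    for _ in range(n_splits):
        # A meaningless attribute, e.g. "born in an even-numbered month".
        labels = [rng.random() < 0.5 for _ in range(n_patients)]
        group_a = [e for e, flag in zip(efficacy, labels) if flag]
        group_b = [e for e, flag in zip(efficacy, labels) if not flag]

        # Standard error of the difference between the two group means.
        se = sqrt(stdev(group_a) ** 2 / len(group_a)
                  + stdev(group_b) ** 2 / len(group_b))

        # Flag the split if the means differ by more than 1.96 standard errors.
        if abs(mean(group_a) - mean(group_b)) > 1.96 * se:
            flagged += 1
    return flagged


flagged = count_spurious_splits()
print(f"{flagged} of 200 meaningless splits crossed the 1.96 SE threshold")
```

With a 1.96 standard error cutoff, roughly 5% of splits are expected to be flagged even though no real effect exists, which gives students a baseline for judging how many "discoveries" to expect from chance alone.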
Active Learning Module:
The simulation in Excel provides a simulated set of data for 150 patients in a drug trial. Part 1 lists the efficacy values for all patients; Part 2 lists the same data, but also gives additional information for each patient. Students can subsample the data by sorting on the additional information and calculating the mean efficacy for each subgroup.
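For instructors who prefer a scripted alternative to sorting in Excel, the same sort-and-average step can be sketched in a few lines of Python. This is a minimal illustration, not the contents of the Excel file: the genotype attribute, the effect size, and the helper names simulate_patients and subgroup_means are assumptions made up for the example.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate_patients(n=150, seed=0):
    """Build a hypothetical trial as a list of (genotype, efficacy) pairs."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        genotype = rng.choice(["AA", "AG", "GG"])  # made-up patient attribute
        # Assumption: only the GG subgroup responds to the drug.
        true_effect = 10.0 if genotype == "GG" else 0.0
        patients.append((genotype, rng.gauss(true_effect, 5.0)))
    return patients


def subgroup_means(patients):
    """Group efficacy values by label and return the mean of each group."""
    groups = defaultdict(list)
    for label, efficacy in patients:
        groups[label].append(efficacy)
    return {label: mean(values) for label, values in groups.items()}


patients = simulate_patients()
overall_mean = mean(efficacy for _, efficacy in patients)
by_genotype = subgroup_means(patients)

print(f"overall: {overall_mean:.2f}")
for label in sorted(by_genotype):
    print(f"{label}: {by_genotype[label]:.2f}")
```

In this made-up data set the overall mean understates the benefit to the responding subgroup and overstates it for everyone else, which is the pattern the module asks students to uncover.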
Here’s the graph of the means by each class.
Download Simulation Data for Sampling Sub-populations: