# Freshman Statistics Seminar

## Week 2: Basic Statistics

Ray Dybzinski

#### Objective:

• Introduce students to basic statistical terms and concepts
• Have students create a “statistics crib sheet” that they can refer to when they read future articles (see example below)
• If your school has the infrastructure for it, students can create a “wiki statistics crib sheet”

#### Article Summary:

• There is no article for this week.

#### Suggested Lesson Structure:

• The HHMI-StatsPrimer.xls exercise should take up the entire class period.

#### Discussion Points:

(to be discussed as appropriate during the HHMI-StatsPrimer.xls exercise)

• Come up with example populations that are fanciful, biological, and medical.
• The “sampling” in the exercise is totally artificial – using the example populations that the students chose, discuss what would be involved in actual sampling.
• If the inferences that you make about the population are more accurate with larger sample sizes, why doesn’t every study use large sample sizes?
• The difference between the median and the mean – students can turn one of their samples into a “mistake” by typing in an outlier value. With even a modest sample size, the median is hardly affected, while the mean can shift noticeably.
• Others mentioned below.
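
For instructors who want to show the median-versus-mean point outside the spreadsheet, here is a minimal Python sketch. The wing-span values and the "extra zero" typing mistake are invented for illustration and are not part of HHMI-StatsPrimer.xls.

```python
import statistics

# Hypothetical sample of 10 wing-span measurements (cm); invented values.
sample = [98, 101, 103, 99, 100, 102, 97, 104, 100, 96]

# Simulate a data-entry "mistake": the last value (96) typed with an extra zero.
with_outlier = sample[:-1] + [960]

mean_before = statistics.mean(sample)
median_before = statistics.median(sample)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

In this made-up sample, one mistyped value nearly doubles the mean, while the median moves by only half a centimeter.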

#### Active Learning Modules:

HHMI-StatsPrimer
Ideally, you will have the “HHMI-StatsPrimer.xls” Excel spreadsheet open on a computer with projection so that the entire class can see what you are doing, and each student or pair of students will be at their own computer with the spreadsheet. Note: Excel has the capability to make actual histograms, but unfortunately they are not interactive. Thus, we have opted for points where histogram bars ought to be.

1. Have students imagine that there are only 100 of something in the world. Suggestions would be members of an endangered species, world-class wrestlers, multi-billionaires, etc. Each group of students at a computer can come up with their own “population”. It can be as real or quirky as they desire.
2. Tell students that they are now omniscient and can, by only using the power of their minds, measure some quantifiable trait on each member of this population. Suggestions might be wing-span, number of lifetime broken bones, number of cars possessed, etc. Ask them to fill in the light-blue cells with a value for each member of their imagined population. Tell them that the important thing is that they go fast (don’t think too hard about each one!) and that each member tends to have a different value that hovers around some imagined average. Some repeats are fine.
3. Tell students that as omniscient beings, they have taken measurements of the entire population, and discuss what the term population means: “The entire group of individuals that we want information about is the population”. Students should write this down for future reference.
4. Discuss the term random variable (and that they created one in light blue) and write down its definition. What is randomness? Given the populations that the students chose, what would be some real reasons for the randomness that they made up? For example, randomness in bird wing-span might be attributable to genetics or nutrition.
5. Discuss the population distribution, mean, and median and write down associated definitions. You may wish to have students draw their distributions on the board. Did anyone create a random variable with a very different mean and median? If so, have students discuss.
6. OK, up until now, everything that they’ve done has been at the population level. As researchers, this is ultimately what we’re interested in, but since we are not omniscient, we must take samples and make inferences about the population. Students can “sample” from their population by typing “=” in the sample column and then clicking the individual measurement adjacent to it. Tie this back to reality by reminding them that they have effectively plucked that individual from the population, measured its trait, and thrown it back. Students can sample other individuals by copying and then pasting their first sample to the cells below (Excel shifts the cell references appropriately).
7. Show students how to take a different sample by randomly shuffling the population values. Highlight columns A and B, then select “Data → Sort” from the menu. Make sure that the radio button for “header row” is selected and sort by “random number sorter”.
8. Discuss sample size and have students experiment with different sample sizes. The big question is: how does sample size affect their ability to make inferences about the population mean/median from the sample mean/median? Remind them that if they were actually out collecting data, they wouldn’t be able to see the population distribution and parameters! The nice thing about this exercise is that they get to play the role of omniscient overseer and finite researcher at the same time.
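
The sample-size experiment in steps 6 through 8 can also be sketched in a few lines of Python, which may help if students ask what would happen over many repeated samples. This is only an illustration, not a replacement for the spreadsheet: the population values (an imagined average of 50 with a spread of 10), the sample sizes, the number of trials, and the helper name `typical_error` are all assumptions made for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so reruns give the same numbers

# Hypothetical population of 100 individuals whose values hover around an
# imagined average of 50 (mean and spread are assumptions for illustration).
population = [round(random.gauss(50, 10)) for _ in range(100)]
pop_mean = statistics.mean(population)

def typical_error(sample_size, trials=1000):
    """Average absolute gap between the sample mean and the population mean."""
    gaps = []
    for _ in range(trials):
        # Draw individuals at random without replacement, as in the shuffle step.
        sample = random.sample(population, sample_size)
        gaps.append(abs(statistics.mean(sample) - pop_mean))
    return statistics.mean(gaps)

errors = {n: typical_error(n) for n in (5, 20, 80)}
for n, err in errors.items():
    print(f"sample size {n:>2}: typical error in the mean = {err:.2f}")
```

The typical error shrinks as the sample size grows, which sets up the discussion question of why real studies do not simply use very large samples.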

#### Sample “Crib Sheet” – definitions taken from Introduction to the Practice of Statistics by David S. Moore & George P. McCabe

Individuals are the objects described by a set of data. They may be people, but they may also be animals or things.
A Random Variable is any characteristic of an individual. A random variable can take different values for different individuals.
The Distribution of a random variable tells us what values it takes and how often it takes those values.
An Outlier is an individual value that falls outside the overall distribution pattern.

The Mean is the average value.
The Median is the midpoint of a distribution, such that half of the data have values that are lower than the median and half have values that are higher than the median.
Two variables measured on the same individuals are Correlated if some values of one variable tend to occur more often with some values of the second variable than with other values of that variable.
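
One common way to put a number on correlation is the Pearson correlation coefficient, which ranges from -1 to 1. Note that this is a specific measure brought in here for illustration; the crib-sheet definition above is more general. The paired measurements and the helper name `pearson_r` in this sketch are hypothetical.

```python
import math

# Hypothetical paired measurements on the same 6 birds (invented values).
wing_span_cm = [20, 22, 25, 27, 30, 32]
body_mass_g = [40, 45, 52, 55, 63, 66]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(wing_span_cm, body_mass_g)
print(f"r = {r:.3f}")
```

A value of r close to 1 means that larger values of one variable tend to occur with larger values of the other, as in this made-up example.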

The entire group of individuals that we want information about is called the Population.
A Sample is a part of the population that we actually examine in order to gather information.