Robin's Hypothesis Testing

Age 16 to 18

Challenge Level Yellow star

Why do this problem?

This problem is designed to help students understand the meaning of hypothesis tests, and in particular why it is necessary to fully specify the experiment, including the sample size, before we begin. If we do not, our results may be meaningless. There is an important technique called sequential testing which allows one to stop an experiment early while the results remain valid, but significant care must be taken in this situation, as this resource shows. (Bayesian inference offers an alternative approach, but that is another story entirely.)

In this resource, we use a binomial test, but the principles are more generally applicable. The solution section provides a more detailed explanation of these ideas.

Possible approach

Students would benefit from having some exposure to hypothesis testing before looking at this simulation. It would also be very helpful for them to have access to the simulation themselves so that they can explore it.

To put the problem in a real-world context rather than picking balls from a bag, you could ask students to suggest real-life situations in which we would want, or be forced, to limit the number of trials in an experiment. For example, we could be doing laboratory experiments in which all of the materials are expensive. We might be trialling a new drug, where testing it on each person is costly, or where only a limited number of people have the condition the drug is designed to treat. The experiment might involve animals, and we wish to limit the number used for ethical reasons. Another reason, related to cost, is that each trial takes a long time, perhaps a day or two, so a very large number of trials is not feasible.

You could then explain that Robin, the experimenter, has suggested a way of saving money, as described in the problem. Your students, as budding statisticians, will need to consider Robin's proposed method, and explain why it is good and will save money, or why it is broken and will potentially give a misleading answer.

Students may require guidance as to how to use the simulation. For example, they could begin with 2 red balls, 2 green balls, $H_0\colon \pi=\frac{1}{2}$ and 50 trials, hide the p-values graph, and just note the proportion of the experiments in which $H_0$ is rejected based on the final p-value. They could then repeat this but note the proportion of the experiments in which the p-value ever drops below 0.05. What does this suggest?

Students could then go on to change some of the parameters in a systematic fashion, exploring whether their initial ideas hold true more generally.
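
If students do not have access to the simulation, the comparison suggested above can be approximated with a short program. The following Python sketch is not the simulation from the problem page. It assumes a two-sided exact binomial test (the simulation may compute its p-values differently), and the function names and default parameters are illustrative only. It estimates two proportions: how often $H_0$ is rejected using only the final p-value, and how often the p-value drops below 0.05 at any point during the experiment.

```python
import random
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def binom_p_value(k, n, pi0=0.5):
    """Two-sided exact binomial p-value (an assumed choice of test):
    the total probability, under H0, of every outcome that is no more
    likely than the observed count k."""
    probs = [comb(n, i) * pi0**i * (1 - pi0)**(n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in probs if p <= cutoff))

def run_experiment(n_trials, true_pi, rng, pi0=0.5, alpha=0.05):
    """Simulate one experiment, recomputing the p-value after every trial.
    Returns (rejected using final p-value, p-value ever below alpha)."""
    greens = 0
    ever_below = False
    p = 1.0
    for n in range(1, n_trials + 1):
        greens += rng.random() < true_pi  # draw one ball with replacement
        p = binom_p_value(greens, n, pi0)
        if p < alpha:
            ever_below = True
    return p < alpha, ever_below

def rejection_rates(n_experiments=2000, n_trials=50, true_pi=0.5, seed=1):
    """Proportion of experiments rejecting H0 under each stopping rule."""
    rng = random.Random(seed)
    final_count = ever_count = 0
    for _ in range(n_experiments):
        final, ever = run_experiment(n_trials, true_pi, rng)
        final_count += final
        ever_count += ever
    return final_count / n_experiments, ever_count / n_experiments

# 2 red and 2 green balls: the true proportion is 1/2, so H0 is true.
final_rate, ever_rate = rejection_rates()
print(f"Rejected using the final p-value only: {final_rate:.3f}")
print(f"P-value below 0.05 at some point:      {ever_rate:.3f}")
```

Because $H_0$ is true here, every rejection is a false positive. Changing `n_trials` or `true_pi` mirrors the systematic exploration suggested above.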

Key questions

Is it necessary to specify the number of trials in advance?
What would happen if we didn't?

Possible extension

Is there any way of stopping the experiment early and still obtaining useful results?
What is the benefit of doing more trials? Surely we would still only reject $H_0$ 5% of the time? You can use the simulation to explore this.
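
One way to explore the second question without the simulation is to calculate exactly how often a test with a fixed number of trials rejects $H_0$. The sketch below again assumes a two-sided exact binomial test at the 5% level. The function names and the alternative proportion 0.6 (for example, 3 green balls and 2 red) are illustrative choices and are not taken from the problem.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def rejection_probability(n, true_pi, pi0=0.5, alpha=0.05):
    """Exact probability that a fixed-n, two-sided exact binomial test
    rejects H0: pi = pi0 when the true proportion is true_pi."""
    probs0 = [binom_pmf(i, n, pi0) for i in range(n + 1)]
    total = 0.0
    for k in range(n + 1):
        cutoff = probs0[k] * (1 + 1e-9)  # tolerance for floating-point ties
        p_value = min(1.0, sum(q for q in probs0 if q <= cutoff))
        if p_value < alpha:
            # k lies in the rejection region; weight it by its true probability
            total += binom_pmf(k, n, true_pi)
    return total

for n in (20, 50, 200):
    under_h0 = rejection_probability(n, 0.5)
    under_h1 = rejection_probability(n, 0.6)
    print(f"n={n:3d}  reject when pi=0.5: {under_h0:.3f}  "
          f"reject when pi=0.6: {under_h1:.3f}")
```

When $H_0$ is true, the rejection probability stays at or below 5% whatever the number of trials. When the true proportion differs from $\frac{1}{2}$, it grows as the number of trials increases.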

Possible support

There are several things which can be changed in the simulation, and it is easy to get lost. Students will benefit from being systematic, and guiding them to structure their exploration and recording of results will help them to understand what is happening.
