Robin's Hypothesis Testing

Age 16 to 18
Challenge Level: 2 stars

Does Robin's approach work?


If we run the simulation with 50 trials, 2 red balls and 2 green balls, with $H_0\colon\pi=\frac{1}{2}$, we discover that about 5% of the time, the final p-value is less than 0.05.  It might take a lot of experiments to get an accurate percentage: I did 100 experiments, and in 3 of them the final p-value was less than 0.05.  (This is what the significance level means: it is the probability that the null hypothesis will be rejected given that it is true.)
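The experiment above can be sketched in Python.  This is a minimal sketch, not the code behind the NRICH simulation: it assumes that balls are drawn with replacement, so each trial is green with probability $\frac{1}{2}$, and that the p-value is an exact two-sided binomial p-value.  The names `two_sided_p_value` and `final_rejection_rate` are invented for this illustration.

```python
import random
from math import comb

def two_sided_p_value(greens, trials, pi0=0.5):
    """Exact two-sided binomial p-value (an assumed definition): the total
    probability, under H0, of every outcome no more likely than the observed one."""
    pmf = [comb(trials, k) * pi0**k * (1 - pi0)**(trials - k)
           for k in range(trials + 1)]
    cutoff = pmf[greens] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))

def final_rejection_rate(experiments, trials=50, true_prop=0.5,
                         pi0=0.5, alpha=0.05, seed=0):
    """Fraction of experiments whose p-value after the last trial is below alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(experiments):
        # Assumption: draws are independent, green with probability true_prop.
        greens = sum(rng.random() < true_prop for _ in range(trials))
        if two_sided_p_value(greens, trials, pi0) < alpha:
            rejections += 1
    return rejections / experiments

rate = final_rejection_rate(experiments=2000)
print(f"final p-value below 0.05 in {rate:.1%} of experiments")
```

Under these assumptions the test at trial 50 rejects only when there are at most 17 or at least 33 green balls, so the long-run rate comes out somewhat below 5%: the number of green balls is discrete, and the p-value cannot land exactly on 0.05.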

I then did another 100 experiments, this time counting how often the p-value went below 0.05 at any point during the 50 trials: this happened in 18 of the 100 experiments.  This suggests that the probability of the p-value going below 0.05 at some point is much higher than 0.05, and so Robin's approach is likely to reject the null hypothesis even when there is insufficient evidence to do so.
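A sketch of this second count, in which the p-value is checked after every trial and the experiment is flagged as soon as it dips below 0.05.  As before, the exact two-sided binomial p-value and the independent draws are assumptions, and `ever_below_rate` is a hypothetical name.

```python
import random
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def two_sided_p_value(greens, trials, pi0=0.5):
    """Exact two-sided binomial p-value (assumed definition), cached because
    the same (greens, trials) pairs recur across experiments."""
    pmf = [comb(trials, k) * pi0**k * (1 - pi0)**(trials - k)
           for k in range(trials + 1)]
    cutoff = pmf[greens] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))

def ever_below_rate(experiments, trials=50, true_prop=0.5,
                    pi0=0.5, alpha=0.05, seed=1):
    """Fraction of experiments in which the running p-value goes below
    alpha after at least one of the trials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        greens = 0
        for n in range(1, trials + 1):
            greens += rng.random() < true_prop  # True counts as 1
            if two_sided_p_value(greens, n, pi0) < alpha:
                hits += 1
                break  # Robin would stop and reject here
    return hits / experiments

rate_peeking = ever_below_rate(experiments=1000)
print(f"p-value went below 0.05 at some point in {rate_peeking:.1%} of experiments")
```

With this definition the p-value cannot go below 0.05 before the sixth trial, since even five greens in a row give a p-value of $2 \times \left(\frac{1}{2}\right)^5 = 0.0625$.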

In fact, there is a theorem which says that if the null hypothesis is true and we keep doing trials forever, the probability that the p-value will go below 0.05 at some point is 1.  (This is clearly true if the null hypothesis is false, as the observed proportion of green balls will tend to the true proportion and so the p-value will tend to 0.  The amazing result is that this statement is true even if the null hypothesis is true.)  So if we allow ourselves to do lots of trials, Robin's approach gradually becomes even worse.  For example, when I experimented with 200 trials, the p-value went below 0.05 at some point in 23 experiments out of 100.  This seems a little worse than with 50 trials, but not by that much.  It turns out that one needs to do a huge number of trials to reach, say, a probability of 0.5 of obtaining a p-value less than 0.05 at some point.
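Rather than estimating these probabilities by simulation, one can compute them exactly for a given number of trials by tracking the probability of each green-ball count among the runs that have not yet gone below 0.05.  The sketch below does this; the two-sided exact binomial p-value is again an assumption, and `prob_ever_below` is a name invented here.

```python
from math import comb

def two_sided_p_value(greens, trials, pi0=0.5):
    """Exact two-sided binomial p-value (assumed definition)."""
    pmf = [comb(trials, k) * pi0**k * (1 - pi0)**(trials - k)
           for k in range(trials + 1)]
    cutoff = pmf[greens] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))

def prob_ever_below(max_trials, pi0=0.5, true_prop=0.5, alpha=0.05):
    """Exact probability that the running p-value is below alpha after
    at least one of the first max_trials trials."""
    alive = {0: 1.0}  # alive[k] = P(k greens so far and p-value never below alpha)
    crossed = 0.0
    for n in range(1, max_trials + 1):
        # Advance every surviving path by one trial.
        nxt = {}
        for k, p in alive.items():
            nxt[k + 1] = nxt.get(k + 1, 0.0) + p * true_prop
            nxt[k] = nxt.get(k, 0.0) + p * (1 - true_prop)
        # Move paths whose p-value is now below alpha into the crossed total.
        alive = {}
        for k, p in nxt.items():
            if two_sided_p_value(k, n, pi0) < alpha:
                crossed += p
            else:
                alive[k] = p
    return crossed

results = {n: prob_ever_below(n) for n in (50, 100, 200)}
for n, prob in results.items():
    print(f"{n:>3} trials: P(p-value below 0.05 at some point) = {prob:.3f}")
```

The probability keeps growing with the number of trials, but slowly, which is consistent with the experiments above.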

Fixing Robin's approach


There is a way that we could sometimes stop early and thereby save money.  Let's say that we decide that we're going to do 50 trials.  If we reach the 45th trial, say, and see that it is impossible for the p-value to drop below 0.05 by the 50th trial, we can stop and accept the null hypothesis (that is, fail to reject it).  This would take a little calculation, but could save Robin some money without invalidating the conclusion.
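The "little calculation" might look like the following sketch.  It checks every number of green balls that could still be reached by the final trial and reports whether any of them would give a final p-value below 0.05.  The exact two-sided binomial p-value is an assumption, and `can_stop_early` is a hypothetical helper.

```python
from math import comb

def two_sided_p_value(greens, trials, pi0=0.5):
    """Exact two-sided binomial p-value (assumed definition)."""
    pmf = [comb(trials, k) * pi0**k * (1 - pi0)**(trials - k)
           for k in range(trials + 1)]
    cutoff = pmf[greens] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))

def can_stop_early(greens, done, planned=50, pi0=0.5, alpha=0.05):
    """True if no outcome of the remaining trials could make the p-value
    at the final (planned) trial fall below alpha."""
    remaining = planned - done
    # The final green count lies between `greens` (all remaining draws red)
    # and `greens + remaining` (all remaining draws green).
    return all(two_sided_p_value(final, planned, pi0) >= alpha
               for final in range(greens, greens + remaining + 1))

print(can_stop_early(greens=23, done=45))  # final count can only be 23 to 28
print(can_stop_early(greens=30, done=45))  # final count could still reach 33
```

Stopping in this way does not change the conclusion, because the decision at trial 50 is already determined.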

There are also more sophisticated ways of analysing a sequence of trials such as these, which can allow one to reject the null hypothesis earlier if it is wrong.  One needs to take account of the above problems, and adjust the calculations of p-values as one goes to ensure that the probability of incorrectly rejecting $H_0$ is still only 5%.  This technique is known as sequential analysis, and is very important in modern statistics.
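As a rough sketch of what "adjusting the calculations" means, write $p_n$ for the p-value after $n$ trials and $N$ for the largest number of trials we are prepared to do.  The stricter threshold $\alpha'$ is notation introduced here for illustration; real sequential designs often let the threshold vary with $n$ rather than keeping it constant.

```latex
\[
  \text{choose } \alpha' < 0.05 \text{ such that }\quad
  \Pr\bigl(p_n < \alpha' \text{ for some } n \le N \mid H_0 \text{ true}\bigr) \;\le\; 0.05 .
\]
```

Rejecting $H_0$ as soon as $p_n < \alpha'$ then keeps the overall probability of an incorrect rejection at 5% or less, at the cost of needing stronger evidence at each individual check.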

Changing the conditions


If we change the true proportion of green balls and the hypothesised proportion $\pi$ to match it, then we still see similar behaviour to that observed earlier.

If, though, we change the true proportion to be something other than $\pi$, say we have 3 green balls and 2 red balls, with $H_0\colon \pi=\frac{1}{2}$ still, we observe that $H_0$ is rejected much more frequently.  In my experiments with this setup, $H_0$ was rejected 20 times out of 100.  This is good, as here we know that $H_0$ is not correct.

The more extreme the difference between the hypothesised $\pi$ and the true proportion, the more frequently $H_0$ is rejected.
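This trend can be checked exactly for the test carried out at the 50th trial.  The sketch below adds up the probabilities of all final green-ball counts that lead to rejection, for several bags.  It assumes draws with replacement and an exact two-sided binomial p-value; the function names and the extra bags beyond those mentioned above are illustrative choices.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k greens in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(greens, trials, pi0=0.5):
    """Exact two-sided binomial p-value (assumed definition)."""
    pmf = [binom_pmf(k, trials, pi0) for k in range(trials + 1)]
    cutoff = pmf[greens] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))

def rejection_probability(true_prop, trials=50, pi0=0.5, alpha=0.05):
    """Exact chance that the p-value after the last trial is below alpha
    when each draw is green with probability true_prop."""
    return sum(binom_pmf(k, trials, true_prop)
               for k in range(trials + 1)
               if two_sided_p_value(k, trials, pi0) < alpha)

for greens, reds in [(2, 2), (3, 2), (2, 1), (3, 1)]:
    prop = greens / (greens + reds)
    print(f"{greens} green, {reds} red: "
          f"P(reject H0) = {rejection_probability(prop):.3f}")
```

The rejection probability rises steadily as the true proportion moves away from the hypothesised $\frac{1}{2}$.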
