Robin's Hypothesis Testing

Age 16 to 18
Challenge Level: two stars
Robin has a bag containing red and green balls.  Robin wants to test the following hypotheses, where $\pi$ is the proportion of green balls in the bag:

$H_0\colon \pi=\frac{1}{2}$  and  $H_1\colon \pi\ne\frac{1}{2}$

Robin is allowed to take out a ball at random, note its colour and then replace it: this is called a trial.  Robin can do lots of trials, but each trial has a certain cost.

Robin wants to test these hypotheses as cheaply as possible, so suggests the following approach:

"I will do at most 50 trials.  If the p-value* drops below 0.05 at any point, then I will stop and reject the null hypothesis at the 5% significance level, otherwise I will accept it."

Robin tells you about this plan.  What advice could you give to Robin?
Warning - the computer needs a little bit of thinking time to do the simulations!



In this simulation, you can:
  • specify the number of green and red balls actually in the bag (and the true ratio is shown with a green dashed line on the graph) - note that in a real experiment we would not know this!
  • specify the number of trials (up to 200)
  • specify the proportion for the null hypothesis (which we took to be $\frac{1}{2}$ above)
  • choose whether to show the proportion of green balls after each ball is picked
  • choose whether to show the p-value after each ball is picked*
  • rerun the simulation ("Repeat experiment")
The "Final p-value" shows the p-value at the end of the experiment, and the orange lines are at 0.1, 0.05 and 0.01.
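For readers who cannot run the interactive simulation, a single run can be sketched in Python along the following lines. This is only an illustration under stated assumptions: the names `two_sided_p_value` and `run_experiment` are invented for this sketch, the p-value follows the description in the footnote with "more extreme" taken to mean the smaller of the two tails, and it is not the code behind the simulation on this page.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def two_sided_p_value(greens, k, pi0=0.5):
    """Twice the smaller binomial tail probability under H0 (capped at 1).

    Assumption: "more extreme" means further into the tail on the same
    side as the observed number of greens.
    """
    def pmf(i):
        return comb(k, i) * pi0**i * (1 - pi0) ** (k - i)

    lower = sum(pmf(i) for i in range(0, greens + 1))  # P(X <= greens)
    upper = sum(pmf(i) for i in range(greens, k + 1))  # P(X >= greens)
    return min(1.0, 2 * min(lower, upper))


def run_experiment(n_green, n_red, n_trials, pi0=0.5, seed=None):
    """Draw with replacement and record (k, proportion green, p-value)."""
    rng = random.Random(seed)
    true_pi = n_green / (n_green + n_red)
    greens = 0
    history = []
    for k in range(1, n_trials + 1):
        if rng.random() < true_pi:  # this draw is green
            greens += 1
        history.append((k, greens / k, two_sided_p_value(greens, k, pi0)))
    return history


if __name__ == "__main__":
    # A bag with 5 green and 5 red balls, 50 trials, H0: pi = 1/2.
    for k, proportion, p in run_experiment(5, 5, 50, seed=1):
        print(f"{k:3d}  proportion={proportion:.3f}  p-value={p:.3f}")
```

Changing `seed` (or passing `seed=None`) plays the role of the "Repeat experiment" button: each run gives a different path of proportions and p-values.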


Here are some questions you could consider as you think about Robin's approach:
  • What do you notice about the patterns of proportions and p-values?  Is there anything which is the same every time or most times you run the simulation?
  • If we repeat the experiment lots of times, how often does $H_0$ get rejected using Robin's approach?  Does the answer to this depend on how many trials we perform?
  • Does the answer change if you change the true proportion of greens in the bag?
  • What would happen if you changed the hypothesised proportion $\pi$?
  • What would happen if you changed the significance level from 5% to 10% or 1%?
You may want to ask and explore other questions as well.
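One way to explore the question about repeating the experiment lots of times is to automate the repeats. The sketch below is a rough illustration, not the method used by the simulation on this page: the names `rejects_h0` and `rejection_rate`, the `peek` flag and the default of 2000 repeats are all choices made for this example, and the p-value is assumed to be twice the smaller binomial tail. With `peek=True` the p-value is checked after every trial, as in Robin's plan; with `peek=False` it is checked only once, after the final trial, which gives a comparison.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def two_sided_p_value(greens, k, pi0=0.5):
    """Twice the smaller binomial tail probability under H0 (capped at 1)."""
    def pmf(i):
        return comb(k, i) * pi0**i * (1 - pi0) ** (k - i)

    lower = sum(pmf(i) for i in range(0, greens + 1))
    upper = sum(pmf(i) for i in range(greens, k + 1))
    return min(1.0, 2 * min(lower, upper))


def rejects_h0(true_pi, max_trials, pi0, alpha, rng, peek=True):
    """Run one experiment and report whether H0 ends up rejected."""
    greens = 0
    for k in range(1, max_trials + 1):
        if rng.random() < true_pi:
            greens += 1
        # Robin's rule: stop as soon as the p-value drops below alpha.
        if peek and two_sided_p_value(greens, k, pi0) < alpha:
            return True
    # Otherwise decide using the p-value after the final trial only.
    return two_sided_p_value(greens, max_trials, pi0) < alpha


def rejection_rate(true_pi, max_trials=50, pi0=0.5, alpha=0.05,
                   repeats=2000, peek=True, seed=None):
    """Fraction of repeated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    hits = sum(rejects_h0(true_pi, max_trials, pi0, alpha, rng, peek)
               for _ in range(repeats))
    return hits / repeats


if __name__ == "__main__":
    for peek in (True, False):
        rate = rejection_rate(0.5, max_trials=50, peek=peek, seed=0)
        print(f"peek={peek}: H0 rejected in {rate:.1%} of experiments")
```

Varying `true_pi`, `max_trials`, `pi0` and `alpha` corresponds to the other questions in the list above.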

Rejecting $H_0$ when it is true is called a Type I error.

* To read more about p-values, have a look at What is a Hypothesis Test?  The p-values here are calculated like this: after $k$ trials, we find twice the probability of obtaining this number of greens or a more extreme number in $k$ trials, assuming that $H_0$ is true.  The graph shows how this p-value changes with $k$. 
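The calculation described in this footnote could be written in Python roughly as follows. The function name `two_sided_p_value` is hypothetical, and the sketch assumes that "a more extreme number" means the tail on the same side as the observed count, so it doubles the smaller of the two tail probabilities and caps the result at 1.

```python
from math import comb


def two_sided_p_value(greens, k, pi0=0.5):
    """p-value after k trials with `greens` green balls, assuming H0: pi = pi0."""
    def pmf(i):
        # Binomial probability of exactly i greens in k trials under H0.
        return comb(k, i) * pi0**i * (1 - pi0) ** (k - i)

    lower = sum(pmf(i) for i in range(0, greens + 1))  # P(X <= greens)
    upper = sum(pmf(i) for i in range(greens, k + 1))  # P(X >= greens)
    # Twice the smaller tail; the cap stops central results exceeding 1.
    return min(1.0, 2 * min(lower, upper))


if __name__ == "__main__":
    print(two_sided_p_value(5, 5))  # 0.0625
    print(two_sided_p_value(6, 6))  # 0.03125
    print(two_sided_p_value(3, 6))  # 1.0
```

For example, 5 greens in 5 trials gives $2\times\left(\frac{1}{2}\right)^5 = 0.0625$, whereas 6 greens in 6 trials gives $2\times\left(\frac{1}{2}\right)^6 = 0.03125$.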

This resource was inspired by the controversy surrounding a paper published in Nature Communications, as discussed by Casper Albers here.


This resource is part of the collection Statistics - Maths of Real Life
