Powerful Hypothesis Testing

Age 16 to 18
Challenge Level: two stars
Here are some comments on the questions in the problem (but not full solutions):

What is the probability of $H_0$ being rejected?
Do your answers change if the true proportion of greens in the bag changes?
What would happen if you changed the hypothesised proportion $\pi$?
What would happen if you changed the significance level of the test from 5% to 10% or 1%?


This depends on the proportion in $H_0$, the true proportion, the number of trials and the significance level.  We can get evidence from the simulation, or we can work theoretically.  In general, we would expect three things.  The greater the difference between $\pi$ in $H_0$ and the true proportion, the greater the probability of $H_0$ being rejected (the null hypothesis is "more wrong").  When $H_0$ is false, the greater the number of trials, the greater the probability of rejection (the sample proportion is more likely to be close to the true proportion, and so further from $\pi$).  And as the significance level is raised, the probability of $H_0$ being rejected also increases (as we are narrowing the acceptance region).

The probability of rejecting $H_0$ in this problem can be calculated as follows.  Let the hypothesised proportion be $\pi_0$ and the true proportion be $\pi_1$.  Let $X$ be the number of greens observed after $n$ trials.  Under the null hypothesis with significance level $\alpha$ (so typically $\alpha=0.05$), $X\sim \mathrm{B}(n,\pi_0)$, and the null hypothesis will be rejected if $X$ lies in the critical region, which is $X<x_1$ or $X>x_2$, where $x_1$ is the largest integer for which $\mathrm{P}(X<x_1|H_0)\le \alpha/2$ and $x_2$ is the smallest integer for which $\mathrm{P}(X>x_2|H_0)\le \alpha/2$.  We can then calculate these probabilities given that $H_1$ is true, so that $X\sim \mathrm{B}(n,\pi_1)$, and deduce that the probability of $H_0$ being rejected is $\mathrm{P}(X<x_1|H_1)+\mathrm{P}(X>x_2|H_1)$.  These calculations can be easily performed by computer.
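One possible way to carry out this calculation by computer is sketched below in Python, using only the standard library.  The function names (`binom_pmf`, `critical_values`, `rejection_probability`) are our own choices for illustration, and the values $n=50$, $\pi_0=0.5$, $\pi_1=0.6$ in the final lines are assumed purely as an example.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def critical_values(n, pi0, alpha=0.05):
    """Return (x1, x2); H0 is rejected when X < x1 or X > x2."""
    pmf = [binom_pmf(k, n, pi0) for k in range(n + 1)]

    # x1 is the largest integer with P(X < x1 | H0) <= alpha/2.
    x1, lower = 0, 0.0
    while x1 <= n and lower + pmf[x1] <= alpha / 2:
        lower += pmf[x1]
        x1 += 1

    # x2 is the smallest integer with P(X > x2 | H0) <= alpha/2.
    x2, upper = n, 0.0
    while x2 >= 0 and upper + pmf[x2] <= alpha / 2:
        upper += pmf[x2]
        x2 -= 1

    return x1, x2


def rejection_probability(n, pi0, pi1, alpha=0.05):
    """P(H0 rejected) when the true proportion is pi1."""
    x1, x2 = critical_values(n, pi0, alpha)
    return sum(
        binom_pmf(k, n, pi1) for k in range(n + 1) if k < x1 or k > x2
    )


# Example values (assumed for illustration only).
print(critical_values(50, 0.5))
print(rejection_probability(50, 0.5, 0.6))
```

Setting `pi1` equal to `pi0` gives the actual probability of rejecting a true null hypothesis, which is at most $\alpha$ but usually a little less, because $X$ only takes integer values.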

Note that it is only possible to perform this calculation if we know the actual proportion.  But if we know the actual proportion, why are we doing a hypothesis test?!  This makes the power of a test a somewhat difficult idea.  We could, though, be more specific, and say that we are testing $H_0\colon \pi=0.5$ against $H_1\colon \pi=0.6$, and ask which of these hypotheses is more likely to be true.  This is a different way of performing hypothesis testing, which is dealt with in the article [yet to be written].

If $H_0$ is rejected, how likely is it that the alternative hypothesis $H_1$ is true?

A tree diagram will help here: there are two possibilities, either $H_0$ is true or $H_1$ is true.  For each of these, $H_0$ will be either accepted or rejected.  So we have, looking at the tree diagram [which would be nice to draw]

$$\mathrm{P}(\text{$H_1$ true} | \text{$H_0$ rejected}) = \frac{\mathrm{P}(\text{$H_1$ true} \cap \text{$H_0$ rejected})}{\mathrm{P}(\text{$H_1$ true} \cap \text{$H_0$ rejected})+\mathrm{P}(\text{$H_0$ true} \cap \text{$H_0$ rejected})} = \frac{\mathrm{P}(\text{$H_0$ rejected} | \text{$H_1$ true})\mathrm{P}(\text{$H_1$ true})}{\mathrm{P}(\text{$H_0$ rejected} | \text{$H_1$ true})\mathrm{P}(\text{$H_1$ true})+\mathrm{P}(\text{$H_0$ rejected} | \text{$H_0$ true})\mathrm{P}(\text{$H_0$ true})}.$$

But we don't know most of the probabilities in this calculation!  We only know that $\mathrm{P}(\text{$H_0$ rejected} | \text{$H_0$ true})$ is the significance level of the test, which we have chosen.  So without some idea of how likely it is that $H_1$ is true, and some idea of the probability of rejecting $H_0$ if $H_1$ is true, we cannot say how likely it is that $H_1$ is true even if we reject $H_0$!  Likewise, we cannot say how likely it is that $H_0$ is true if we accept it.
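To see how strongly the answer depends on the unknown quantities, here is a small Python sketch of the formula above.  The function name `prob_h1_given_rejection` is our own, and the prior probabilities and the value 0.8 for $\mathrm{P}(\text{$H_0$ rejected} | \text{$H_1$ true})$ are invented for illustration; they are not given in the problem.

```python
def prob_h1_given_rejection(prior_h1, power, alpha):
    """P(H1 true | H0 rejected), assuming exactly one of H0, H1 is true.

    prior_h1: assumed P(H1 true) before the experiment
    power:    assumed P(H0 rejected | H1 true)
    alpha:    P(H0 rejected | H0 true), the significance level
    """
    prior_h0 = 1 - prior_h1
    reject_and_h1 = power * prior_h1
    reject_and_h0 = alpha * prior_h0
    return reject_and_h1 / (reject_and_h1 + reject_and_h0)


# The priors and power below are made-up values for illustration.
for prior in (0.1, 0.5, 0.9):
    print(prior, prob_h1_given_rejection(prior, power=0.8, alpha=0.05))
```

With the same test, the answer changes considerably as the assumed prior probability changes, which is exactly why the question cannot be answered from the significance level alone.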

If Robin wants to be 90% certain of rejecting the null hypothesis if it is wrong, how many trials are needed?

This again depends on the actual proportion of green balls.  If, though, Robin assumes what the actual proportion might be, we can then use the above calculations, trying different values of $n$ until we find one that is large enough so that $\mathrm{P}(X<x_1|H_1)+\mathrm{P}(X>x_2|H_1)>0.9$.
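The trial-and-error search described here might be sketched in Python as follows.  The names `power` and `smallest_n`, the cap `max_n`, and the example proportions $\pi_0=0.5$ and $\pi_1=0.6$ are all assumptions made for illustration.  The binomial probabilities are computed via logarithms so that large $n$ does not cause overflow; this requires $0<p<1$.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), computed via logs; assumes 0 < p < 1."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def power(n, pi0, pi1, alpha=0.05):
    """P(H0 rejected | true proportion pi1) for a two-tailed test."""
    pmf0 = [binom_pmf(k, n, pi0) for k in range(n + 1)]

    # x1: largest integer with P(X < x1 | H0) <= alpha/2.
    x1, lower = 0, 0.0
    while x1 <= n and lower + pmf0[x1] <= alpha / 2:
        lower += pmf0[x1]
        x1 += 1

    # x2: smallest integer with P(X > x2 | H0) <= alpha/2.
    x2, upper = n, 0.0
    while x2 >= 0 and upper + pmf0[x2] <= alpha / 2:
        upper += pmf0[x2]
        x2 -= 1

    return sum(
        binom_pmf(k, n, pi1) for k in range(n + 1) if k < x1 or k > x2
    )


def smallest_n(pi0, pi1, target=0.9, alpha=0.05, max_n=1000):
    """First n (up to max_n) whose power exceeds target, else None."""
    for n in range(1, max_n + 1):
        if power(n, pi0, pi1, alpha) > target:
            return n
    return None


# Example proportions (assumed for illustration only).
print(smallest_n(0.5, 0.6))
```

Because $X$ is discrete, the probability of rejection does not necessarily increase steadily with $n$, so the first value of $n$ found this way may be followed by slightly larger values for which the probability dips back below 0.9.  Robin may want to check a few values beyond the one returned.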

Remembering that each trial costs a certain amount, what is the best number of trials to perform?  (And what does "best" mean?)

This is a hard question!  It depends on what is most important to Robin.  It is a balance between getting the "correct" answer, avoiding the "wrong" answer, the cost of the trials, and the proportion assumed under the alternative hypothesis.
