Published 2018, revised 2019
Using this tree diagram, we can work out the probabilities of $H_0$ being true or $H_1$ being true given our experimental results. To avoid the expressions becoming unwieldy, we will write $H_0$ for "$\text{$H_0$ true}$", $H_1$ for "$\text{$H_1$ true}$" and "$\text{p}^+$" for "observed p-value or more extreme". Then we can write (conditional) probabilities on the branches of the tree diagram leading to our observed p-value: [note 2]
The two routes which give our observed p-value (or more extreme) have the following probabilities:
$$\begin{align*}
\mathrm{P}(H_0\cap \text{p}^+) &=
\mathrm{P}(H_0) \times \mathrm{P}(\text{p}^+ | H_0) \\
\mathrm{P}(H_1\cap \text{p}^+) &=
\mathrm{P}(H_1) \times \mathrm{P}(\text{p}^+ | H_1)
\end{align*}$$
(Recall that $\mathrm{P}(H_0\cap \text{p}^+)$ means "the probability of $H_0$ being true and the p-value being that observed or more extreme".)
We can therefore work out the probability of the alternative hypothesis being true given the observed p-value, using conditional probability:
$$\begin{align*}
\mathrm{P}(H_1|\text{p}^+) &=
\frac{\mathrm{P}(H_1\cap \text{p}^+)}{\mathrm{P}(\text{p}^+)} \\
&= \frac{\mathrm{P}(H_1\cap \text{p}^+)}{\mathrm{P}(H_0\cap\text{p}^+)+\mathrm{P}(H_1\cap\text{p}^+)} \\
&= \frac{\mathrm{P}(H_1) \times \mathrm{P}(\text{p}^+ | H_1)}{\mathrm{P}(H_0) \times \mathrm{P}(\text{p}^+ | H_0) + \mathrm{P}(H_1) \times \mathrm{P}(\text{p}^+ | H_1)}
\end{align*}$$
Though this is a mouthful, it is a calculation which only involves the four probabilities on the above tree diagram. (This is an example of Bayes' Theorem, discussed further in this resource.)
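Although the formula looks like a mouthful, it is only a few arithmetic steps. As a sketch (the function name and its hypothetical inputs are our own, not part of the original discussion), it could be written in Python like this, taking the three probabilities we would need from the tree diagram and returning $\mathrm{P}(H_1|\text{p}^+)$:

```python
def posterior_h1(p_h0, p_plus_given_h0, p_plus_given_h1):
    """Bayes' theorem for P(H1 | p+), using the tree-diagram probabilities.

    p_h0            -- P(H0), the prior probability that H0 is true
    p_plus_given_h0 -- P(p+ | H0), which is just the p-value itself
    p_plus_given_h1 -- P(p+ | H1)
    """
    p_h1 = 1 - p_h0  # H0 and H1 are taken to be the only possibilities
    joint_h0 = p_h0 * p_plus_given_h0  # P(H0 and p+)
    joint_h1 = p_h1 * p_plus_given_h1  # P(H1 and p+)
    return joint_h1 / (joint_h0 + joint_h1)
```

For example, if the two hypotheses were equally likely beforehand and the observed result were equally probable under each, the function returns $0.5$, as symmetry demands.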
However, we immediately hit a big difficulty if we try to calculate this for a given experiment. We know $\mathrm{P}(\text{p}^+ | H_0)$: this is just the p-value itself. (The p-value tells us the probability of obtaining a result at least this extreme given that the null hypothesis is true.) But we don't know the probability of the null hypothesis being true or false (that is,
$\mathrm{P}(H_0)$ and $\mathrm{P}(H_1)=1-\mathrm{P}(H_0)$), nor do we know the probability of the observed result if the alternative hypothesis is true ($\mathrm{P}(\text{p}^+|H_1)$), as knowing that the proportion of greens is not $\frac{1}{2}$ does not tell us what it actually is. (Similar issues apply to all the other contexts of hypothesis testing listed above.) So we are quite stuck: in
the null hypothesis significance testing model, it is impossible to give a numerical answer to our key question: "Given our results, what is the probability that the alternative hypothesis is true?" This is because we don't know two of the three probabilities that we need in order to answer the question.
An example might highlight the issue a little better. Let us suppose that we are trying to work out whether a coin is biased (alternative hypothesis), or whether the probability of heads is exactly $\frac{1}{2}$ (null hypothesis). We toss the coin 50 times and obtain a p-value of 0.02. Do we now believe that the coin is biased? Most people believe that coins are not
biased, and so are much more likely to attribute this result to chance or poor coin-tossing technique than to the coin being biased.
On the other hand, consider a case of a road planner who introduces a traffic-calming feature to reduce the number of fatalities along a certain stretch of road. The null hypothesis is that there is no change in fatality rate, while the alternative hypothesis is that the fatality rate has decreased. A hypothesis test is performed on data collected for 24 months before and 24 months
after the feature is built. Again, the p-value was 0.02. Do we believe that the alternative hypothesis is true? In this case, we are more likely to believe that the alternative hypothesis is true, because it makes a lot of sense that this feature will reduce the number of fatalities.
Our "instinctive" responses to these results are tied up with assigning values to the unknown probabilities in the formula above. For the coin, we would probably take $\mathrm{P}(H_0)$ to be close to 1, say $0.99$, as we think it is very unlikely that the coin is biased, and $\mathrm{P}(\text{p}^+|H_1)$ will be, say, $0.1$: if the coin is biased, the bias is not likely to be very large, and
so it is only a bit more likely that the result will be significant in this case. Putting these figures into the formula above gives:
$$\mathrm{P}(H_1|\text{p}^+) = \frac{0.01 \times 0.1}{0.99 \times 0.02 + 0.01 \times 0.1} \approx 0.05,$$
that is, we are still very doubtful that this coin is biased, even after performing the experiment. Note that in this case, the probability of these results given that the null hypothesis is true is 0.02, whereas the probability that the null hypothesis is true given these results is $1-0.05=0.95$, which is very different. This shows how dramatically different the answers to the two
questions can be.
On the other hand, for the fatalities situation, we might assume quite the opposite: we are pretty confident that the traffic-calming feature will help, so we might take $\mathrm{P}(H_0)$ to be $0.4$, and $\mathrm{P}(\text{p}^+|H_1)$ will be, say, $0.25$ (though the traffic-calming may help, the impact may be relatively small). Putting these figures into the formula gives:
$$\mathrm{P}(H_1|\text{p}^+) = \frac{0.6 \times 0.25}{0.4 \times 0.02 + 0.6 \times 0.25} \approx 0.95,$$
so we are now much more convinced that the traffic-calming feature is helping than we were before we had the data. This time, the probability of these results given that the null hypothesis is true is still 0.02, whereas the probability that the null hypothesis is true given these results is $1-0.95=0.05$, which is not that different.
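Both of these worked examples are quick to check by direct calculation. The short sketch below (using the illustrative figures assumed in the text above) reproduces the two posterior probabilities:

```python
# Coin example: P(H0) = 0.99, p-value = 0.02, P(p+ | H1) = 0.1
coin = (0.01 * 0.1) / (0.99 * 0.02 + 0.01 * 0.1)

# Traffic example: P(H0) = 0.4, p-value = 0.02, P(p+ | H1) = 0.25
traffic = (0.6 * 0.25) / (0.4 * 0.02 + 0.6 * 0.25)

print(round(coin, 2), round(traffic, 2))  # 0.05 0.95
```

The same p-value of 0.02 thus leads to posterior probabilities of roughly 0.05 and 0.95 for the alternative hypothesis, purely because of the different assumed priors and the different assumed values of $\mathrm{P}(\text{p}^+|H_1)$.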
This approach may seem very disturbing, as we have to make assumptions about what we believe before we do the hypothesis test. But as we have seen, we cannot answer our key question without making such assumptions.
How many trials should we do in order to accept or reject our null hypothesis?
How effective are hypothesis tests at showing that our null hypothesis is wrong?