It is helpful to be systematic about your experiments and to record your observations. For example, you might run 100 simulations with the default settings. How often is $H_0$ rejected when you look only at the final p-value? How often is it rejected when you use Robin's approach?
Next, try changing the number of trials. How do the two approaches compare now?
Now try changing the proportion of green balls in the bag. What happens now? (Start with 50 trials again, so that you have a fair comparison with the original case.)
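
If you would like to keep a tally of these experiments outside the interactive simulation, the following Python sketch is one way to do it. It rests on several assumptions that are not stated in this section: that $H_0$ says the proportion of green balls is 0.5, that the significance level is 5%, that the p-value comes from an exact two-sided binomial test, and that "Robin's approach" means checking the p-value after every trial and rejecting $H_0$ if it ever falls below the significance level. The function names and parameters (`two_sided_p_value`, `run_experiment`, `rejection_rates`, `true_p`, `p0`, `alpha`) are hypothetical and chosen only for illustration.

```python
import random
from math import comb


def two_sided_p_value(k, n, p0=0.5):
    """Exact two-sided binomial p-value for k greens in n draws.

    Sums the probability of every outcome that is no more likely
    than the observed one under H0 (assumed proportion p0).
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def run_experiment(n_trials=50, true_p=0.5, p0=0.5, alpha=0.05, rng=random):
    """Simulate one run of draws with replacement.

    Returns a pair of booleans:
      (rejected using only the final p-value,
       rejected if the p-value dipped below alpha after any trial).
    The second rule is an assumed reading of Robin's approach.
    """
    greens = 0
    rejected_at_any_point = False
    p_value = 1.0
    for n in range(1, n_trials + 1):
        greens += rng.random() < true_p  # True counts as 1 green ball
        p_value = two_sided_p_value(greens, n, p0)
        if p_value < alpha:
            rejected_at_any_point = True
    return p_value < alpha, rejected_at_any_point


def rejection_rates(n_sims=100, seed=0, **settings):
    """Repeat the experiment n_sims times and return both rejection rates."""
    rng = random.Random(seed)  # fixed seed so a tally can be reproduced
    final_count = any_point_count = 0
    for _ in range(n_sims):
        final, any_point = run_experiment(rng=rng, **settings)
        final_count += final
        any_point_count += any_point
    return final_count / n_sims, any_point_count / n_sims


if __name__ == "__main__":
    for settings in ({}, {"n_trials": 200}, {"true_p": 0.6}):
        final_rate, any_point_rate = rejection_rates(**settings)
        print(settings, "final only:", final_rate, "checked every trial:", any_point_rate)
```

Changing `n_trials` corresponds to changing the number of trials, and changing `true_p` corresponds to changing the proportion of green balls in the bag. Recording both rates for each setting gives you a table you can compare against what you see in the simulation.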