FAQ: Sample Size Determination with Simulation - Review

This community-built FAQ covers the “Review” exercise from the lesson “Sample Size Determination with Simulation”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Master Statistics with Python

FAQs on the exercise Review

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply below!
You can also find further discussion and get answers to your questions over in #get-help.

Agree with a comment or answer? Like it to up-vote the contribution!

Need broader help or resources? Head to #get-help and #community:tips-and-resources. If you want feedback or inspiration for a project, check out #project.

Looking for motivation to keep learning? Join our wider discussions in #community.

Learn more about how to use this guide.

Found a bug? Report it online, or post in #community:Codecademy-Bug-Reporting

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

In this lesson, perhaps we can also clarify where the false negative rate fits into all this? It’s not mentioned once.
In a manufacturing environment, for instance, you might not care as much about false positives for defects because you are leaning heavily toward quality, but you would care very strongly about false negatives, because they reduce your process output.

EDIT: More confusion:
Let’s say I’m concerned with the chance of arriving at the wrong conclusion.
The power of the test is the probability of detecting a difference if it exists.
The significance threshold of the test is the probability of incorrectly detecting a difference, whether or not a difference actually exists?
So is your ultimate probability of arriving at a wrong conclusion = (significance_threshold * power) + significance_threshold + power?

I got 100% on the quiz after the lesson, but I feel like I still don’t understand at all.


At least in the context of sample size calculation, the false negative rate is equal to 1 - power. Note that all of these rates are conditional probabilities.

In this lesson we consider the following null hypothesis H0 and alternative hypothesis H1:

H0: email open rate is control_rate.
H1: email open rate is name_rate.

  • The false positive rate (= significance threshold) is the probability that the null hypothesis H0 is rejected under the assumption that H0 is actually true.
  • The power is the probability that the null hypothesis H0 is rejected under the assumption that H1 is true.
  • The false negative rate is the probability that the null hypothesis H0 is not rejected under the assumption that H1 is true. So it is equal to 1 - power. (All three rates are illustrated in the simulation sketch below.)
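
Here is a minimal simulation sketch of all three rates, in the spirit of the lesson. The specific numbers (control_rate = 0.10, name_rate = 0.15, sample_size = 500) are hypothetical, and the exact binomial test is just one reasonable choice of significance test:

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical values for illustration; the lesson uses its own numbers.
control_rate = 0.10   # open rate under H0
name_rate = 0.15      # open rate under H1
sample_size = 500
alpha = 0.05          # significance threshold
n_sims = 10_000
rng = np.random.default_rng(0)

def rejection_rate(true_rate):
    """Fraction of simulated samples in which H0 is rejected at level alpha."""
    rejections = 0
    for _ in range(n_sims):
        opens = rng.binomial(sample_size, true_rate)
        if binomtest(int(opens), sample_size, control_rate).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

false_positive_rate = rejection_rate(control_rate)  # H0 true; should land near alpha
power = rejection_rate(name_rate)                   # H1 true; probability of detection
false_negative_rate = 1 - power                     # H1 true, but H0 not rejected

print(f"false positive rate: {false_positive_rate:.3f}")
print(f"power:               {power:.3f}")
print(f"false negative rate: {false_negative_rate:.3f}")
```

With enough simulations, false_positive_rate should land near alpha, while power (and therefore the false negative rate) depends on the sample size, which is exactly what the sample size determination exercise is about.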

I think the probability of arriving at the wrong conclusion is more difficult to express. Let’s consider the following hypothesis H(m) for any m:

H(m): email open rate is m.

Let Pr(H(m)) denote the probability that H(m) is true, and let f(m) be the probability that we arrive at the wrong conclusion under the assumption that H(m) is true. That is, if |m - control_rate| < |name_rate - control_rate|, let f(m) be the probability that H0 is rejected under the assumption that H(m) is true; otherwise, let f(m) be the probability that H0 is not rejected under the assumption that H(m) is true. Then I think the probability you asked about is obtained by integrating Pr(H(m)) * f(m) over all m. But I’m not an expert, so this might be wrong.
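
To make that concrete, here is a rough numerical sketch of that integral. It assumes a Beta prior for Pr(H(m)), which is purely a modeling choice (nothing in the lesson specifies a prior over m), and reuses the hypothetical numbers from the sketch above:

```python
import numpy as np
from scipy.stats import binomtest, beta

# Hypothetical values, matching the earlier sketch.
control_rate, name_rate = 0.10, 0.15
sample_size, alpha, n_sims = 500, 0.05, 1_000
rng = np.random.default_rng(1)

# Assumed prior Pr(H(m)) over the true open rate m (a modeling choice).
prior = beta(2, 15)  # mean around 0.12, mass concentrated on small rates

def f(m):
    """Probability of reaching the wrong conclusion when the true rate is m."""
    rejections = 0
    for _ in range(n_sims):
        opens = rng.binomial(sample_size, m)
        if binomtest(int(opens), sample_size, control_rate).pvalue < alpha:
            rejections += 1
    reject_prob = rejections / n_sims
    # Rejecting H0 is the wrong call when m is closer to control_rate than
    # name_rate is; failing to reject is the wrong call otherwise.
    if abs(m - control_rate) < abs(name_rate - control_rate):
        return reject_prob
    return 1 - reject_prob

# Approximate the integral of Pr(H(m)) * f(m) dm with a Riemann sum on a grid.
grid = np.linspace(0.01, 0.30, 30)
dm = grid[1] - grid[0]
total = sum(prior.pdf(m) * f(m) for m in grid) * dm
print(f"overall probability of a wrong conclusion: {total:.3f}")
```

A finer grid and more simulations per grid point would give a better estimate; this is only meant to show the shape of the calculation.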