There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.
If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.
In this lesson, perhaps we can also clarify where the false negative rate fits into all this? It isn’t mentioned once.
In a manufacturing environment, for instance, you might not care as much about false positives for defects, because you are leaning heavily towards quality, but you might care very strongly about false negatives, because they reduce your process output.
EDIT: More confusion:
Let’s say I’m concerned with the chance of arriving at the wrong conclusion.
The power of the test is the probability of detecting a difference if it exists.
The significance threshold of the test is the probability of incorrectly (falsely) detecting a difference, whether or not a difference actually exists?
So your ultimate probability of arriving at a wrong conclusion = (significance_threshold * power) + significance_threshold + power?
I got a 100% score on the quiz after the lesson, but I feel like I still don’t understand it at all.
At least in the context of sample size calculation, the false negative rate is equal to 1 - power. Note that they are conditional probabilities.
In this lesson we consider the following null hypothesis H0 and alternative hypothesis H1:
H0: email open rate is control_rate.
H1: email open rate is name_rate.
The false positive rate (= significance threshold) is the probability that the null hypothesis H0 is rejected under the assumption that H0 is actually true.
The power is the probability that the null hypothesis H0 is rejected under the assumption that H1 is true.
The false negative rate is the probability that the null hypothesis H0 is not rejected under the assumption that H1 is true. So it is equal to 1 - power.
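To see all three of these conditional probabilities side by side, here is a minimal simulation sketch. The names control_rate and name_rate follow the lesson, but the specific numbers, the chi-square test on a 2x2 table, and the helper simulate_rejections() are assumptions of mine for illustration, not the lesson’s exact code.

```python
# Minimal sketch (assumed numbers, not the lesson's code): estimate the false
# positive rate, power, and false negative rate by repeatedly simulating the
# A/B test and counting how often H0 is rejected.
import numpy as np
from scipy.stats import chi2_contingency

control_rate = 0.50            # open rate under H0
name_rate = 0.65               # open rate under H1 (a 30% lift, assumed)
sample_size = 100              # emails per group (assumed)
significance_threshold = 0.05
n_trials = 10_000
rng = np.random.default_rng(0)

def simulate_rejections(true_rate):
    """Fraction of simulated tests that reject H0 when the name group's true
    open rate is `true_rate` (the control group stays at control_rate)."""
    rejections = 0
    for _ in range(n_trials):
        control_opens = rng.binomial(sample_size, control_rate)
        name_opens = rng.binomial(sample_size, true_rate)
        table = [[control_opens, sample_size - control_opens],
                 [name_opens, sample_size - name_opens]]
        _, p_value, _, _ = chi2_contingency(table)
        if p_value < significance_threshold:
            rejections += 1
    return rejections / n_trials

false_positive_rate = simulate_rejections(control_rate)  # P(reject | H0 true)
power = simulate_rejections(name_rate)                    # P(reject | H1 true)
false_negative_rate = 1 - power                           # P(no reject | H1 true)

print("false positive rate:", false_positive_rate)  # at or a bit below the threshold
print("power:", power)
print("false negative rate:", false_negative_rate)
```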
I think the probability of arriving at the wrong conclusion is harder to express. Let’s consider the following hypothesis H(m) for any m:
H(m): email open rate is m.
Let Pr(H(m)) denote the probability that H(m) is true. Let f(m) be the probability that we arrive at the wrong conclusion under the assumption that H(m) is true. That is, if |m - control_rate| < |name_rate - control_rate|, let f(m) be the probability that H0 is rejected under the assumption that H(m) is true; otherwise, let f(m) be the probability that H0 is not rejected under the assumption that H(m) is true. Then I think the probability you asked about is obtained by integrating Pr(H(m)) * f(m) over all m. But I’m not an expert, so this might be wrong.
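As a rough numerical illustration of that integral (a sketch only: the uniform prior over [0.40, 0.80], the grid, and the helper p_reject() are all assumptions, since the lesson never specifies Pr(H(m))), you could approximate it on a grid of candidate open rates m:

```python
# Sketch: approximate the integral of Pr(H(m)) * f(m) on a grid of open rates,
# under an ASSUMED uniform prior. p_reject() estimates P(reject H0 | H(m)) by
# simulation, mirroring the snippet above.
import numpy as np
from scipy.stats import chi2_contingency

control_rate, name_rate = 0.50, 0.65
sample_size, significance_threshold, n_trials = 100, 0.05, 2_000
rng = np.random.default_rng(1)

def p_reject(true_rate):
    """Monte Carlo estimate of P(reject H0) when the name group's true open
    rate is `true_rate` and the control group's is control_rate."""
    rejections = 0
    for _ in range(n_trials):
        c = rng.binomial(sample_size, control_rate)
        t = rng.binomial(sample_size, true_rate)
        _, p_value, _, _ = chi2_contingency([[c, sample_size - c],
                                             [t, sample_size - t]])
        rejections += p_value < significance_threshold
    return rejections / n_trials

grid = np.linspace(0.40, 0.80, 9)         # candidate true open rates m (assumed range)
prior = np.ones_like(grid) / len(grid)    # assumed uniform prior Pr(H(m))

f = []  # f(m): probability of a wrong conclusion given H(m)
for m in grid:
    reject = p_reject(m)
    if abs(m - control_rate) < abs(name_rate - control_rate):
        f.append(reject)        # m closer to control_rate: rejecting H0 is the error
    else:
        f.append(1 - reject)    # m at least as far as name_rate: not rejecting is the error

print("P(wrong conclusion) under the assumed prior:", float(np.dot(prior, f)))
```

The point is only to make the formula concrete: the overall error probability depends entirely on the prior Pr(H(m)) you assume, which a frequentist power calculation never specifies.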
I’m a little confused at the last part of this section:
It notes that when the MDE of 40% is larger than our actual effect, the power goes down.
But… where is the actual effect we are comparing it to?
I don’t think it’s the Baseline rate, as that is still 50%, and it can’t be the (1 + lift) * Baseline since that’s always going to be more than the lift.
Is it comparing the sample size calculator’s 40% MDE with the simulation code’s lift = 0.3 (since it doesn’t mention we should update that part of the code)?
I’m still a little confused, and I’m unclear whether I should be taking away something about using the calculator well, or something about making sure the calculator reflects reality (in this case, the simulation).
I didn’t notice it until you mentioned it, but now I also think the description of Step 2 is confusing. I think it would be clearer if it were split into the following two points:
Detecting smaller effect sizes requires larger sample sizes. (In other words, if we only need to find a larger effect size, we can reduce the sample size.)
Decreasing sample size decreases the power of a test.
I think the confusing thing about the description of Step 2 is that, even though it says we are going to examine how the MDE impacts the power, what we actually change in the code is the sample size. And if the value of lift in the code is different from the MDE, the value output by the code can no longer be called the power for that MDE, since the alternative hypothesis has changed.
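To make this concrete, here is a hedged sketch using statsmodels’ normal-approximation power functions (not the lesson’s calculator or simulation code; the 50% baseline and 30% true lift come from this thread, while the 5% significance threshold, 80% target power, and the two-sided two-proportion setup are assumptions). It shows that plugging a 40% MDE into the sample size calculation gives a smaller required sample, and that smaller sample has less power against the 30% lift that actually exists.

```python
# Sketch (assumed setup, not the lesson's code): required sample size for a
# given MDE, and the power that sample size actually achieves against the
# true 30% lift.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.50        # control open rate (from the thread)
true_lift = 0.30       # the effect that actually exists in the simulation
alpha = 0.05           # assumed significance threshold
target_power = 0.80    # assumed target power for the sample size calculation
analysis = NormalIndPower()

def sample_size_for_mde(mde):
    """Per-group sample size needed to detect a lift of `mde` at target_power."""
    effect = proportion_effectsize(baseline * (1 + mde), baseline)
    return analysis.solve_power(effect_size=effect, alpha=alpha,
                                power=target_power, ratio=1.0,
                                alternative='two-sided')

true_effect = proportion_effectsize(baseline * (1 + true_lift), baseline)

for mde in (0.30, 0.40):
    n = sample_size_for_mde(mde)
    # Power of a test with that sample size against the TRUE 30% lift.
    achieved = analysis.power(effect_size=true_effect, nobs1=n, alpha=alpha,
                              ratio=1.0, alternative='two-sided')
    print(f"MDE {mde:.0%}: n per group ~ {n:.0f}, "
          f"power against the true 30% lift ~ {achieved:.2f}")
```

With the 30% MDE the achieved power equals the target by construction; with the 40% MDE the required sample shrinks and the power against the real 30% effect drops noticeably, which seems to be the behaviour the lesson is pointing at.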