FAQ: Statistical Concepts - Type I Or Type II


#1

This community-built FAQ covers the “Type I Or Type II” exercise from the lesson “Statistical Concepts”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

FAQs on the exercise Type I Or Type II

In our examples of Type I and Type II errors, we are only able to determine false positives and false negatives because we know the true positives and true negatives. How should we think about these errors when the true values aren’t known?



#2

What do the terms “false-negative”, “false-positive”, and “null hypothesis” even mean, and what is the point of the intersect function? How do these things relate to the lists provided for the lesson?


#3

I’m going to borrow a bit from a medical example and hopefully it will help.

Let’s assume two people come in for cancer screening. Ted does not have cancer and Bill does.

A Type I error would be telling Ted that he DOES have cancer when he does NOT. This is also called a false positive: we told him he has a disease that he doesn’t have. A Type II error would be missing Bill’s cancer. This is called a false negative because we tell him he does NOT have cancer when he DOES.
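To make the two definitions concrete, here is a minimal Python sketch. The `classify` function and its parameter names are made up for this illustration and are not part of the exercise.

```python
def classify(has_condition, test_positive):
    """Label one test result, given the truth and what the test reported."""
    if test_positive and not has_condition:
        return "Type I error (false positive)"
    if not test_positive and has_condition:
        return "Type II error (false negative)"
    return "correct result"

# Ted: no cancer, but the test says he has it.
ted = classify(has_condition=False, test_positive=True)
# Bill: has cancer, but the test misses it.
bill = classify(has_condition=True, test_positive=False)

print("Ted:", ted)
print("Bill:", bill)
```

Notice that you can only label a result this way when you know the truth (`has_condition`), which is exactly what the exercise lists give you.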

The null hypothesis is more difficult. Generally, we assume that there are a certain number of Bills and Teds in the world and that a percentage of each group will develop cancer. We don’t have time to sample all of them, so let’s say we have a thousand Bills and a thousand Teds to represent both groups. The null hypothesis would (as is usual by convention) be that Bills and Teds have the exact same risk of developing cancer, so there is no difference between a Bill and a Ted. The tests we are learning about help us decide whether to reject this or not (“rejecting the null” or “failing to reject the null”; strictly speaking, a test never proves the null is true). In these terms, a Type I error is rejecting a null hypothesis that is actually true, and a Type II error is failing to reject one that is actually false.
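If it helps to see a null hypothesis being tested, here is a rough sketch using a two-proportion z-test. This is just one common way to compare two rates, not necessarily the test the course uses. The function name and the case counts (60 and 40 out of 1000) are invented for illustration only.

```python
import math

def two_proportion_p_value(cases_a, n_a, cases_b, n_b):
    """Two-sided p-value for the null that both groups share the same rate.

    Assumes large samples and a pooled rate strictly between 0 and 1.
    """
    rate_a = cases_a / n_a
    rate_b = cases_b / n_b
    pooled = (cases_a + cases_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # For a standard normal, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: 60 of 1000 Bills and 40 of 1000 Teds develop cancer.
p_value = two_proportion_p_value(60, 1000, 40, 1000)
alpha = 0.05  # the Type I error rate we are willing to accept

if p_value < alpha:
    print("Reject the null: the risks appear to differ.")
else:
    print("Fail to reject the null: no clear difference in risk.")
```

The `alpha` threshold is the link back to Type I errors: if the null really is true, a test run at `alpha = 0.05` will still reject it about 5% of the time.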

The intersect function cross-references two sets of data and finds the values that appear in both (if A contains 1, 2 and B contains 2, 3, intersect finds that 2 is in both sets). It is a function that the people at Codecademy created for you, and the output you can generate via (SPOILER):

type_i_error = intersect([measured value], [actual value])

will allow you to see which of the individuals who tested positive in the experiment (we told them they have cancer) were actually negative (they do not have cancer) in “real life”. Those individuals are the false positives (Type I errors). The reverse is true for false negatives.
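Here is a small sketch of how this plays out in Python. I haven't seen the source of Codecademy's `intersect`, so the version below is my own stand-in with the same idea, and the list names and patient IDs are made up for the example.

```python
def intersect(list1, list2):
    """Return the items of list1 that also appear in list2 (stand-in helper)."""
    members = set(list2)
    return [item for item in list1 if item in members]

# Hypothetical patient IDs: what the test reported vs. what is actually true.
experimental_positive = [1, 2, 3, 4]
experimental_negative = [5, 6, 7, 8]
actual_positive = [2, 3, 5]
actual_negative = [1, 4, 6, 7, 8]

# Tested positive but actually negative: false positives (Type I errors).
type_i_errors = intersect(experimental_positive, actual_negative)
# Tested negative but actually positive: false negatives (Type II errors).
type_ii_errors = intersect(experimental_negative, actual_positive)

print(type_i_errors)   # [1, 4]
print(type_ii_errors)  # [5]
```

So patients 1 and 4 were told they have cancer when they don't, and patient 5 was told they are fine when they aren't.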

Hope this helps.


#4

This helped tonnes! Many thanks for spending time helping us.