FAQ: Sampling Distributions - Sampling Distributions

discourse-admin · April 27, 2021, 2:35pm

This community-built FAQ covers the “Sampling Distributions” exercise from the lesson “Sampling Distributions”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Master Statistics with Python

Probability

christianzaner · March 14, 2023, 5:17pm

For this code example, why is the range 500? Shouldn’t the range be 50, since that’s the sample size?

mtrtmk · March 14, 2023, 6:26pm

sample_size = 50
# ... more code here ...
samp = np.random.choice(salmon_population, sample_size, replace=False)

The statement above picks 50 salmon at random from the whole salmon population. The sampling is done without replacement, i.e. the same fish can’t be selected more than once within the current sample. This sample of 50 salmon is assigned to the variable samp, and further down in the example code the mean of this sample is calculated. This is the first mean of a random sample of size 50.

Then we repeat the sampling: 50 salmon are again picked at random from the whole salmon population, and the mean of this new sample is calculated. This is the second mean of a random sample of size 50.
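To make those first two steps concrete, here is a minimal sketch. The exercise's actual salmon data isn't shown in this thread, so salmon_population below is a made-up stand-in (1000 simulated weights), and the names first, second, first_mean and second_mean are only for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

# ASSUMPTION: stand-in data, not the exercise's real salmon weights.
salmon_population = np.random.normal(loc=60, scale=10, size=1000)
sample_size = 50

# First random sample of 50 fish (no fish picked twice) and its mean.
first = np.random.choice(salmon_population, sample_size, replace=False)
first_mean = np.mean(first)

# Second, independent random sample of 50 fish and its mean.
second = np.random.choice(salmon_population, sample_size, replace=False)
second_mean = np.mean(second)

print(len(first), len(second))   # each sample holds 50 values
print(first_mean, second_mean)   # two different sample means
```

Each call draws a fresh sample of 50, so the two means will generally differ a little from each other.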

The same process is repeated again and again.

The loop over range(500) carries out 500 repetitions of the process above. There is nothing special about 500; we could repeat the process 600 times, 1000 times, or some other number. The point is that repeating it 500 times (or some other sufficiently large number), rather than just 10 or 20 times, gives a reasonably good picture of the distribution of the sample means. Do the means vary a lot, or are they fairly close together with a few outliers? The more times we draw a random sample of 50 and calculate its mean, the more data we have to answer that question. So 50 is the size of each sample, while 500 is the number of samples we draw.
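Putting the whole process together, a rough sketch of the loop might look like the following. Again, salmon_population is simulated stand-in data rather than the exercise's real data, and num_repetitions and sample_means are names chosen here for illustration; the exercise's own code may differ.

```python
import numpy as np

np.random.seed(1)  # fixed seed so the sketch is repeatable

# ASSUMPTION: stand-in data, not the exercise's real salmon weights.
salmon_population = np.random.normal(loc=60, scale=10, size=1000)

sample_size = 50        # how many fish are in EACH sample
num_repetitions = 500   # how many samples we draw (hypothetical name)

sample_means = []
for _ in range(num_repetitions):
    # Draw one sample of 50 fish without replacement and record its mean.
    samp = np.random.choice(salmon_population, sample_size, replace=False)
    sample_means.append(np.mean(samp))

sample_means = np.array(sample_means)

print(len(sample_means))         # one mean per repetition
print(salmon_population.std())   # spread of individual fish
print(sample_means.std())        # spread of the sample means
```

Changing range(500) to range(50) would not change the sample size; it would only leave us with 50 means instead of 500 to describe the distribution.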