Sampling Distributions Dance Party!

I’m stuck on the Bonus question on this exercise:

  • Use the sampling distribution of the sample minimum to estimate the probability of observing a specific sample minimum. For example, from the plot, what is the chance of getting a sample minimum that is less than 130bpm?

Anyone got any tips?

Please include a link to the lesson.

oops sorry, how do I do that?

No one has any idea what you’re referring to (language, course, etc.). So, copy the link at the top of the page/navigation bar where the lesson is…

understood, added the link as a reply to the thread for easier visibility:

https://www.codecademy.com/journeys/data-scientist-aly/paths/dsalycj-22-data-science-foundations-ii/tracks/dsalycj-22-statistics-fundamentals-for-data-science/modules/dsf-sampling-for-data-science-bb36dd7f-b41c-43fe-80a6-3e92a7c441da/projects/sampling-distributions-project

What code have you written so far? You can paste it so it’s formatted by using the “</>” button above.

I was able to complete the project without any issues; it’s just the Bonus question that I don’t understand, as it was not covered in the lesson. The Bonus question asks us to estimate the probability of observing a specific sample minimum, as opposed to the mean, which is what the rest of the lesson and exercise covers. I understand how to estimate the probability of observing a specific sample mean using the “stats.norm.cdf” method from the SciPy library; however, I don’t think we can use the same method for the minimum. So I’m not sure if there’s another method to use here, or maybe it’s the same method with different parameters? Here is my code for the full project below, but again, I have nothing for the Bonus question:

from helper_functions import choose_statistic, population_distribution, sampling_distribution
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns
import codecademylib3

# task 1: load in the spotify dataset
spotify_data = pd.read_csv("spotify_data.csv")

# task 2: preview the dataset
#print(spotify_data.head())
#print(spotify_data.info())

# task 3: select the relevant column
song_tempos = spotify_data.tempo


# task 5: plot the population distribution with the mean labeled

population_distribution(song_tempos)

# task 6: sampling distribution of the sample mean
sampling_distribution(song_tempos,30,"Mean")

# task 8: sampling distribution of the sample minimum
sampling_distribution(song_tempos,30,"Minimum")

# task 10: sampling distribution of the sample variance
sampling_distribution(song_tempos,30,"Variance")


# task 13: calculate the population mean and standard deviation
population_mean = song_tempos.mean()
population_std = np.std(song_tempos)

# task 14: calculate the standard error
standard_error = population_std/(30**.5)

# task 15: calculate the probability of observing an average tempo of 140bpm or lower from a sample of 30 songs
prob = stats.norm.cdf(140, population_mean, standard_error)
print(prob)

# task 16: calculate the probability of observing an average tempo of 150bpm or higher from a sample of 30 songs
prob2 = 1 - stats.norm.cdf(150, population_mean, standard_error)
print(prob2)

# EXTRA
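
One possible way to approach the Bonus, offered as a rough sketch and not as the official solution: the normal-CDF shortcut used in tasks 15 and 16 relies on the sample mean being approximately normal, and that reasoning does not carry over to the sample minimum. An alternative is to simulate the sampling distribution directly: draw many samples of 30, record each sample's minimum, and take the proportion of those minimums that fall below 130. The function name `prob_sample_min_below`, its parameters, and the stand-in tempo data below are all hypothetical; in the project you would pass `song_tempos` instead. Drawing each sample without replacement is also an assumption, and it may not match what the `sampling_distribution` helper does internally.

```python
import numpy as np


def prob_sample_min_below(population, threshold, sample_size=30,
                          n_samples=10_000, seed=0):
    """Estimate P(sample minimum < threshold) by simulation.

    Hypothetical helper, not part of the course's helper_functions module.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(population, dtype=float)

    # Draw many samples (assumption: without replacement within each sample)
    # and record the minimum of each one.
    minimums = np.array([
        rng.choice(values, size=sample_size, replace=False).min()
        for _ in range(n_samples)
    ])

    # The share of simulated minimums below the threshold is the estimate.
    return float(np.mean(minimums < threshold))


# Stand-in data for illustration only; this is NOT the Spotify tempo column.
demo_rng = np.random.default_rng(42)
fake_tempos = demo_rng.normal(loc=120, scale=30, size=5_000)

estimate = prob_sample_min_below(fake_tempos, threshold=130)
print(estimate)
```

With the real data, the call would look like `prob_sample_min_below(song_tempos, 130)`. The estimate should roughly match the share of the "Minimum" histogram from task 8 that sits to the left of 130bpm, which is what the question means by "from the plot".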