Hi Codecademy team,

I have just finished the Hypothesis Testing module in the Data Science path. The amount of material I had to absorb and understand before completing the module felt overwhelming, especially the final project, which, as some of you know, is about Fetchmaker, a start-up company.

I would really appreciate constructive feedback from other aspiring data scientists on Codecademy about my Python coding skills in this particular project.

Below is a snapshot of my code:

```
import numpy as np
import fetchmaker
# Number 7
from scipy.stats import binom_test
# Number 9
from scipy.stats import f_oneway
# Number 10
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Number 13
from scipy.stats import chi2_contingency
# Number 1
fetch_maker = fetchmaker.dogs
# print(fetch_maker)
# Number 2
rottweiler_tl = fetchmaker.get_tail_length('rottweiler')
# print(rottweiler_tl)
# Number 3
rottweiler_tl_mean = np.mean(rottweiler_tl)
rottweiler_tl_std = np.std(rottweiler_tl)
print('The rottweiler avg tail length is {} \n'.format(rottweiler_tl_mean))
print('The rottweiler std dev of tail length is {} \n'.format(rottweiler_tl_std))
# Number 4
whippet_rescue = fetchmaker.get_is_rescue('whippet')
# print(whippet_rescue)
# Number 5
# To count the number of entries that are not zero (1)
num_whippet_rescues = np.count_nonzero(whippet_rescue)
print('The count of (1) entry in the whippet_rescue is {} \n'.format(num_whippet_rescues))
# Number 6
# To get the number of samples using np.size
num_whippets = np.size(whippet_rescue)
print('The number of samples in the whippet_rescue is {} \n'.format(num_whippets))
# Number 7 and 8
expected_percentage_whippets_rescue = 0.08
binom_test_whippets_rescues = binom_test(num_whippet_rescues, num_whippets, expected_percentage_whippets_rescue)
print('The P-Value of the whippet_rescue is {} \n'.format(binom_test_whippets_rescues))
print('So the P-Value from the Whippet_Rescue Binomial Test is %.3f and therefore, we accept the null hypothesis, which is that there is no difference between the observed number of whippet rescues and our expected whippet rescues percentage'%(binom_test_whippets_rescues))
print('\n')
# Number 9
# Weight is numerical and there are three breeds to compare, so use a one-way ANOVA
# instead of three pairwise t-tests; this keeps the false-positive probability at 0.05
whippets_weight = fetchmaker.get_weight('whippet')
terriers_weight = fetchmaker.get_weight('terrier')
pitbulls_weight = fetchmaker.get_weight('pitbull')
ANOVA_mid_size_dogs = f_oneway(whippets_weight, terriers_weight, pitbulls_weight)
print('The P-value obtained from the ANOVA test on these three popular breeds is %.3f and therefore, we reject the null hypothesis, which is there is significant difference in the average weights of these three dogs, but we do not know which pair of datasets is significantly different.'% (ANOVA_mid_size_dogs[1]))
print('\n')
# Number 10
# To know which pair has a significant difference in their mean, we must use Tukey's Range test
data = np.concatenate([whippets_weight, terriers_weight, pitbulls_weight])
labels = ['whippet'] * len(whippets_weight) + ['terrier'] * len(terriers_weight) + ['pitbull'] * len(pitbulls_weight)
tukey_result = pairwise_tukeyhsd(data, labels, alpha = 0.05)
print("Below is the table generated from the Tukey's Range Test to find out which pair of datasets is statistically different: \n {}".format(tukey_result))
print('\n')
# Number 11
poodle_colors = fetchmaker.get_color('poodle')
shihtzu_colors = fetchmaker.get_color('shihtzu')
# print(poodle_colors)
# print(shihtzu_colors)
# Number 12
# First, obtain the color counts for the poodle breed
black_poodle = np.count_nonzero(poodle_colors == 'black')
brown_poodle = np.count_nonzero(poodle_colors == 'brown')
gold_poodle = np.count_nonzero(poodle_colors == 'gold')
grey_poodle = np.count_nonzero(poodle_colors == 'grey')
white_poodle = np.count_nonzero(poodle_colors == 'white')
# Secondly, obtain the color counts for the shihtzu breed
black_shihtzu = np.count_nonzero(shihtzu_colors == 'black')
brown_shihtzu = np.count_nonzero(shihtzu_colors == 'brown')
gold_shihtzu = np.count_nonzero(shihtzu_colors == 'gold')
grey_shihtzu = np.count_nonzero(shihtzu_colors == 'grey')
white_shihtzu = np.count_nonzero(shihtzu_colors == 'white')
# Next, create the contingency table using a list of lists (one row per color)
color_table = [
    [black_poodle, black_shihtzu],
    [brown_poodle, brown_shihtzu],
    [gold_poodle, gold_shihtzu],
    [grey_poodle, grey_shihtzu],
    [white_poodle, white_shihtzu],
]
# Number 13
chi2, pval, dof, expected = chi2_contingency(color_table)
print('The statistic of the color_table dataset is %.3f \n'%(chi2))
print('The P-Value of the color_table dataset is %.3f \n'% (pval))
print('The degrees of freedom from the color_table dataset is {} \n'.format(dof))
print('The expected table is as follows: \n {}'.format(expected))
print('\n')
print('The conclusion from the Chi-Square test above is since the P-Value is %.3f, we reject the null hypothesis and stated that there is a significant difference between the datasets'% (pval))
```
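
One part I suspect could be shorter is the contingency table in Number 12 above, where I count each color by hand. Here is a minimal sketch of a loop-based version, assuming the color data are NumPy arrays of strings (as the `==` comparisons in my code suggest). The two small arrays are made up for illustration and are not the real fetchmaker data:

```python
import numpy as np

# Made-up stand-ins for fetchmaker.get_color(...); NOT the real data.
poodle_colors = np.array(['black', 'brown', 'black', 'white', 'grey'])
shihtzu_colors = np.array(['gold', 'brown', 'grey', 'grey', 'white'])

colors = ['black', 'brown', 'gold', 'grey', 'white']

# One row per color, one column per breed: [poodle count, shihtzu count]
color_table = [
    [int(np.count_nonzero(poodle_colors == color)),
     int(np.count_nonzero(shihtzu_colors == color))]
    for color in colors
]
```

The resulting `color_table` has the same layout as my hand-built one, so with the real data I would expect it to work with `chi2_contingency` in the same way.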

Below are the outputs of my project code:

```
The rottweiler avg tail length is 4.2361
The rottweiler std dev of tail length is 2.06475368749
The count of (1) entry in the whippet_rescue is 6
The number of samples in the whippet_rescue is 100
The P-Value of the whippet_rescue is 0.581178010624
So the P-Value from the Whippet_Rescue Binomial Test is 0.581 and therefore, we accept the null hypothesis, which is that there is no difference between the observed number of whippet rescues and our expected whippet rescues percentage
The P-value obtained from the ANOVA test on these three popular breeds is 0.000 and therefore, we reject the null hypothesis, which is there is significant difference in the average weights of these three dogs, but we do not know which pair of datasets is significantly different.
Below is the table generated from the Tukey's Range Test to find out which pair of datasets is statistically different:
Multiple Comparison of Means - Tukey HSD,FWER=0.05
==============================================
group1  group2  meandiff   lower  upper reject
----------------------------------------------
pitbull terrier   -13.24 -16.728 -9.752   True
pitbull whippet    -3.34  -6.828  0.148  False
terrier whippet      9.9   6.412 13.388   True
----------------------------------------------
The statistic of the color_table dataset is 14.727
The P-Value of the color_table dataset is 0.005
The degrees of freedom from the color_table dataset is 4
The expected table is as follows:
[[ 13.5  13.5]
 [ 24.5  24.5]
 [  7.    7. ]
 [ 46.5  46.5]
 [  8.5   8.5]]
The conclusion from the Chi-Square test above is since the P-Value is 0.005, we reject the null hypothesis and stated that there is a significant difference between the datasets
```
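
Looking at the outputs above, I notice that my conclusions are hard-coded in the print strings, so they would be wrong if the data changed. Here is a minimal sketch of deriving the wording from the p-value instead, assuming a significance level of 0.05. The helper `describe_result` is a hypothetical name of my own, and as far as I understand the usual phrasing is "fail to reject" rather than "accept" the null hypothesis:

```python
def describe_result(test_name, pval, alpha=0.05):
    """Return a one-line conclusion for a hypothesis test (hypothetical helper)."""
    if pval < alpha:
        decision = 'reject the null hypothesis'
    else:
        decision = 'fail to reject the null hypothesis'
    return '{}: p-value = {:.3f}, so we {} at alpha = {}'.format(
        test_name, pval, decision, alpha)


# p-values copied from the outputs above (the chi-square one is already rounded)
print(describe_result('Binomial test', 0.581178010624))
print(describe_result('Chi-square test', 0.005))
```

This way the printed conclusion always follows from the number, instead of from what I expected the result to be when I wrote the message.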

I know this is a lot to ask, but this project took up a lot of my time because I am new to both statistics and Python. Any feedback would be much appreciated so that I can improve my statistics and Python coding skills.

Oh, I forgot to mention: the Python code contains comments such as `# Number 1`, `# Number 2`, and so on. I used these comments to keep track of which code belongs to which task in the module.

Thanks heaps in advance,

Jimmy