Any feedback would be great. I realize it's quite verbose in terms of the amount of code I used, and I will proofread it for spelling errors etc. one day. I'm also looking into breaking it down into classes to lessen the amount of code required. That said, I'm quite happy with the results; this was more about solidifying some of the skills I have learned from Codecademy so far.
A thought: rather than write out the code to get the stats for each region (mean, std, median) separately, you could first break out the regions using .iloc[()].values.
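Here's a rough sketch of that idea. The DataFrame, the column names `region` and `charges`, and the helpers `region_values` and `summarize` are all made up for illustration, so swap in whatever your data actually uses. One thing to watch: `.iloc` expects a plain NumPy boolean array rather than a boolean Series, hence the `.to_numpy()` call.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    "region": ["northeast", "southwest", "northeast",
               "southwest", "southeast", "southeast"],
    "charges": [1200.0, 3400.0, 1800.0, 2600.0, 5000.0, 3000.0],
})


def region_values(frame, region, column="charges"):
    """Return one region's values for `column` as a NumPy array."""
    col_idx = frame.columns.get_loc(column)           # integer position of the column
    mask = (frame["region"] == region).to_numpy()     # boolean array, one entry per row
    return frame.iloc[mask, col_idx].values


def summarize(values):
    """Mean, sample standard deviation, and median of a 1-D array."""
    return {
        "mean": float(np.mean(values)),
        "std": float(np.std(values, ddof=1)),  # ddof=1 gives the sample std
        "median": float(np.median(values)),
    }


# One pass over the regions instead of a hand-written block per region.
stats_by_region = {
    region: summarize(region_values(df, region))
    for region in df["region"].unique()
}

for region, summary in stats_by_region.items():
    print(region, summary)
```

If you only need the summary numbers and not the raw arrays, `df.groupby("region")["charges"].agg(["mean", "std", "median"])` should get you the same table in one line.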
IMO you don't need to create classes. Pandas is quite powerful in its own right (in addition to the scipy.stats and math libraries). If you can write a function, you can run a two-tailed t-test for statistical significance and use Cohen's d to gauge how large the difference between the groups is, too.
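Something along these lines would do it. Treat it as a sketch: `cohens_d` and `compare_groups` are hypothetical helper names, the 0.05 cutoff is just the usual convention, and the pooled standard deviation assumes the two groups have roughly equal variance (which matches the default of `scipy.stats.ttest_ind`).

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def compare_groups(a, b, alpha=0.05):
    """Two-tailed independent-samples t-test plus effect size."""
    # ttest_ind is two-sided by default and assumes equal variances.
    t_stat, p_value = stats.ttest_ind(a, b)
    return {
        "t": float(t_stat),
        "p": float(p_value),
        "d": float(cohens_d(a, b)),
        "significant": bool(p_value < alpha),
    }


# Made-up numbers, only to show the call.
group_a = [2.0, 4.0, 6.0, 8.0]
group_b = [1.0, 3.0, 5.0, 7.0]
result = compare_groups(group_a, group_b)
print(result)
```

If the group variances look very different, passing `equal_var=False` to `ttest_ind` runs Welch's t-test instead, though the pooled-variance version of Cohen's d is then less of a natural fit.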