Biodiversity National Park

Hi,
Here I share my solution for this project.

I used the old Capstone project as a guide.
I would like to read your comments.
Good day!
Gustavo.

Some thoughts:
(the first data set, species.csv)

  • The chi-square contingency test comparing the protection status of birds and mammals isn’t read correctly. The p-value is 0.68, which is well above 0.05, so the difference is not significant and neither group is shown to be more likely to be endangered than the other.

  • I think a second chi-square contingency test is missing: is there a significant difference between Reptile and Mammal, i.e. is one more likely to be endangered than the other?

  • I was confused by the bar plot of all the observations from the second data set, because I couldn’t see where the analysis was going. I think it would be clearer to the reader if those cells were removed and you went straight to separating out the species observations that are actually sheep.
    So, the first thing to do (as you’ve done) is to create a new column called ‘is_sheep’, using a lambda function to check whether the word “Sheep” is in the common_names column. From the instructions:
    "Use ‘apply’ and a ‘lambda’ function to create a new column in ‘species’ called ‘is_sheep’ which is ‘True’ if the ‘common_names’ contains ‘Sheep’, and ‘False’ otherwise."

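In case it helps, here is a minimal sketch of that step. The rows below are a made-up stand-in for species.csv (only the column names ‘category’ and ‘common_names’ come from the project), so treat it as an illustration rather than the official solution.

```python
import pandas as pd

# Toy stand-in for species.csv; these rows are made up for illustration only.
species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Vascular Plant", "Bird"],
    "common_names": ["Bighorn Sheep", "Gray Wolf", "Sheep Sorrel", "Barn Owl"],
})

# True when the common name contains the word 'Sheep', False otherwise.
species["is_sheep"] = species.common_names.apply(lambda name: "Sheep" in name)

print(species[["common_names", "is_sheep"]])
```

Note that a plain substring check also flags names that merely contain the word (the plant row in this toy data), which is why the category filter in the next step matters.
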
After that, you select the rows where ‘is_sheep’ is True and the category is ‘Mammal’:

sheep_species = species[(species.is_sheep) & (species.category =='Mammal')]
sheep_species

The rest of the code is good. However, I think the last part of the project is missing from the notebook: how many weeks of observation do the scientists need in each park to see whether their program to combat foot and mouth disease is working?
Or is that part no longer included in the project? (I did the project a while ago, so maybe they changed it.)
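
Going back to the two chi-square points at the top, here is a rough sketch of how the Reptile vs. Mammal test could be set up. It assumes the species table has a boolean column marking protected species; I call it ‘is_protected’ here, but that name and the helper ‘protection_test’ are my own, and the rows are made up, so the resulting p-value says nothing about the real data.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up rows for illustration only; 'is_protected' is an assumed column name.
species = pd.DataFrame({
    "category": ["Mammal"] * 6 + ["Reptile"] * 6,
    "is_protected": [True, True, False, False, False, False,
                     True, False, False, False, False, False],
})

def protection_test(df, cat_a, cat_b):
    """Chi-square test of protection status between two categories (hypothetical helper)."""
    subset = df[df.category.isin([cat_a, cat_b])]
    # Rows: category, columns: not protected (False) / protected (True).
    table = pd.crosstab(subset.category, subset.is_protected)
    chi2, p_value, dof, expected = chi2_contingency(table)
    return table, p_value

table, p_value = protection_test(species, "Mammal", "Reptile")
print(table)
print(f"p-value: {p_value:.3f}")
```

A p-value below 0.05 would suggest the two categories differ in how likely they are to be protected; a value like the 0.68 for birds vs. mammals does not.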