Hello, this is my version of the study on the biodiversity of the national parks. I focused on the overall conservation status of the species: rather than analyzing the data park by park, I looked at the different categories of species.
This is the project I have spent the most time on since starting the Data Science Path. It took me 2 days (maybe 10-15 hours), and much of that time went into trying to come up with a rough predictive model for the conservation status of a species based on the number of observations made.
To do so, I transformed the categorical variable conservation status into an ordinal variable called concern index.
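For anyone who wants a picture of that transformation, here is a minimal sketch in plain Python. The category names, their ordering, the integer values, and the helper name `concern_index` are all assumptions made for illustration and may differ from what the notebook actually does.

```python
# Hypothetical mapping from conservation status to an ordinal "concern index".
# The category names, their order, and the integer values are illustrative
# assumptions, not taken from the notebook.
CONCERN_INDEX = {
    "No Intervention": 0,
    "In Recovery": 1,
    "Species of Concern": 2,
    "Threatened": 3,
    "Endangered": 4,
}


def concern_index(status):
    """Return the ordinal index for a status.

    A missing status (None) is treated as "No Intervention", which is
    itself an assumption about how missing values should be read.
    """
    if status is None:
        return CONCERN_INDEX["No Intervention"]
    return CONCERN_INDEX[status]


# Example: convert a small list of statuses into concern indices.
statuses = ["Endangered", None, "Species of Concern", "In Recovery"]
indices = [concern_index(s) for s in statuses]
print(indices)
```

An ordinal encoding like this only makes sense if the categories really have a natural order; where "In Recovery" belongs in that order is a judgment call.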
I’m following the Data Science Path in the proposed sequence, so I have not studied machine learning yet. Still, I came up with a ‘predictive model’, though I’m not sure it is done the ‘right way’. I’m sure I overlooked a lot of considerations, and I did not use any machine learning techniques (which I’ve heard of but not yet learned).
To the moderators: could you please give me some critical feedback on the approach I chose for building the ‘predictive model’, and on how you would do this in a professional way (just a broad view)?
Thanks in advance for any feedback!
https://github.com/rafabrisighello/NationalParks_CodeCademy/blob/main/biodiversity.ipynb