Hi everyone,
I’m excited to share my final project for the Codecademy Data Scientist career path. I decided to look at the effects of Universal Pre-K, a public policy introduced in New York City in 2014, on the number of public pre-K sites offered across the city. I looked at data from the 2013-2014 school year and compared it with the 2018-2019 school year (the last full school year before the beginning of the COVID-19 pandemic).
This project took me about 15 hours to finish.
I’ve included a link below. Please let me know if you have any thoughts or questions!
https://github.com/eric-mosher/Codecademy-Data-Science-Final-Project/blob/main/New%20York%20City%20Public%20Pre-K%20Data%20Survey.ipynb
Best regards,
Eric
Your repo is set to “private”.
Thanks, it should be publicly viewable now.
Eric
Cool project! Congrats on finishing it.
Good use of the text markdown cells to explain each of your steps so someone with no knowledge of the data could follow along.
A few things:
- Where is the data from? US Census/ACS? Open Data NYC? It’s good practice to list the source of the data (and a link) at the top of the notebook.
- One small thing that might be helpful: print the first 5 rows of each df so the viewer can see what the data looks like, e.g. school2013.head()
- Also, in cell 67, which bars are 2013? Blue? (And obviously the orange bars are 2018.) Adding a legend would make that clear.
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend
Or, you could use the “hue” parameter in Seaborn, which adds a legend for you, e.g.:
ax = sns.barplot(x='Borough', y='no. of programs', hue='year',
                 data=pd.concat([school2013, school2018]))  # or something like that; assumes each df has a 'year' column
https://seaborn.pydata.org/generated/seaborn.barplot.html
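To make the hue idea a bit more concrete, here's a rough, self-contained sketch. I don't know the exact shape of your dataframes, so the column names ('Borough', 'no. of programs', 'year') and all the numbers below are made-up placeholders, not your real data. The idea is just to tag each df with its school year before concatenating, so Seaborn has a column to color by.

```python
import pandas as pd
import seaborn as sns

# Placeholder data only: column names and counts are assumptions for illustration.
school2013 = pd.DataFrame({
    "Borough": ["Bronx", "Brooklyn", "Manhattan"],
    "no. of programs": [10, 20, 15],
})
school2018 = pd.DataFrame({
    "Borough": ["Bronx", "Brooklyn", "Manhattan"],
    "no. of programs": [18, 35, 22],
})

# Tag each frame with its school year (as a string, so it is treated as a category),
# then stack them into one long dataframe.
combined = pd.concat(
    [
        school2013.assign(year="2013-2014"),
        school2018.assign(year="2018-2019"),
    ],
    ignore_index=True,
)

# Quick look at the first 5 rows so a reader can see how the data is laid out.
print(combined.head())

# One bar per borough per year; passing hue makes Seaborn draw a legend.
ax = sns.barplot(data=combined, x="Borough", y="no. of programs", hue="year")
ax.set_ylabel("Number of pre-K programs")
ax.get_legend().set_title("School year")
```

If your real dfs already have a year column you can skip the assign step and just concat them.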
Good work!