Data Science Life Expectancy Project Feedback

Greetings,

I recently completed the Life Expectancy project. My blog post can be found here: CodeCademy Life Expectancy Project - Lucas Spangler - Medium

My code can be found on GitHub here:

Let me know what you think!

The GitHub link doesn't point to your repository; it links to Codecademy's GitHub repos.

Do you have a link to your GitHub?

I looked at the Medium blog post.
My feedback:

  • The plot titles in the leftmost column say GDP vs. year, but the axes actually show GDP and life expectancy.
  • It would be a good idea to put a regression line through the scatter plots (so we have a visual reference for how linear the relationship is) and mention the Pearson correlation coefficient.
  • GDP across countries violin plots don’t render properly for me. There are no violins. See:
    https://miro.medium.com/max/750/1*sPKw2IDOBSr6Xk2W8biN_g.png
  • I would recommend changing the Seaborn context (notebook, poster, etc.), because the font sizes are currently too small and too spread out, which doesn't look very blog-appropriate.
  • Aside from having your overall conclusion at the bottom, it would be nice to see an explanation of each plot and the key information it highlights.
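A minimal sketch of the regression-line, Pearson, and context suggestions, assuming the data sits in a pandas DataFrame with columns named `Country`, `GDP`, and `LEABY` (these names are assumptions; adjust them to your own). `plot_gdp_vs_le` is a hypothetical helper: it uses Seaborn's `regplot` to draw the scatter plus a least-squares line, computes Pearson's r with pandas' `Series.corr` (Pearson is its default method), and writes that value into a title that matches the axes actually plotted. The `"talk"` context is just one option for larger fonts.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Larger fonts than the default "notebook" context; "poster" is larger still.
sns.set_context("talk")

# Assumed column names; rename to match your DataFrame.
COUNTRY_COL = "Country"
GDP_COL = "GDP"
LE_COL = "LEABY"


def plot_gdp_vs_le(df: pd.DataFrame, country: str):
    """Scatter GDP against life expectancy for one country, add a fitted
    regression line, and return the axes together with Pearson's r."""
    subset = df[df[COUNTRY_COL] == country]

    # Series.corr uses the Pearson method by default
    r = subset[GDP_COL].corr(subset[LE_COL])

    fig, ax = plt.subplots(figsize=(6, 4))
    # regplot draws the scatter plus a least-squares line with a confidence band
    sns.regplot(data=subset, x=GDP_COL, y=LE_COL, ax=ax)

    # Title and labels describe the axes that are actually plotted
    ax.set_title(f"{country}: GDP vs life expectancy (r = {r:.2f})")
    ax.set_xlabel("GDP")
    ax.set_ylabel("Life expectancy at birth (years)")
    return ax, r
```

To build the whole column of plots, you could loop with `for country in all_data["Country"].unique(): plot_gdp_vs_le(all_data, country)`, again assuming those column names.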

Congratulations on making it this far and don’t stop until you finish the course.

Here is the right link for the project repository: https://github.com/spang047/Life_expectancy.git

I get a 404 error. I think it’s set to private rather than public.

I went ahead and made the repository public. You should be able to see it now.

I updated the write-up with more information, trend lines, and Pearson values. I still can't get the GDP violin plot to display in a readable way because of the large differences in GDP between countries. I thought about scaling all of the values to make them more comparable, but even with scaling it didn't render well. I gave up and left it looking more like a box-and-whisker plot.
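One option that might help, offered as a suggestion rather than a tested fix: because GDP values across countries differ by orders of magnitude, the smaller economies' violins get squashed into flat lines on one shared linear axis. Plotting log10(GDP) gives each order of magnitude the same vertical space, so each country's distribution may become visible. `violin_log_gdp` is a hypothetical helper, the `GDP`/`Country` column names are assumptions, and the log requires every GDP value to be positive.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


# Default column names are assumptions; pass your own if they differ.
def violin_log_gdp(df: pd.DataFrame, gdp_col: str = "GDP",
                   country_col: str = "Country"):
    """Draw one violin per country using log10(GDP) so that economies of
    very different sizes share a readable axis."""
    if (df[gdp_col] <= 0).any():
        raise ValueError("log10 needs strictly positive GDP values")

    # New column holding log10 of GDP; the input DataFrame is left untouched
    plot_df = df.assign(log_gdp=np.log10(df[gdp_col]))

    fig, ax = plt.subplots(figsize=(10, 5))
    sns.violinplot(data=plot_df, x=country_col, y="log_gdp", ax=ax)
    ax.set_ylabel("log10(GDP)")
    ax.set_title("GDP distribution by country (log scale)")
    return ax, plot_df
```

If the violins are still too thin, giving each country its own y-axis, e.g. `sns.catplot(data=df, y="GDP", col="Country", kind="violin", sharey=False)`, is another way to keep each shape readable.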

Congrats on finishing the project! Seems like you understand how to write functions and use Seaborn to plot data.

Some thoughts:

  • Tell a story with the data. We, as data scientists and data analysts, are storytellers. Specifically, show the reader of your notebook your thought process as you explore the data (use comments or markdown/text cells) so they get the gist of what you're doing. (It's like writing: there's an intro, a body, and a conclusion.) At the top of the notebook, state the data source and the initial questions you want to investigate, and then discuss your findings as you visualize and plot the data. Your question: is there a correlation between LEABY and GDP? Yes…

  • It might also be useful to do some initial distribution plots to see what the data looks like overall and then go from there: use some bar plots to see the average LEABY for each country (and the same for GDP). Those will explain the skews in the data distributions: GDP is right-skewed (most of the values are on the left) and LEABY is left-skewed (most of the values are on the right).

  • You could also use pandas to explore the data initially. For example, all_data.describe() will give you basic stats such as the mean and standard deviation.
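A small sketch of that pandas-only first look, assuming the data is already loaded as `all_data` (for example with `pd.read_csv("all_data.csv")`; the file name and the `Country` column are assumptions). Besides the overall `describe()`, grouping by country gives the same statistics per country, and `isna().sum()` flags missing values before any plotting. `first_look` is a hypothetical helper name.

```python
import pandas as pd


# The default country column name is an assumption; pass your own if it differs.
def first_look(df: pd.DataFrame, country_col: str = "Country"):
    """Pandas-only overview to run before plotting anything."""
    # count, mean, std, min, quartiles and max for every numeric column
    overall = df.describe()
    # the same statistics, computed separately for each country
    by_country = df.groupby(country_col).describe()
    # number of missing values per column
    missing = df.isna().sum()
    return overall, by_country, missing
```

For example, `overall, by_country, missing = first_look(all_data)` followed by `print(by_country)` shows whether one country's GDP dwarfs the others, which is the same effect that squashes the violin plot.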

If you just do a basic Google search for data storytelling, you'll find a lot out there. I also think there are some articles on the DS path too.
Here’s an example.

Happy coding!