GDP & Life Expectancy:

Finally made it through this project!
I enjoyed working on it, especially because I could focus on the aspects I wanted to investigate rather than on the code.

In my case the goal of posting a story on Medium was very compelling: I already knew the platform, but I had never thought about being on the writer’s side.

So I spent some time improving the overall appearance of the Medium story: working on something you want to publish is different from lessons or challenges where you practice code effectiveness or visualization readability. It adds an extra layer of complexity to the “creative process”.

I spent approximately 8 hours coding and 8 to 10 hours preparing the Medium story and GitHub repo for sharing.

Sooo… here it is:

Link to repo at the end of the story

I hope you enjoy it!

P.S.: to anyone interested, I found this Jupyter-to-Medium package really useful. Don’t worry about the integration token: you can now create one yourself, directly from your user settings (no email required).

Thank you for posting your project! :partying_face:
I think it’s fantastic that you created a Medium post out of it. AND, thank you for posting that Jupyter-Medium link as it will be helpful for others (maybe more people will now create blog posts of their work!)

Some thoughts:

  • When you did your two-tailed t-test for GDP & LEABY, what made you select data only for the year 2015?

  • In a facet grid plot of all countries, didn’t Zimbabwe have the greatest change (positive correlation) in life expectancy? (I saw that the US, Mexico and Zimbabwe had a linear relationship between those two variables, no?)

  • One should avoid using archaic phrases like ‘third world country’ (here: ‘The following violin plot gives an idea of the divide between third world countries like Zimbabwe and the rest of the world in terms of life expectancy’). That term has negative connotations: it implies a ‘less than’/unsophisticated status and assumes a hierarchy between countries. See here, here and here.

One last thing. I think one needs to be aware that when saying things like, ‘Also cultural differences, eating habits and a less sedentary lifestyle play a role too. Chile probably records a better life expectancy than USA thanks to healthier habits in term of diet and fitness, rather than an improved infrastructure’ it can be interpreted as an assumption.
By assumption, I mean that when a scientist or researcher makes claims like that, they need to back them up with empirical evidence; otherwise the claim can be interpreted as just an opinion (we’re supposed to be objective, not subjective. It might be useful to read up on some works by cultural anthropologists and sociologists, and on qualitative analysis in general). I think one has to be careful when talking about cultural differences, too, because some may argue that culture (diet, eating habits, exercise, access to information about foods and health, etc.) could be affected by economics.

Hi lisalisaj, thank you for your thoughts; I think these are all good points.
I will try to explain the reasons behind every choice/mistake.

When you did your two tailed t-test for GDP & LEABY what made you only select data for the year 2015?

For the t-test I started by using the entire time series, before realizing that this is exactly what those data points are: time series.
By using entire national trends for the test, we are not looking at several samples of the same “population”, but at sequential snapshots describing how each country evolved over time.
Sixteen years are enough for laws to take effect and to change, at least partially, some of a nation’s social systems.
My idea was to compare the two groups (high/low GDP) by looking at the same snapshot: the same test could give a different result for another year, so why mix them?
Do you think that this approach is reasonable?
Maybe I can add some text to explain this choice better.
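For anyone who wants to reproduce the single-year comparison, here is a minimal sketch with pandas and SciPy. It assumes a tidy DataFrame with hypothetical columns Country, Year, GDP and LEABY, and it splits countries at that year’s median GDP; the threshold actually used in the notebook may differ. It runs Welch’s two-tailed t-test on one snapshot, so each country contributes exactly one observation.

```python
import pandas as pd
from scipy import stats


def compare_gdp_groups(df: pd.DataFrame, year: int):
    """Two-tailed Welch t-test on LEABY, high- vs low-GDP countries, one year only.

    Assumptions (hypothetical): columns 'Country', 'Year', 'GDP', 'LEABY';
    countries are split at the median GDP of the chosen year.
    """
    # Keep a single snapshot so every country appears once (no pooled time series)
    snapshot = df[df["Year"] == year]
    threshold = snapshot["GDP"].median()  # assumed split rule
    high_gdp = snapshot.loc[snapshot["GDP"] > threshold, "LEABY"]
    low_gdp = snapshot.loc[snapshot["GDP"] <= threshold, "LEABY"]
    # equal_var=False gives Welch's test, which does not assume equal variances;
    # ttest_ind is two-tailed by default
    result = stats.ttest_ind(high_gdp, low_gdp, equal_var=False)
    return result.statistic, result.pvalue
```

Calling it for two different years (e.g. 2015 and 2000) is an easy way to check whether the conclusion depends on the snapshot chosen.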

In a facet grid plot of all countries, didn’t Zimbabwe have the greatest change (positive correlation) in life-expectancy?

Yes, I think Zimbabwe’s curve is different from all the others because of its history. The initial downward trend is unique in this dataset, as is the drastic improvement starting around 2004-2005.
Regarding linearity, I forgot to mention it explicitly, even though it is evident from the pair plots: it may be worth adding.
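One way to quantify that linearity per country is Pearson’s r computed within each country’s time series. This is a sketch assuming the same hypothetical column names (Country, GDP, LEABY). Since GDP and LEABY both tend to rise over time, a high r within a time series describes a shared trend and does not by itself say anything about causation.

```python
import pandas as pd


def per_country_correlation(df: pd.DataFrame) -> pd.Series:
    """Pearson r between GDP and LEABY within each country, highest first.

    Assumes hypothetical columns 'Country', 'GDP', 'LEABY'.
    """
    return (
        df.groupby("Country")[["GDP", "LEABY"]]
        # Series.corr uses Pearson correlation by default
        .apply(lambda g: g["GDP"].corr(g["LEABY"]))
        .sort_values(ascending=False)
    )
```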

One should avoid using archaic phrases like, ‘third world country’

I totally agree: using “third world countries” is a mistake.
The initial draft of my notebook used this term; I had already noticed that it might suggest a hierarchy, and that’s why I opted for low_GDP and high_GDP as variable names.
They seem more objective, since high/low GDP thresholds can be defined just by looking at the numbers, without even knowing the names of the countries.
Unfortunately, I missed these last statements: thank you for spotting them!
I will replace “third world countries” with wording that refers to strong/weak economies, since this is the concept underlying the high/low GDP naming.

I think one needs to be aware that when saying things like, 'Chile probably records a better life … an improved infrastructure’ it can be interpreted as an assumption

The idea here was to venture a guess and suggest a connection based on just a quick investigation: I was surprised to discover that Chile has a better life expectancy than the USA.
I dug a bit and discovered that Chilean healthcare is one of the best in South America, but I was still impressed.

I think you are right about objectivity: I can leave it open as a possible further development (suggesting cultural differences, eating habits and lifestyle as aspects to be investigated), rather than presenting those points as probable root causes.

First off, I think it’s great to have these sorts of conversations about data here in this forum! :slight_smile: I’m hoping to see more from others! So, again, thank you for sharing your work so we can have these sorts of discussions here.

This project is interesting to me because it’s a long-debated topic: GDP isn’t the only indicator of a country’s LEABY. There’s a definite correlation there, but not causation. And as data scientists and analysts we know that causation is definitely harder to prove!

There are other potential variables that could also be correlated with a country’s LEABY, and you mention some of them: health care and nutrition. I’d also add access to healthcare: in the US, health insurance is tied to employment, and even if one does have insurance (via employment or the ACA), that doesn’t mean they have access to care. Income, access to food, access to information about health, nutrition, political stability, education, housing, environmental factors, sustainability of economies, etc. are also all related to a population’s life expectancy. The varying degrees to which each is correlated are something for us all to explore.

The data itself is an estimate of LEABY (see here).
So I’m wondering whether a two-tailed t-test is even appropriate for this data. We don’t have a random sample; we have actual aggregated population data and annual GDP data. (Also, a minor thing: the GDP is in the trillions, not billions. e+10.) It’s something to think about.
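On the units point, one way to avoid misreading scientific notation on plot axes is to rescale GDP explicitly before plotting. This is a minimal sketch assuming GDP is stored in raw US dollars in a column named GDP (both are assumptions about the dataset); the new column name is hypothetical. One trillion is 1e12 on the short scale.

```python
import pandas as pd

TRILLION = 1e12  # short-scale trillion


def add_gdp_trillions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with GDP expressed in trillions (assumes raw USD in 'GDP')."""
    out = df.copy()  # leave the original frame untouched
    out["GDP (trillion USD)"] = out["GDP"] / TRILLION
    return out
```

Plotting the rescaled column lets the axis label carry the unit instead of an offset like 1e13.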

There are so many possibilities to explore in this data and I think the project was a good example of EDA and data visualization.

This is a neat site with some cool animated visualizations about GDP and LEABY:
This is a few years old, but another discussion about LEABY & GDP here.

And, some believe that GDP shouldn’t be in the discussion at all because it’s not an accurate indicator.

Happy coding!

Nice work! Looks like you’ve done some really detailed data diving; keep it up, and thanks for posting!

Creating a blog of your work is such a good idea; it was nice to read through the journey you’d taken here :grin: