Medical Insurance Cost and Other Population Attributes Analysis

This was an exceedingly fun project. I found myself thinking about all the possible ways I could investigate the data we were given. I haven’t figured out (yet!) how to plot a numerical population attribute (such as age or charges) against region and gender together, in that order; plotting against a single attribute, for example region vs. charges, works without a problem. I was thinking I could add an if-else control structure on the measure variable to calculate the frequency with which these attributes appear, but perhaps it would be better to define a separate function (or even a class) for this purpose. Apart from this setback I’m rather satisfied with the way my code turned out, although I have a nagging feeling that it could output a wrong median for some attribute.
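One possible way to handle the region-and-gender grouping is a separate helper function rather than an if-else branch. The sketch below is only an illustration under assumptions: the records are made up, and the field names (`region`, `sex`, `charges`) and the function name `median_by_group` are hypothetical, not taken from the project code. It groups rows by a tuple of keys and uses the standard library's `statistics.median`, which can also serve as a cross-check for a hand-written median.

```python
from collections import defaultdict
from statistics import median

# Made-up records for illustration only; field names are assumptions.
records = [
    {"region": "northeast", "sex": "female", "charges": 1000},
    {"region": "northeast", "sex": "female", "charges": 3000},
    {"region": "northeast", "sex": "male", "charges": 2000},
    {"region": "southwest", "sex": "female", "charges": 4000},
    {"region": "southwest", "sex": "male", "charges": 5000},
    {"region": "southwest", "sex": "male", "charges": 7000},
    {"region": "southwest", "sex": "male", "charges": 9000},
]


def median_by_group(rows, keys, measure):
    """Return {group_tuple: median of `measure`} for rows grouped by `keys`."""
    groups = defaultdict(list)
    for row in rows:
        # The group label keeps the order of `keys`, e.g. (region, sex).
        group = tuple(row[key] for key in keys)
        groups[group].append(float(row[measure]))
    return {group: median(values) for group, values in sorted(groups.items())}


result = median_by_group(records, ("region", "sex"), "charges")
for group, value in result.items():
    print(group, value)
```

The resulting dictionary has one entry per (region, gender) pair, so its keys can be used as bar labels and its values as bar heights in whatever plotting code is already in place.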

As for why I picked the median instead of the average, I think I must’ve picked up this story from a Freakonomics Radio episode. Consider a train carrying 180 people with an average BMI of 25. If someone with a BMI of 50 gets on, the average BMI rises from 25 to about 25.14: the BMIs on board summed to 25 × 180 = 4,500, so the new average is (4,500 + 50)/181 ≈ 25.14. Now imagine that Bill Gates, rather than our heavy passenger, boards that same 180-person train. The average annual income in the U.S. is $53,820, while the annual income of Bill Gates is $4,000,000,000. If the passengers earn the U.S. average and you perform the same calculation, (53,820 × 180 + 4,000,000,000)/181, the average annual income of the people on board jumps from $53,820 to about $22,152,970! So the average is not a great way to summarize a population attribute when a few values sit very far from the rest. If you run the same thought experiment with the median (the middle element of the sorted list, or the mean of the two middle elements when the count is even), Bill Gates getting on the train will barely move the median income, because the median depends only on which values fall in the middle, not on how large the extremes are (btw, the median annual income in the US was around $31,099 in 2018).
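The train arithmetic can be checked directly. This is a minimal sketch, assuming the simplest train consistent with the story: every one of the 180 passengers has exactly the average value. The function name `stats_after_boarding` is hypothetical. Note that the new total is divided by 181, the headcount after boarding.

```python
from statistics import mean, median


def stats_after_boarding(values, newcomer):
    """Return (mean, median) of the group once `newcomer` has boarded."""
    boarded = values + [newcomer]
    return mean(boarded), median(boarded)


# Assumption: all 180 passengers sit exactly at the stated averages.
bmis = [25.0] * 180
incomes = [53_820.0] * 180

bmi_mean, bmi_median = stats_after_boarding(bmis, 50.0)
income_mean, income_median = stats_after_boarding(incomes, 4_000_000_000.0)

print(f"BMI:    mean {bmi_mean:.2f}, median {bmi_median:.2f}")
print(f"Income: mean {income_mean:,.0f}, median {income_median:,.0f}")
```

With these inputs the mean BMI comes out at about 25.14 and the mean income at about $22,152,970, while both medians stay where they were (25 and $53,820). On a train with varied incomes the median could shift slightly toward the next passenger's value, but it would not be dragged toward the outlier the way the mean is.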

This project took me about three days to complete, and now that I’ve got it in a mostly working condition I’ll be glad to check out what other people did with their code. I would be grateful for any and all feedback concerning numerical errors and presentation. Cheers and happy holidays!
