US Medical insurance feedback

Hello, hoping to hear some feedback.

This was an interesting project.
I'm hoping to add more features to it in the long run.

Thanks!

Some thoughts:

  • It might be a good idea to include a README file, or at least an introduction that cites the data source and lays out your initial questions. This EDA should read like a story, with an introduction, analysis, and conclusions.

  • You have a solid understanding of how to write functions. If you can, though, limit how many dictionary items you print. The wall of text is a LOT to take in.

  • It might be better to start with some descriptive statistics and, rather than looking at the mean of the charges column, look at the median. There are outliers in the data that pull the mean upward.

See:

df['charges'].describe().round(2)
count     1338.00
mean     13270.42
std      12110.01
min       1121.87
25%       4740.29
50%       9382.03
75%      16639.91
max      63770.43
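The gap between the mean (13270.42) and the median (9382.03) above is the pull those outliers exert. Here is a minimal sketch for putting a number on it, assuming `df` is the insurance DataFrame with a `charges` column as in the output above. The 1.5 × IQR fence (Tukey's rule) is a common convention swapped in here, not something from the original project, and the function name `center_and_outliers` is hypothetical. From the quartiles shown, the upper fence works out to roughly 16639.91 + 1.5 × 11899.62 ≈ 34489, well below the max of 63770.43.

```python
import pandas as pd


def center_and_outliers(charges: pd.Series) -> dict:
    """Compare mean vs. median and count values outside the 1.5 * IQR fences."""
    q1 = charges.quantile(0.25)
    q3 = charges.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Tukey's rule: anything beyond the fences is flagged as a potential outlier
    n_outliers = int(((charges < lower) | (charges > upper)).sum())
    return {
        "mean": round(float(charges.mean()), 2),
        "median": round(float(charges.median()), 2),
        "upper_fence": round(float(upper), 2),
        "n_outliers": n_outliers,
    }
```

Calling `center_and_outliers(df['charges'])` returns the mean, median, upper fence and number of flagged rows as one short dict, which also keeps the printed output compact.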
  • Be mindful of using subjective language in your analysis when describing categories of people. Loaded or biased terms like “Normal weight”, “Severely underweight”, and “obese” are subjective, and using them comes off as a value judgment. Analysts are ultimately supposed to be objective in their analyses of data. (Also understand that BMI is a controversial number and isn’t an accurate measure of an individual’s health; basic research would reveal that.) Personally, I’d set that variable aside and focus on others, such as possible regional differences, age, and sex.
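A rough sketch of the alternative suggested in the last bullet: median charges broken down by region, sex, or age group, printed as a short table instead of a long dictionary. It assumes `df` has `age`, `sex`, `region` and `charges` columns (as in the commonly used insurance.csv); the helper names and the age-band edges are hypothetical choices for illustration. The `preview` helper is one way to act on the earlier point about limiting dictionary output.

```python
from itertools import islice

import pandas as pd


def preview(d: dict, n: int = 5) -> dict:
    """Return only the first n items of a dict, so printing it stays short."""
    return dict(islice(d.items(), n))


def median_charges_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Median charges per group of `column`, sorted from highest to lowest."""
    # observed=True only reports groups that actually occur (matters for categoricals)
    grouped = df.groupby(column, observed=True)["charges"].median()
    return grouped.sort_values(ascending=False)


def add_age_band(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'age_band' column (hypothetical bin edges)."""
    bins = [17, 29, 39, 49, 64]  # right-inclusive: 18-29, 30-39, 40-49, 50-64
    labels = ["18-29", "30-39", "40-49", "50-64"]
    out = df.copy()
    out["age_band"] = pd.cut(out["age"], bins=bins, labels=labels)
    return out
```

For example, `median_charges_by(df, "region")`, `median_charges_by(df, "sex")` or `median_charges_by(add_age_band(df), "age_band")` each print only a handful of rows, and `preview(some_dict, n=5)` shows just the first five entries of a large dictionary. Medians are used here to stay consistent with the skew in `charges` noted above.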