US Medical Insurance project review + specific questions

I found the Medical Insurance project to be very open-ended, so I was able to make it as challenging as I wanted.

I decided to answer the following:

  • Calculate the average age of the people in the data set
  • Rank the regions by how many people from each appear in the data set
  • Calculate how much more the average smoker pays in insurance than the average non-smoker
  • Find the average age of people in the data set who have at least one child
  • Find the average number of children men have vs. the average number of children women have
  • Determine whether women or men pay more for health insurance on average, and by how much
  • Find a line of best fit for predicting a woman’s BMI based on her age
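For readers following along, several of these questions can be answered with small reusable functions. The sketch below is one possible approach using only Python's standard library, not the project's actual code. It assumes each record is a dict with the keys `age`, `sex`, `bmi`, `children`, `smoker`, `region`, and `charges` (the column layout of the commonly used `insurance.csv`; adjust if your file differs). The function names and sample rows are hypothetical and made up for illustration, not real data.

```python
import csv
from collections import Counter
from statistics import mean


def load_rows(path):
    """Read the CSV into a list of dicts, converting the numeric columns.

    Assumes columns named age, sex, bmi, children, smoker, region, charges.
    """
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            row["age"] = int(row["age"])
            row["bmi"] = float(row["bmi"])
            row["children"] = int(row["children"])
            row["charges"] = float(row["charges"])
            rows.append(row)
    return rows


def average_age(rows):
    return mean(r["age"] for r in rows)


def regions_by_count(rows):
    # most_common() returns (region, count) pairs, most frequent first
    return Counter(r["region"] for r in rows).most_common()


def smoker_premium(rows):
    # Average smoker charges minus average non-smoker charges
    smokers = [r["charges"] for r in rows if r["smoker"] == "yes"]
    non_smokers = [r["charges"] for r in rows if r["smoker"] == "no"]
    return mean(smokers) - mean(non_smokers)


def average_age_with_children(rows):
    return mean(r["age"] for r in rows if r["children"] >= 1)


# Hypothetical sample rows (not real data) so the sketch runs without the file
sample = [
    {"age": 20, "sex": "female", "bmi": 22.0, "children": 0, "smoker": "yes", "region": "southwest", "charges": 20000.0},
    {"age": 30, "sex": "male", "bmi": 28.0, "children": 2, "smoker": "no", "region": "southeast", "charges": 5000.0},
    {"age": 40, "sex": "female", "bmi": 31.0, "children": 1, "smoker": "no", "region": "southeast", "charges": 7000.0},
    {"age": 50, "sex": "male", "bmi": 26.0, "children": 3, "smoker": "yes", "region": "northwest", "charges": 30000.0},
]

print(average_age(sample))
print(regions_by_count(sample))
print(smoker_premium(sample))
print(average_age_with_children(sample))
```

With the real file, you would call `rows = load_rows("insurance.csv")` and pass `rows` instead of `sample`.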

My questions:

  • Is my information presented clearly? Did I explain my goals well enough, or should I have added more explanations or Markdown cells to make it more readable?

  • Do the questions I chose to answer make sense, and are they worth asking?

  • I was able to calculate a rough line of best fit, but I didn't figure out how to plot that specific line on the graph I made. Instead, I followed an online guide for finding and displaying a line of best fit with matplotlib and numpy, which didn't produce exactly the same line as the one I calculated (mine was an approximation). How can I get my own line of best fit to display on my graph, similar to how it's done in the section "Plot the points and line of best fit"?
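One way to draw your own line is to evaluate `slope * x + intercept` at the ends of the age range and pass those two points to `plt.plot` on the same axes as the scatter plot. The sketch below assumes `slope` and `intercept` are whatever values you calculated yourself; the helper names are hypothetical. It also includes a plain-Python ordinary least-squares fit, which should match the slope and intercept returned by `numpy.polyfit(ages, bmis, 1)` (that call returns `[slope, intercept]`, highest power first), so you can compare your approximation against it.

```python
from statistics import mean


def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    x_bar = mean(xs)
    y_bar = mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


def line_points(slope, intercept, xs):
    """Return two points spanning the x range; two points are enough for a straight line."""
    x_min, x_max = min(xs), max(xs)
    return [x_min, x_max], [slope * x_min + intercept, slope * x_max + intercept]


def plot_with_line(ages, bmis, slope, intercept):
    """Scatter the data and overlay the line defined by slope and intercept."""
    # Imported here so the helpers above still work if matplotlib is not installed
    import matplotlib.pyplot as plt

    plt.scatter(ages, bmis, alpha=0.5, label="women")
    line_x, line_y = line_points(slope, intercept, ages)
    plt.plot(line_x, line_y, color="red", label=f"y = {slope:.3f}x + {intercept:.2f}")
    plt.xlabel("Age")
    plt.ylabel("BMI")
    plt.legend()
    plt.show()
```

For example, `plot_with_line(women_ages, women_bmis, my_slope, my_intercept)` would draw your hand-calculated line, where all four arguments are placeholders for your own lists and numbers. Because `plt.plot` and `plt.scatter` draw on the same current axes, calling both before `plt.show()` puts the points and the line on one graph.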

  • I relied heavily on initializing variables to zero and then updating them. For example, to calculate the average number of children for men and for women, I have the following:
    num_men = 0
    num_men_children = 0
    men_child_count = 0

    num_women = 0
    num_women_children = 0
    women_child_count = 0
    Is there a better way to do this, or is creating and updating variables the most efficient way to calculate averages with a data set like this?
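Counters like these work, but grouping the values in a dictionary keyed by sex lets a single loop handle every group, with no separate variables per group. Below is a minimal sketch, assuming each row is a dict with `sex`, `children`, and `charges` keys; the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(rows, group_key, value_key):
    """Return {group: average of value_key} for each distinct value of group_key."""
    groups = defaultdict(list)  # e.g. {"male": [0, 2], "female": [1, 3]}
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical sample rows (not real data)
rows = [
    {"sex": "male", "children": 0, "charges": 4000.0},
    {"sex": "male", "children": 2, "charges": 6000.0},
    {"sex": "female", "children": 1, "charges": 5000.0},
    {"sex": "female", "children": 3, "charges": 5500.0},
]

print(average_by_group(rows, "sex", "children"))
print(average_by_group(rows, "sex", "charges"))
```

The same function also answers the "do women or men pay more" question by changing `value_key`. If you load the data with pandas instead, `df.groupby("sex")["children"].mean()` does the same grouping in one line.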

My project can be found below. Any feedback is welcome—I’m a total beginner! :slight_smile:

I think your code is tidy and neat!

Slightly off-topic: I thought about doing something similar, fitting a line to a scatter plot of age vs. BMI, but the points were so scattered that I thought the line of best fit was probably meaningless. Instead, I divided the BMI values into categories (underweight, healthy, overweight, and obese), calculated the average age for each category, and plotted the results as a line graph. I think that made more sense than the scatter plot. Maybe something to consider!
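The binning approach described above might look something like the sketch below. The cut-offs used here (below 18.5 underweight, 18.5 to under 25 healthy, 25 to under 30 overweight, 30 and above obese) are the standard adult BMI categories; the reply doesn't say which cut-offs were actually used, so treat them as an assumption. The function names and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Categories in increasing-BMI order, so a line graph reads left to right
CATEGORY_ORDER = ["underweight", "healthy", "overweight", "obese"]


def bmi_category(bmi):
    """Map a BMI value to a category using the standard adult cut-offs (assumed)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "healthy"
    if bmi < 30:
        return "overweight"
    return "obese"


def average_age_by_bmi_category(rows):
    """Return [(category, average age), ...], skipping categories with no rows."""
    ages = defaultdict(list)
    for row in rows:
        ages[bmi_category(row["bmi"])].append(row["age"])
    return [(cat, mean(ages[cat])) for cat in CATEGORY_ORDER if cat in ages]


# Hypothetical sample rows (not real data)
people = [
    {"age": 18, "bmi": 17.0},
    {"age": 25, "bmi": 22.0},
    {"age": 35, "bmi": 23.5},
    {"age": 45, "bmi": 28.0},
    {"age": 55, "bmi": 33.0},
    {"age": 60, "bmi": 31.0},
]

print(average_age_by_bmi_category(people))
```

To draw the line graph, you could pass the category names and averages straight to `plt.plot(categories, averages, marker="o")`; matplotlib accepts strings as categorical x values.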


Awesome feedback! I appreciate the idea :slight_smile:

Silly question perhaps, but did you plot the graph with age on the x-axis or the y-axis?