I found the Medical Insurance project to be very open, so I was able to make it as challenging as I wanted it to be.
I decided to answer the following:
- Calculate the average age of the people in the data set
- Rank the areas by representation in the data set
- Calculate how much more the average smoker pays in insurance than the average non-smoker
- Find average age for someone with at least one child in the data set
- Find average number of children men have vs. average number of children that women have
- Answer if women or men pay more on average for health insurance and how much
- Find a line of best fit for predicting a woman’s BMI based on her age
My questions:
- How is my presentation of information? Did I provide enough information about my goals, or should I have given more explanations / used more markdown to make it readable?
- Do the questions I chose to answer make sense/are they worthwhile to ask?
- I was able to calculate a rough line of best fit, but I didn’t figure out how to actually plot that specific line on the graph I made. Instead, I followed a guide online for finding and displaying a line of best fit using matplotlib and numpy, which wasn’t exactly the same as the line of best fit I calculated (mine was an approximation). How can I get my own line of best fit to display on my graph, similar to how it’s done in the section “Plot the points and line of best fit”?
- I relied a lot on creating variables set to zero and updating those variables. For example, in the calculation of the avg number of children for men and women, I have the following:
num_men = 0
num_men_children = 0
men_child_count = 0
num_women = 0
num_women_children = 0
women_child_count = 0
Is there some better way to do this, or is creating and updating variables the most efficient method for calculating averages with a dataset like this?
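To make that last question concrete, here is a minimal sketch of one possible alternative to separate zero-initialized counters: keeping the running totals in dictionaries keyed by group. The sample rows, the column names `sex` and `children`, and the function name `average_children_by_sex` are all made up for illustration and are not taken from my project.

```python
from collections import defaultdict

# Hypothetical sample rows, invented for illustration only.
records = [
    {"sex": "male", "children": 2},
    {"sex": "female", "children": 0},
    {"sex": "male", "children": 1},
    {"sex": "female", "children": 4},
]


def average_children_by_sex(rows):
    """Return {sex: average number of children} for the given rows."""
    totals = defaultdict(int)  # sum of children per group
    counts = defaultdict(int)  # number of people per group
    for row in rows:
        totals[row["sex"]] += row["children"]
        counts[row["sex"]] += 1
    # One division per group, instead of one pair of variables per group.
    return {sex: totals[sex] / counts[sex] for sex in counts}


averages = average_children_by_sex(records)
print(averages)
```

The counters are still there, but the same loop would handle any number of groups (for example, regions) without adding new variables for each one.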
My project can be found below. Any feedback is welcome—I’m a total beginner!
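On the line-of-best-fit question above, here is a rough sketch of what I think the goal looks like: once a slope and intercept are known (however they were calculated), the line can be drawn by plotting two points computed from them. The ages and BMIs below are invented numbers, `fit_line` and `plot_points_and_line` are hypothetical helper names, and the least-squares formula is just one way to get a slope and intercept, not necessarily the approximation I used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def plot_points_and_line(xs, ys, slope, intercept):
    """Scatter the data and draw the line defined by slope and intercept."""
    # Imported here so fit_line can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.scatter(xs, ys, label="data")
    # A straight line only needs its two end points.
    x_ends = [min(xs), max(xs)]
    y_ends = [slope * x + intercept for x in x_ends]
    plt.plot(x_ends, y_ends, color="red", label="line of best fit")
    plt.xlabel("Age")
    plt.ylabel("BMI")
    plt.legend()
    plt.show()


# Invented sample data, for illustration only.
ages = [20, 30, 40, 50]
bmis = [25.0, 27.0, 29.0, 31.0]
slope, intercept = fit_line(ages, bmis)
```

Calling `plot_points_and_line(ages, bmis, slope, intercept)` would then show the points with the line on top. Any slope and intercept can be passed in, so a hand-calculated approximation could be drawn the same way.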