Hi all, this is my attempt at the insurance project. As this was my first time and I was a little nervous, I did not find a partner for this attempt.
Now that I have had an attempt and feel more confident about troubleshooting when I hit a wall, I am really looking forward to the next project and to collaborating with my fellow students.

project link: Insurance_Project

Really great code! It's very clean and easy to follow, much more so than mine. I like your geography-based approach to insurance charges; it would have many real-life uses.

Some suggestions: it’d be good to create a few more connections between the different metrics and insurance charges. For example, you’ve tied charges to geography, but is it simply that being in the south or east means you get charged more? Could it be that the number of smokers in a certain area, or other factors like average BMI, are confounding variables that influence costs? It might be worth showing that smoking really does lead to higher costs (which would be the final piece of the puzzle, since you’ve already shown that the SE has the most smokers), or checking whether other factors like BMI or number of children are at play here.

If that is the case, it would answer the extension question about the possibility of biases in the data, since it’s the makeup of the geographical population, and not the geography itself, that leads to higher costs.
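
Here is a rough sketch of the kind of check I mean, not a definitive analysis. The rows are made up purely for illustration, and the column names (`region`, `smoker`, `charges`) and the helper `mean_charges_by` are my own assumptions, so swap in whatever your project actually uses. The idea is to compare average charges by region alone, then by region and smoking status together:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, invented for illustration only (not real project data).
rows = [
    {"region": "southeast", "smoker": "yes", "charges": 30000.0},
    {"region": "southeast", "smoker": "yes", "charges": 32000.0},
    {"region": "southeast", "smoker": "no", "charges": 8000.0},
    {"region": "northwest", "smoker": "yes", "charges": 31000.0},
    {"region": "northwest", "smoker": "no", "charges": 8200.0},
    {"region": "northwest", "smoker": "no", "charges": 7800.0},
]


def mean_charges_by(rows, *keys):
    """Average the charges for each combination of the given columns."""
    groups = defaultdict(list)
    for row in rows:
        # The group label is a tuple, e.g. ("southeast", "yes").
        groups[tuple(row[key] for key in keys)].append(row["charges"])
    return {group: mean(values) for group, values in groups.items()}


# Region alone: the southeast looks more expensive.
by_region = mean_charges_by(rows, "region")

# Region and smoking status together: does the regional gap survive?
by_region_and_smoker = mean_charges_by(rows, "region", "smoker")

for group, average in sorted(by_region.items()):
    print(group, round(average, 2))
for group, average in sorted(by_region_and_smoker.items()):
    print(group, round(average, 2))
```

In this made-up sample the southeast has the higher average overall, but smokers average 31000 and non-smokers 8000 in both regions, so the gap comes entirely from the southeast having more smokers. If your real data showed a similar pattern, that would support the idea that the population makeup, rather than the geography, drives the costs.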

Just some thoughts on how you can extend it. The use of graphs is really great and the general approach is on point.


Many thanks for the reply. I absolutely agree. As I was working through this, additional questions kept coming up, and after reading your advice I feel I should have given those questions a fair amount of time as well.
Thank you for the advice. I am currently looking into what you suggested, and whether there are more factors behind the higher costs in the southeast (including BMI and number of children).

