Hi Gareth,
Really great code! Very clean and easy to follow, much more so than mine. I like your geography-based approach to insurance charges; it would have many real-life uses.
Some suggestions: it’d be good to draw a few more connections between the different metrics and insurance charges. For example, you’ve tied charges to geography, but is it simply the case that being in the south or east means you get charged more? Could the number of smokers in a given area, or other factors like average BMI, be confounding variables that influence costs? It might be worth showing that smoking really does lead to higher costs (which would be the final piece of the puzzle, since you’ve already shown that the SE has the most smokers), or that other factors like BMI or number of children are at play here.
If that is the case, it would answer the extension question about the possibility of biases in the data, since it’s the makeup of the geographical population, and not the geography itself, that leads to higher costs.
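To make that concrete, here is a rough sketch of the kind of check I mean: compare average charges by region, then again by region and smoking status together. I haven't run this against your data, so the rows below are made up purely for illustration, and the field names (region, smoker, charges) and the helper mean_charges_by are my own placeholders rather than anything from your code.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative rows, NOT real results: (region, smoker, charges).
ROWS = [
    ("southeast", "yes", 30000),
    ("southeast", "yes", 32000),
    ("southeast", "no", 8000),
    ("northwest", "yes", 31000),
    ("northwest", "no", 7500),
    ("northwest", "no", 8500),
]

# Position of each grouping field inside a row (hypothetical layout).
FIELD_INDEX = {"region": 0, "smoker": 1}
CHARGES_INDEX = 2


def mean_charges_by(rows, *fields):
    """Return the average charge for each combination of the given fields."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[FIELD_INDEX[field]] for field in fields)
        groups[key].append(row[CHARGES_INDEX])
    return {key: mean(values) for key, values in groups.items()}


# Step 1: the raw regional comparison.
by_region = mean_charges_by(ROWS, "region")

# Step 2: the same comparison within smokers and within non-smokers.
by_region_and_smoker = mean_charges_by(ROWS, "region", "smoker")

for key, value in sorted(by_region.items()):
    print(key, round(value, 2))
for key, value in sorted(by_region_and_smoker.items()):
    print(key, round(value, 2))
```

In this toy data the southeast looks more expensive overall, but the gap disappears once you compare smokers with smokers and non-smokers with non-smokers. If your real data shows the same pattern, that would suggest smoking rather than region is driving the difference; if a gap remains within each group, something else (BMI, children, or the region itself) is still in play.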
Just some thoughts on how you can extend it. The use of graphs is really great and the general approach is on point.
Tolly