Seeking Feedback on US Medical Insurance Portfolio Project: The Mystery of the Three Price Bands!

project:

A little about me:
I am a software product manager taking the data scientist and ML career path on Codecademy. I’m not trying to change careers, just trying to expand my skills, learn, and have fun. I am 27% done with this track, which I started 9 weeks ago. My last CS class before this was C++ in 2002. I also took an online databases class in 2012, and I work regularly with data structures in my job, particularly relational database schema design to support new products and features.

This project was fun. It was very open-ended, so I set out to explore basic statistics using what I had already learned from the Codecademy path about opening files and working with dictionaries. Then I wanted to visualize the data. We haven’t learned matplotlib or pandas yet in the path, so I relied heavily on documentation, searching, and Bard to get ideas and debug. I also experimented with fitting linear models to understand the relationship of variables to costs.
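For anyone curious what this kind of exploration can look like, here is a minimal sketch using only the standard library. It is not the project's actual code. The column names (`age`, `smoker`, `charges`), the helper names, and the handful of sample rows are all assumptions made up for illustration. It reads CSV rows into dictionaries, averages costs per group, and fits a straight line by ordinary least squares.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def load_records(file_obj):
    """Read CSV rows into a list of dictionaries, converting numeric fields."""
    records = []
    for row in csv.DictReader(file_obj):
        row["age"] = int(row["age"])
        row["charges"] = float(row["charges"])
        records.append(row)
    return records


def mean_charges_by(records, key):
    """Group records by the given column and average the charges per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["charges"])
    return {name: mean(values) for name, values in groups.items()}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up rows for illustration only; the real file would be opened with open(...).
SAMPLE = """age,smoker,charges
20,no,2000
30,no,3000
40,no,4000
20,yes,12000
30,yes,13000
40,yes,14000
"""

records = load_records(io.StringIO(SAMPLE))
print(mean_charges_by(records, "smoker"))

# Fit charges against age separately for each group.
for status in ("no", "yes"):
    subset = [r for r in records if r["smoker"] == status]
    slope, intercept = fit_line(
        [r["age"] for r in subset],
        [r["charges"] for r in subset],
    )
    print(status, slope, intercept)
```

Fitting each group separately, rather than fitting one line through everything, is one way to check whether a categorical variable lines up with distinct bands in a scatter plot.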

My first aha! moment was when I found 3 clear cost bands in the data:

Could I figure out what they represented? Not entirely, but I made good progress. Please take a look. I appreciate any feedback.

Side note: looking at some other projects, reviewers seem to really appreciate comments and stories that go along with what you’re trying to do and why, so before posting I went back and added details about my methodology.