Machine Learning Version - Multiple Regression! US Medical Insurance

Hi fellow students,

I actually built a multiple regression model for the US medical insurance cost project!
It took me probably 3 hours to complete, since I had to modify things here and there as I read more resources online.
I realized that I still have a lot to study, but I enjoyed it so much that I want to jump into another project right away.
I’m pretty sure a lot of you guys feel the same way! I’m so excited to share my work and ideas with you guys!!

Hi @feelzoo,

Congrats on finishing up, glad to hear you enjoyed your work, that’s always a good sign. Mixing projects and study is a great way to actually practice what you’re learning and, as you seem to have noticed, it gives you a little more motivation to continue learning.

It’s perfectly reasonable to look back and improve on an old project when you’ve learned something that could improve it, and running through online resources whilst you work is just par for the course :grinning:.

For a bit of feedback-
It may be worthwhile spending a little more time introducing your dataset. Imagine you were looking at this without prior knowledge; would you be able to understand what was being analysed, and why, in the first few sentences? When presenting an analysis like this, always consider how your viewer would interpret it; try to make it as easy as possible for them to understand.

I also think this work would be greatly improved if you could extend your analysis to draw some conclusions from the data; see the original portfolio project details if you want some ideas. Your addition of a model to try to predict the insurance costs for an individual is interesting and worthy of discussion, so perhaps add some in. How well does the model predict the data? Do some variables affect the model more than others? Are there any flaws or potential improvements to be made? Answering questions like these would add a lot to this work.
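As a rough illustration of the "how well does the model predict the data" part, here is a minimal sketch that computes R² and RMSE from actual and predicted charges. The numbers are made up for illustration and are not taken from the project; `r_squared` and `rmse` are hypothetical helper names.

```python
import math

def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def rmse(actual, predicted):
    """Typical size of a prediction error, in the same units as the charges."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up charges and predictions, purely illustrative.
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]

print(f"R^2  = {r_squared(actual, predicted):.3f}")
print(f"RMSE = {rmse(actual, predicted):.2f}")
```

Reporting an error measure in dollars next to R² tends to be easier for a viewer to interpret than R² alone.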

You’ve done the analysis, now you just need to let your viewer know what you found.


Impressive project, congratulations!

It was really interesting to see how you structured and conducted your analysis.

Regarding the project, I have yet to immerse myself in machine learning, but I have a dummy question:

Is it reasonable to use linear regression with a binary variable? Here I mean the linear regression of cost on sex (M/F).

One stylistic comment: you need to update the plot titles, as every plot is currently titled "Correlation Between Sex and Insurance Charges".

Congratulations and keep learning!


Hi tgrtim,

Thank you so much for your feedback. After I saw your post, I was a little embarrassed by the fact that my work wasn’t up to the minimum requirement. I think I was rushing too much to share what I did and talk with you guys about it. In light of your feedback, I’ve modified the project by adding more explanation to each section, reading it as if I were the audience. Here is my newest update! Thank you again for your help!

Hi andz3j,

I’ve updated the plot titles appropriately! Thank you for spotting that for me!

To your great question (using the word ‘dummy’ in a question about a dummy variable! pun intended?): my reasoning was that my response is numeric (so it’s not a classification problem), and as long as a variable is statistically significant, linear regression would be a safe choice without considering other options. There might be a better ML method for binary types!

Thank you again!
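To make the dummy-variable point concrete, here is a minimal sketch with made-up numbers (not the project data). With a single 0/1 predictor, ordinary least squares gives an intercept equal to the mean of the group coded 0 and a slope equal to the difference between the two group means. `fit_simple_ols` is a hypothetical helper written for this example.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: 0 = female, 1 = male (the coding is an assumption).
is_male = [0, 0, 0, 1, 1, 1]
charges = [1000.0, 2000.0, 3000.0, 2000.0, 4000.0, 6000.0]

intercept, slope = fit_simple_ols(is_male, charges)

mean_female = sum(c for s, c in zip(is_male, charges) if s == 0) / 3
mean_male = sum(c for s, c in zip(is_male, charges) if s == 1) / 3

print(intercept, mean_female)         # intercept matches the female mean
print(slope, mean_male - mean_female) # slope matches the gap between means
```

So regressing on a binary variable is legitimate; the fitted line is just a comparison of two group means.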


Hi andz3j,

After reading more about finding associations between variables, I learned that neither scatter plots nor linear regression graphs are appropriate for showing the relationship between a categorical variable and a quantitative variable. Thank you for your great question! Now I can see that the graph isn’t adding any value for the audience! I still think getting an r_squared value is useful to see if there’s any linear relationship, though. I need to update my project again!
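One possible replacement for the scatter plot is a per-group summary. The sketch below uses made-up numbers (not the project data), and `summarize_by_group` and `r_squared_by_group` are hypothetical helpers. For a single categorical predictor, the regression's fitted values are the group means, so r_squared can be computed as the share of total variation that lies between the groups.

```python
from statistics import mean, median

def summarize_by_group(groups, values):
    """Count, mean and median of `values` for each label in `groups`."""
    by_group = {}
    for g, v in zip(groups, values):
        by_group.setdefault(g, []).append(v)
    return {
        g: {"n": len(vs), "mean": mean(vs), "median": median(vs)}
        for g, vs in by_group.items()
    }

def r_squared_by_group(groups, values):
    """1 - (within-group variation / total variation)."""
    summary = summarize_by_group(groups, values)
    overall_mean = mean(values)
    ss_total = sum((v - overall_mean) ** 2 for v in values)
    ss_within = sum(
        (v - summary[g]["mean"]) ** 2 for g, v in zip(groups, values)
    )
    return 1 - ss_within / ss_total

# Made-up data, purely illustrative.
sex = ["female", "male", "female", "male", "female", "male"]
charges = [1000.0, 2000.0, 2000.0, 4000.0, 3000.0, 6000.0]

summary = summarize_by_group(sex, charges)
r2 = r_squared_by_group(sex, charges)

for label, stats in summary.items():
    print(label, stats)
print(f"r_squared = {r2:.3f}")
```

A box plot of charges per group would show the same comparison visually.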