U.S. Medical Insurance Project, Regression Feedback

Hi everybody,
In this project, I built a regression model that predicts insurance cost from the parameters provided for each patient.
I used both a linear model (Ridge regression) and a non-linear model (Random Forest), with mean absolute error as the scoring metric. What could be done to improve the accuracy of the model? Should I use outlier detection?
Do you agree with my findings regarding the importance of each feature?
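
For anyone who wants to try the comparison without opening the notebook, here is a minimal sketch of the idea. It is not the notebook's actual code: the data is synthetic, and the feature names (age, bmi, smoker), the coefficients, and the hyperparameters are all assumptions made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the project's dataset): three features and a
# cost target with a made-up non-linear smoker/BMI interaction.
rng = np.random.default_rng(0)
n = 300
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30, 6, size=n)
smoker = rng.integers(0, 2, size=n)
X = np.column_stack([age, bmi, smoker])
y = (250 * age + 20000 * smoker
     + 500 * smoker * (bmi > 30) * bmi
     + rng.normal(0, 2000, size=n))

# Ridge is scale-sensitive, so it gets a scaler; the forest does not need one.
models = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# scikit-learn maximizes scores, so MAE is exposed as a negative value;
# flip the sign to report it as a positive error.
mae = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_absolute_error")
    mae[name] = -scores.mean()
    print(f"{name}: MAE = {mae[name]:.0f}")
```

Scoring both models with the same 5-fold cross-validation keeps the comparison fair, since each model is evaluated on the same held-out folds.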

Thank you very much for your time.

The link to the project repo is Medical_Insurance_Project/us-medical-insurance-costs.ipynb at 7c0bc2b6165faf4a2b50c5cf22816d20a27c6d81 · MatteoCaponi/Medical_Insurance_Project · GitHub


This is awesome. Really like how you show at the end the relative importance of each variable. Did you have experience with tools like seaborn and scikit-learn before this project? I was proud of myself for just implementing some basic pandas methods :laughing:

Yes, I took a college course on machine learning. It can definitely be challenging but it gets easier as you practice :+1:
