Hi everybody,
In this project, I built a regression model that predicts insurance cost from the features provided for each patient.
I tried both a linear model (Ridge regression) and a non-linear model (Random Forest), and I used mean absolute error (MAE) as the scoring metric. What could be done to reduce the model's error? Should I use outlier detection?
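For anyone who wants to reproduce this kind of comparison without opening the notebook, here is a minimal sketch, assuming scikit-learn is installed. It is not the notebook's actual code: the data is synthetic, and the feature names (`age`, `bmi`, `smoker`), the coefficients, and the hyperparameters are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; NOT the insurance dataset).
rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30, 6, size=n)
smoker = rng.integers(0, 2, size=n)
X = np.column_stack([age, bmi, smoker])
# Made-up cost function that includes a smoker-by-BMI interaction plus noise.
y = (250 * age + 300 * bmi
     + smoker * (15000 + 500 * (bmi - 30))
     + rng.normal(0, 2000, size=n))

models = {
    # Ridge is sensitive to feature scale, so standardize first.
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

mae = {}
for name, model in models.items():
    # scikit-learn reports negated MAE so that "higher is better" holds.
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_absolute_error")
    mae[name] = -scores.mean()
    print(f"{name}: cross-validated MAE = {mae[name]:.0f}")
```

Using 5-fold cross-validation rather than a single train/test split gives a more stable MAE estimate on a dataset of this size, which makes it easier to judge whether a change such as removing outliers actually helps.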
Do you agree with my findings regarding the importance of each feature?
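One way to cross-check feature importance is permutation importance on held-out data. The sketch below assumes scikit-learn and uses synthetic data with hypothetical feature names (`age`, `bmi`, `smoker`); it is not the notebook's code, and the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical; NOT the insurance dataset).
rng = np.random.default_rng(0)
n = 500
feature_names = ["age", "bmi", "smoker"]
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30, 6, size=n)
smoker = rng.integers(0, 2, size=n)
X = np.column_stack([age, bmi, smoker])
# Made-up cost function in which smoking status dominates.
y = (250 * age + 300 * bmi
     + smoker * (15000 + 500 * (bmi - 30))
     + rng.normal(0, 2000, size=n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Shuffle one feature at a time on the held-out set and measure
# how much the MAE-based score degrades.
result = permutation_importance(
    forest, X_test, y_test,
    scoring="neg_mean_absolute_error",
    n_repeats=10, random_state=0)

for name, mean, std in zip(feature_names,
                           result.importances_mean,
                           result.importances_std):
    print(f"{name}: MAE increase = {mean:.0f} +/- {std:.0f}")
```

Because the importance is measured on held-out data with the same metric used for scoring, it can be compared directly against the Ridge coefficients or the forest's built-in `feature_importances_` as a sanity check.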
Thank you very much for your time.
The project repo is MatteoCaponi/Medical_Insurance_Project on GitHub; the notebook is Medical_Insurance_Project/us-medical-insurance-costs.ipynb at commit 7c0bc2b6165faf4a2b50c5cf22816d20a27c6d81.