Hello fellow Codecademy user,
This is the first portfolio project I have completed in the Codecademy Data Science: Analytics path. The recommended analysis ideas were very simple, so for the extended analysis I tried to do something similar to Reggie’s Linear Regression project, but instead of 2 dimensions, there are 7 dimensions to the medical data!
This took me about a day of coding to complete and can be seen here: GitHub - samWyatt/codecademy_projects.
My two questions are:
- I think I have found the optimal factors for the linear insurance cost estimate function, but how can I be sure? How do I know I’m not at a local minimum of the total error function instead of the actual minimum? Can you find a better set of factors?
- I have two approaches to looping through all the possible factor combinations. One uses a series of nested for loops and the other uses itertools.product. Which one is better?
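
To make the second question concrete, here is a minimal sketch of the two looping approaches side by side. The toy records, the candidate factor values, and the function names are all made up for illustration and are not taken from the notebook. The error measure (sum of absolute differences) is also an assumption.

```python
from itertools import product

# Hypothetical toy data: (age, smoker_flag, actual_cost). Not the project dataset.
records = [(20, 0, 2500), (30, 1, 13500), (40, 0, 4500), (50, 1, 15500)]

# Hypothetical candidate values for each factor.
age_factors = [50, 100, 150]
smoker_factors = [5000, 10000, 15000]
intercepts = [0, 500, 1000]


def total_error(age_f, smoker_f, intercept):
    """Sum of absolute differences between estimated and actual cost."""
    return sum(
        abs(age_f * age + smoker_f * smoker + intercept - cost)
        for age, smoker, cost in records
    )


def search_nested():
    """Try every combination using one for loop per factor."""
    best = None
    for a in age_factors:
        for s in smoker_factors:
            for c in intercepts:
                err = total_error(a, s, c)
                if best is None or err < best[0]:
                    best = (err, (a, s, c))
    return best


def search_product():
    """Try every combination using a single loop over itertools.product."""
    best = None
    for combo in product(age_factors, smoker_factors, intercepts):
        err = total_error(*combo)
        if best is None or err < best[0]:
            best = (err, combo)
    return best


print(search_nested())
print(search_product())
```

Both functions visit the same combinations in the same order, so they return the same result. The practical difference is that the itertools.product version stays one loop deep no matter how many factors are added, while the nested version needs another level of indentation for each one.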
Thanks for your time!
Hi lisalisaj,
Thanks for reviewing my project! I’m still a couple of units away from the data visualization material, but I will come back to this project and add a line chart showing the decrease in total error over time.
I think a scatter plot would be very difficult for this project since there are 7 independent variables, but I could definitely point out which variables have the greatest effect on the cost estimate.
Finally, you are totally right about adding clarification on the regression process. Right now, the project relies too heavily on the reader’s knowledge of linear regressions, so to make it more accessible I will add an explanation of the process.
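
A minimal sketch of the underlying idea with a single independent variable, assuming the total error is the sum of squared differences (the notebook may use a different measure). The ages and costs are invented for illustration. With squared error and a linear estimate, the total error is a convex function of the factors, so the closed-form least squares fit gives the overall minimum rather than a local one.

```python
# Hypothetical toy data, invented for illustration.
ages = [20, 30, 40, 50, 60]
costs = [2400, 3600, 4400, 5700, 6400]


def total_squared_error(slope, intercept):
    """Sum of squared differences between estimated and actual cost."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(ages, costs))


def least_squares_fit(xs, ys):
    """Closed-form least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_fit(ages, costs)
best_error = total_squared_error(slope, intercept)

# Nudge the factors in every direction and check that the error always rises.
nudges = [(ds, di) for ds in (-1, 0, 1) for di in (-50, 0, 50) if (ds, di) != (0, 0)]
is_minimum = all(
    total_squared_error(slope + ds, intercept + di) > best_error
    for ds, di in nudges
)

print(slope, intercept, best_error, is_minimum)
```

A search over a fixed list of candidate factors can only be as good as the candidates it tries, so comparing its result against a closed-form fit like this is one way to check how close it got.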
Thanks again, lisalisaj!
You’re welcome.
Also, keep in mind with regard to correlation that a few factors generally affect one’s health insurance premiums, or charges: age, tobacco use, region of the country, dependents on the plan, and the level of coverage.
Hi lisalisaj,
I’m sorry this took so long, but I have made some major updates to this project in terms of data visualization and concept explanation. The new file is called ‘us-medical-insurance-costs-v2-1Region.ipynb’ and is on my GitHub page here: GitHub - samWyatt/codecademy_projects
I added an ‘introduction to linear regression’ section for those who are unfamiliar with the topic. I also removed the overly long explanations of the more technical aspects to improve readability.
I would appreciate some feedback when you get the chance,
Thank you