I just finished part 1 of the U.S. Medical Insurance project. I’m approximately 42% through the Business Intelligence Data Analyst Career Course, which has some Data Scientist classes. This was an interesting starter project, and I may have bitten off more than I should have based on the sample within the course. I’m still early in the career path, so there are no visuals like charts. I’m looking for feedback on the readability and efficiency of the code.
Link: GitHub - mrswesosky/CodeAcademyProjects
Thank you in advance for your time and feedback!
@micro8957773496 it looks like you adopted the example solution for the initial creation of the lists. But I like the way you extended them by converting the values to floats, and most importantly, your analysis and explanation really stood out. I guess that is something I could have done better.
Would you please have a look at mine and provide your feedback: pub/us-medical-insurance-costs.ipynb at main · SakthiPillai/pub · GitHub
Thank you for your feedback! I did adopt the initial example for the lists, as I figured it would be good to reference back to it later if needed. The analysis had a few hiccups during the testing stage, so I’m glad it stood out in a good way in the completed project.
For your project:
I like the use of functions for the various information you needed for the analysis since that would be easily scalable. You pulled a lot of great info, and I appreciate that you built out the functions in the order of your analysis outline. I definitely want to get more analysis points like you did when I circle back to this project later.
I recommend removing your testing rows. It’s good to test the code, but once it is confirmed working, leaving unused tests in makes the project page look busier than it needs to. If the rows are meant for analysis and not just testing, I would present them as analysis rather than labeling them as tests.
I’m not sure that individual dictionaries for the youngest and oldest smokers would be needed by themselves, but it could be really informative to put together a smoker dictionary similar to how you did the Regionwide Dictionary.
The avg age dictionary and the avg insurance cost dictionary will likely give an incorrect average for accounts with children, since the code only pulls the average for accounts with exactly 1 child rather than for all accounts with children, which range from 1 to 5 kids.
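As a rough illustration of what I mean, here is a minimal sketch that averages over every account with at least one child. The function name and the sample lists are made up for the example (they are not from your notebook), and it assumes the values have already been converted to numbers:

```python
# Hypothetical sample values for illustration only; the real lists come from insurance.csv.
ages = [19, 18, 28, 33, 32]
children = [0, 1, 3, 0, 2]

def average_age_with_children(ages, children):
    """Average age across all accounts that have one or more children."""
    # kids >= 1 keeps every account with children, not only those with exactly 1.
    ages_with_kids = [age for age, kids in zip(ages, children) if kids >= 1]
    if not ages_with_kids:
        return 0.0  # avoid dividing by zero when no account has children
    return sum(ages_with_kids) / len(ages_with_kids)

avg_age = average_age_with_children(ages, children)
print(avg_age)
```

The same filter (`kids >= 1` instead of `kids == 1`) should work for the average insurance cost if you pass in the charges list in place of the ages.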
The Regionwide Dictionary has an error in the patients with children count as well as the total children count. There are only 764 accounts with children, with 1465 children total across those accounts. The functions for obtaining those counts appear to be pulling the total number of accounts in each region instead of the intended counts. They return the same number for both datapoints, which is what drew my attention (e.g. for the Southwest your code is showing: ‘patients_with_children’: 325, ‘total_children’: 325). I checked against the CSV, and the correct counts for the Southwest region would be 187 patients with children and 371 total children. Unfortunately, I haven’t learned how to utilize dictionaries and key/values as input for a function yet, so I’m not sure what to recommend to fix the count.
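That said, here is a rough sketch of one way the two counts could be built straight from the plain lists, without passing a dictionary into a function. The function name and the sample data are hypothetical (I made them up for the example), and it assumes the children values are already integers:

```python
# Hypothetical sample values for illustration only; the real lists come from insurance.csv.
regions = ["southwest", "southeast", "southwest", "northwest", "southwest"]
children = [0, 1, 3, 0, 2]

def children_counts_by_region(regions, children):
    """Return, per region, how many accounts have children and how many children in total."""
    counts = {}
    for region, kids in zip(regions, children):
        # Create the region's entry the first time the region is seen.
        entry = counts.setdefault(
            region, {"patients_with_children": 0, "total_children": 0}
        )
        if kids > 0:
            entry["patients_with_children"] += 1  # one more account with children
            entry["total_children"] += kids       # add that account's number of kids
    return counts

region_counts = children_counts_by_region(regions, children)
print(region_counts["southwest"])
```

The key difference from counting every row is the `if kids > 0` check, and adding `kids` rather than 1 for the total, so the two numbers should no longer come out identical.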