# US Medical Insurance Costs Solution

Hello community,

I wish to share my solution for this project and would love any constructive criticism. I tried to solve it without pandas or NumPy, which was the challenging part, since I was concerned about writing repetitive code. I am sure there are many ways to refine my code and maybe even my approach to solving it.
It took me a few days to solve this project, and I found it somewhat challenging because I tried to make the function arguments flexible enough to give more specific output without the code becoming too repetitive.

Thank you,

Firstly, I would suggest putting some information about your project in the README file, such as the main idea of your project, what you are trying to solve, and what type of data it is. Your code looks great and those results make sense, so good job! Maybe consider adding some graphs for visualization.

Secondly, in the introduction you kind of just jump into the code without giving your viewers a feel for the data. For example, what does each record in the data contain?

Thirdly (slight bias), you define all your class functions first, then call them at the end. I feel like splitting each function into its own section, and explaining what you are going to do in each section, would be easier to follow.

Also, you don't really have a clear goal defined. Your project is just answering miscellaneous questions that wouldn't be helpful (in a sense).

A problem that comes with answering random questions is that you kind of gloss over important data groups. Maybe the average cost for female smokers is higher in the Northwest region compared to the average cost in the Southeast region. Maybe female smokers with 5 children pay more than female non-smokers with 5 children.
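Subgroup comparisons like these can still be computed without pandas or NumPy, in the spirit of the original post. Below is a minimal sketch, assuming each row is a dict with the columns `sex`, `smoker`, `region`, `children`, and `charges` (as produced by `csv.DictReader` on a file like `insurance.csv`; adjust the names to your data). The helper `average_charges_by` is hypothetical, and the sample rows are made up purely to show the mechanics, not real results.

```python
from collections import defaultdict

def average_charges_by(records, keys, where=None):
    """Return {group tuple: mean charges} for records grouped by `keys`.

    records: list of dicts (values may be strings, as from csv.DictReader).
    keys: column names to group by, e.g. ("region",).
    where: optional {column: value} filter applied before grouping.
    """
    where = where or {}
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in records:
        # Skip rows that fail any filter condition
        if any(row[col] != val for col, val in where.items()):
            continue
        group = tuple(row[k] for k in keys)
        totals[group] += float(row["charges"])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Loading the real file (assumed name "insurance.csv") might look like:
# import csv
# with open("insurance.csv", newline="") as f:
#     rows = list(csv.DictReader(f))

# Hypothetical rows, made up for illustration only
rows = [
    {"sex": "female", "smoker": "yes", "region": "northwest", "children": "5", "charges": "30000"},
    {"sex": "female", "smoker": "yes", "region": "northwest", "children": "0", "charges": "20000"},
    {"sex": "female", "smoker": "yes", "region": "southeast", "children": "2", "charges": "40000"},
    {"sex": "female", "smoker": "no", "region": "southeast", "children": "5", "charges": "8000"},
    {"sex": "male", "smoker": "yes", "region": "northwest", "children": "1", "charges": "35000"},
]

# Average cost for female smokers, broken down by region
print(average_charges_by(rows, ("region",), where={"sex": "female", "smoker": "yes"}))
# {('northwest',): 25000.0, ('southeast',): 40000.0}
```

Because `keys` and `where` are plain arguments, the same function covers "female smokers by region" and "smokers vs non-smokers with 5 children" without writing a separate method for each question.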

Some possible goals could be:

1. "How do certain factors affect the charge amount?"
2. "What is the impact of certain factors on charge amount for smokers compared to non-smokers?"
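Goal #2 can be framed as a per-factor comparison: for each value of a factor, compare the mean charges of smokers against non-smokers. Here is a minimal sketch, assuming the `smoker` column holds the strings `"yes"`/`"no"` and `charges` is numeric text; `smoker_gap_by` is a hypothetical helper, and the rows are invented for illustration only.

```python
from collections import defaultdict

def smoker_gap_by(records, factor):
    """For each value of `factor`, compare mean charges of smokers vs non-smokers.

    Returns {value: (smoker_mean, non_smoker_mean, smoker_mean - non_smoker_mean)},
    skipping values where either group has no rows.
    Assumes row["smoker"] is either "yes" or "no".
    """
    # Running [total, count] per factor value and smoker status
    sums = defaultdict(lambda: {"yes": [0.0, 0], "no": [0.0, 0]})
    for row in records:
        bucket = sums[row[factor]][row["smoker"]]
        bucket[0] += float(row["charges"])
        bucket[1] += 1

    result = {}
    for value, groups in sums.items():
        (s_total, s_n), (n_total, n_n) = groups["yes"], groups["no"]
        if s_n and n_n:  # only compare when both groups are present
            s_mean, n_mean = s_total / s_n, n_total / n_n
            result[value] = (s_mean, n_mean, s_mean - n_mean)
    return result

# Hypothetical rows, made up for illustration only
rows = [
    {"children": "0", "smoker": "yes", "charges": "30000"},
    {"children": "0", "smoker": "no", "charges": "10000"},
    {"children": "0", "smoker": "no", "charges": "6000"},
    {"children": "5", "smoker": "yes", "charges": "40000"},
]

print(smoker_gap_by(rows, "children"))
# {'0': (30000.0, 8000.0, 22000.0)}  ("5" is skipped: no non-smokers in the sample)
```

Swapping `"children"` for `"region"`, `"sex"`, or a binned age column reuses the same comparison for each factor in the goal.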

Then your analysis answers these goals, so your research is organized on that front. Making a goal is also beneficial for the viewer, because I'm looking at your analysis… so what? Why does it matter that the average age is 39 years?

When analyzing, you have to consider how this information is going to help answer (usually) company problems. In regard to goal #2, that information could help inform smokers that their insurance costs will be higher if they smoke, possibly leading them to quit. This impact would be especially helpful to a company that wants to reduce the number of underage smokers. They could add this information to their ad campaigns.

Lastly, I suggest adding a summary at the end. This helps the reader take away the important points from your analysis.


You're right that it would be more efficient (and, in my mind, easier) to use pandas and NumPy for this project. (At least, that's what I would use. I'm set to do this project shortly and have been hesitant to start.)

To expand on what @h1lo said above: Data Scientists (and Analysts) are storytellers. In addition to discovering patterns and testing hypotheses, you not only inform but often persuade your audience, whoever they may be. (Knowing your audience is another thing to be aware of when presenting info, especially if it's technical info for a non-technical audience.)

While you wouldn't do your presentation/storytelling in your notebook, it's still a good idea to describe your thought process while exploring the code. (It can seem obvious to you because you're entrenched in it, but a casual observer can easily get lost.) A short sentence before every code cell would be enough, in addition to questions/hypotheses at the beginning and a summary of findings at the end.

Here are some articles on data storytelling:

https://www.dezyre.com/article/why-data-scientists-need-to-be-good-data-storytellers/174

https://towardsdatascience.com/storytelling-for-data-scientists-317c2723aa31
