US Medical Insurance Portfolio Project - Please Review :)

Hi everyone,

Here is my first portfolio project. Writing the code itself wasn’t very difficult and didn’t take much time, but getting the document cleaned up, making comments and describing my findings was time-consuming (also because English is not my first language). In total, it took me about 10 hours to complete the portfolio project.

The link to my code: GitHub - dgordok/Codecademy

I would appreciate it very much if you could leave a review.

Thank you in advance!
Danil

Congrats on completing the project.

  • good use of comments; it’s easy to follow along in your analysis.
  • functions are succinct & well written to extract pertinent info from the data set.

Some things to consider:

  • Rather than the mean of charges, it might be better to calculate the median, because a few outliers pull the mean upward. By the median, the Northeast has the highest charges, followed by the Southeast.
    See:
insurance['charges'].mean().round(2)
>>13270.42

insurance['charges'].median().round(2)
>>9382.03

Or,
insurance['charges'].describe()

>>count     1338.000000
mean     13270.422265
std      12110.011237
min       1121.873900
25%       4740.287150
50%       9382.033000
75%      16639.912515
max      63770.428010

insurance[['region', 'charges']].groupby('region').median().round(2)

>>           charges
region
northeast   10057.65
northwest    8965.80
southeast    9294.13
southwest    8798.59
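
To see why a single outlier can flip the ranking, here is a minimal sketch on made-up numbers (not the real insurance data; the `sample` frame and its values are assumptions for illustration only). It puts the mean and the median side by side for each region:

```python
import pandas as pd

# Made-up sample, NOT the real insurance data:
# the southeast group contains one very large charge (an outlier).
sample = pd.DataFrame({
    "region": ["northeast", "northeast", "northeast",
               "southeast", "southeast", "southeast"],
    "charges": [9000.0, 10000.0, 11000.0, 8000.0, 9000.0, 60000.0],
})

# Mean and median of charges per region, side by side.
summary = sample.groupby("region")["charges"].agg(["mean", "median"]).round(2)
print(summary)
```

In this toy sample the southeast has the higher mean but the lower median, which is the same effect the outliers have on the real data.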


  • Keep in mind that BMI is a crude number that insurance companies use to justify charging people higher premiums. It’s not an accurate measure of one’s health because it can’t differentiate between muscle, lean body mass, or water weight, and it doesn’t take health behaviors or body composition into account. So be mindful of using terms like “healthy”, “normal”, or “underweight” to describe weight in your analysis. It’d be better to say, “x number of people fall into this range of numbers, this range, etc.”
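
One way to report BMI as plain numeric ranges instead of labels is `pd.cut`. This is only a sketch: the BMI values are made up, and the bin edges in `edges` are an assumption you would replace with whatever cut points suit your analysis.

```python
import pandas as pd

# Made-up BMI values for illustration only.
bmi = pd.Series([17.2, 22.5, 24.9, 27.3, 31.0, 36.8], name="bmi")

# Assumed bin edges; change these to fit the analysis.
edges = [0, 18.5, 25, 30, 100]

# right=False makes each bin closed on the left: [0, 18.5), [18.5, 25), ...
ranges = pd.cut(bmi, bins=edges, right=False)

# Count how many people fall into each numeric range, in bin order.
counts = ranges.value_counts().sort_index()
print(counts)
```

The output lists intervals such as `[18.5, 25.0)` with a count next to each, so the write-up can describe the ranges without attaching a judgment to them.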

You’ll come back to this project as you progress through the course and learn Pandas and Seaborn or Matplotlib. At that point you can just import the CSV file, create a DataFrame, and go from there.
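
A rough sketch of that later workflow, assuming the file is called `insurance.csv` and has the columns shown below (the four rows are made up, and an in-memory string stands in for the file so the snippet runs on its own):

```python
import io

import pandas as pd

# Stand-in for the project's CSV file; the column names are assumed and the rows are made up.
csv_text = """age,sex,bmi,children,smoker,region,charges
23,female,26.1,0,no,northeast,2900.50
41,male,31.4,2,no,southeast,7400.25
35,female,22.8,1,yes,northwest,19500.00
58,male,29.0,0,no,southwest,11800.75
"""

# With the real file this would be: insurance = pd.read_csv("insurance.csv")
insurance = pd.read_csv(io.StringIO(csv_text))

# Quick look at the column types and the distribution of charges.
print(insurance.dtypes)
print(insurance["charges"].describe())
```

From there, the same `groupby` and `describe` calls shown above work directly on the DataFrame.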

Good work!

Thank you very much for the review! You mentioned important points that I will consider in my next project. Thanks! :slight_smile:

Hi :smiley:

Comment from a newbie here, so this is more helpful for me, as practice in commenting on other projects :smile:, than for you :sweat_smile:

Likes:

  • Very clear, easy to follow your analysis and workflow.
  • I really like, and learned from, the way you use font formatting in the Markdown cells to highlight key values.
  • It was interesting to see the list of unique values in the region column, and your analysis of the regions.
  • Good reference on CDC
  • Good summary section (I really should have it in all projects in the future)

Others:

  • The only thing I would (or am able to :sweat_smile:) change is to add a thousands separator to the numbers
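
For reference, Python’s format specification supports this directly with a comma in the format spec. A small sketch, using the mean of charges and the row count quoted earlier in the thread (the variable names are just for illustration):

```python
# Values quoted earlier in the thread.
charges_mean = 13270.422265
row_count = 1338

# The "," option inserts a thousands separator; ".2f" keeps two decimals.
formatted_mean = f"{charges_mean:,.2f}"
formatted_count = f"{row_count:,}"

print(formatted_mean)   # 13,270.42
print(formatted_count)  # 1,338
```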

Thanks so much for your work :laughing: Have a nice day ! :sun_with_face:

Thank you for your comments! :grinning: Have fun creating and reviewing other projects in the future! :slight_smile: