Here is my first portfolio project. Writing the code itself wasn’t very difficult and didn’t take much time, but getting the document cleaned up, making comments and describing my findings was time-consuming (also because English is not my first language). In total, it took me about 10 hours to complete the portfolio project.
Good use of comments; your analysis is easy to follow.
Your functions are succinct and well written, and they extract the pertinent info from the data set.
Some things to consider:
Rather than the mean of charges, it might be better to calculate the median, because there are some outliers that pull the mean upward. If you look at the median, the Northeast has the highest charges, followed by the Southeast.
See:
insurance['charges'].mean().round(2)
>>13270.42
insurance['charges'].median().round(2)
>>9382.03
Or,
insurance['charges'].describe()
>>
count     1338.000000
mean     13270.422265
std      12110.011237
min       1121.873900
25%       4740.287150
50%       9382.033000
75%      16639.912515
max      63770.428010
insurance[['region', 'charges']].groupby('region').median().round(2)
>>
            charges
region
northeast  10057.65
northwest   8965.80
southeast   9294.13
southwest   8798.59
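To see why the choice between mean and median can change the regional ranking, here is a minimal sketch on a toy dataframe. The numbers are invented for illustration and are not taken from the insurance data; only the column names 'region' and 'charges' match the project.

```python
import pandas as pd

# Toy stand-in for the insurance data. These values are made up
# for illustration; one large charge plays the role of an outlier.
toy = pd.DataFrame({
    "region": ["northeast", "northeast", "northeast",
               "southeast", "southeast", "southeast"],
    "charges": [9000.0, 10000.0, 11000.0,
                8000.0, 9000.0, 60000.0],
})

# Compute both statistics per region, side by side.
summary = toy.groupby("region")["charges"].agg(["mean", "median"]).round(2)
print(summary)

# The single 60000 charge lifts the southeast mean above the northeast mean,
# while the medians still rank the northeast first.
print("highest by mean:  ", summary["mean"].idxmax())
print("highest by median:", summary["median"].idxmax())
```

Running the same agg(["mean", "median"]) call on the real insurance dataframe should show both columns at once, which makes it easier to spot regions where outliers are doing most of the work.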
Keep in mind that BMI is a crude number that insurance companies use to charge people higher premiums. It’s not an accurate measure of an individual’s health because it can’t differentiate between muscle, lean body mass, and water weight, and it doesn’t take health behaviors or body composition into account. So be mindful of using terms like “healthy”, “normal”, or “underweight” to describe weight in your analysis. It would be better to say that x number of people fall into this range of numbers, y number into that range, and so on.
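One way to report BMI as plain numeric ranges is pd.cut, which labels each bin with its interval instead of a word like “normal”. This is only a sketch: the sample values and the bin edges below are assumptions chosen for illustration, so swap in whatever cutoffs your analysis actually uses.

```python
import pandas as pd

# Made-up BMI values for illustration (not from the insurance data).
bmi = pd.Series([17.4, 22.0, 24.9, 27.5, 31.2, 36.8, 41.0], name="bmi")

# Assumed bin edges; right=False makes each bin include its lower edge
# and exclude its upper edge, e.g. [18.5, 25).
edges = [0, 18.5, 25, 30, 35, 40, 100]
ranges = pd.cut(bmi, bins=edges, right=False)

# Count how many people fall into each numeric range.
counts = ranges.value_counts(sort=False)
print(counts)
```

The printed index shows the intervals themselves, so the write-up can say "2 people fall between 18.5 and 25" without attaching a health label to the range.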
This is a project you’ll come back to as you progress through the course and learn Pandas and Seaborn or Matplotlib. At that point you can just import the csv file, create a dataframe, and go from there.
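As a rough sketch of what that starting point could look like: the snippet below reads CSV text into a dataframe with pd.read_csv. To keep it runnable on its own, it reads from an in-memory string with invented rows; in the project you would pass the path to your own file instead (the name "insurance.csv" in the comment is an assumption).

```python
import io
import pandas as pd

# Stand-in for the csv file, with invented rows. In the project this
# would be something like pd.read_csv("insurance.csv") (assumed filename).
csv_text = """region,bmi,charges
northeast,27.9,10000.0
southeast,33.8,1700.5
southwest,22.7,21000.0
northwest,28.9,3800.0
"""

insurance = pd.read_csv(io.StringIO(csv_text))

# Quick checks before any analysis: size, column types, first rows.
print(insurance.shape)
print(insurance.dtypes)
print(insurance.head())
```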