The link to my project is https://github.com/TashenM/U.S.-Medical-Insurance-Costs-Portfolio-Project.git
It was daunting at first, but it eventually became fun to make this project my own. It took me about a week to complete, but I took my time with it. I found the project easier the further along I got.
Please check it out and give me your feedback on what I could’ve done better or what you thought was good.
Congrats on completing the project. You put a lot of work into it.
The readme file made me chuckle at the “TLDR;” part. You could always add what you put at the end of your notebook to the readme file, including where you got the data from (Kaggle).
The goals are clearly stated at the top of the notebook.
Good use of comments, so anyone viewing the notebook can follow along as you analyze.
If you’re more comfortable creating classes, then do so. But you’ve imported Pandas, which is a pretty powerful library with a ton of built-in functionality, and it hasn’t really been used.
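To make that point concrete, here is a minimal sketch comparing a hand-rolled class with the pandas equivalent. The class name InsuranceRecords and its average_charges() method are hypothetical (not taken from the project), and the five rows are the non-smoker sample printed later in this thread, trimmed to three columns.

```python
import pandas as pd

# The five non-smoker rows printed in this thread (age, region, charges only)
rows = [
    {"age": 18, "region": "southeast", "charges": 1725.55230},
    {"age": 28, "region": "southeast", "charges": 4449.46200},
    {"age": 33, "region": "northwest", "charges": 21984.47061},
    {"age": 32, "region": "northwest", "charges": 3866.85520},
    {"age": 31, "region": "southeast", "charges": 3756.62160},
]


class InsuranceRecords:
    """Hypothetical class-based approach: store a column as a list, loop by hand."""

    def __init__(self, rows):
        self.charges = [row["charges"] for row in rows]

    def average_charges(self):
        # Manual mean: sum the list and divide by its length
        return sum(self.charges) / len(self.charges)


# Class-based: you write and maintain the aggregation yourself
records = InsuranceRecords(rows)
print(round(records.average_charges(), 2))

# pandas: the same statistic is a built-in method on the column
df = pd.DataFrame(rows)
print(round(df["charges"].mean(), 2))
```

Both lines print 7156.59. The class is still a fine learning exercise; the pandas version just gets there with one call, and the same DataFrame also gives you median(), describe(), groupby() and so on for free.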
- df.describe() will give you basic stats about the data:
>>             age          bmi     children       charges
count  1338.000000  1338.000000  1338.000000   1338.000000
mean     39.207025    30.663397     1.094918  13270.422265
std      14.049960     6.098187     1.205493  12110.011237
min      18.000000    15.960000     0.000000   1121.873900
25%      27.000000    26.296250     0.000000   4740.287150
50%      39.000000    30.400000     1.000000   9382.033000
75%      51.000000    34.693750     2.000000  16639.912515
max      64.000000    53.130000     5.000000  63770.428010
- But remember that outliers affect the mean and can skew the results, so it might be better to check the median (the 50% row above). For charges, the mean is 13270.42 while the median is only 9382.03.
- You can use value_counts() to find the totals in a column, and groupby() to compare a statistic such as the median across groups:
df['region'].value_counts()
>> southeast 364
df.groupby('region')['charges'].median()
- Further, smokers and non-smokers can be pulled out of the data set (using df.iloc with a boolean mask) and analyzed separately too, if you’re so inclined:
non_smokers = df.iloc[(df['smoker'] == 'no').values]
non_smokers.head()
>>>age     sex     bmi  children  smoker     region      charges
1   18    male  33.770         1      no  southeast   1725.55230
2   28    male  33.000         3      no  southeast   4449.46200
3   33    male  22.705         0      no  northwest  21984.47061
4   32    male  28.880         0      no  northwest   3866.85520
5   31  female  25.740         0      no  southeast   3756.62160
Sorry, that was a bit long-winded. I guess I am a Pandas advocate.
Thank you for your feedback, @lisalisaj!
Looking back on it now, I don’t know why I didn’t put the “TLDR” part in the readme file. I will change that!
With respect to the Pandas built-in functionality: I did originally use most of the examples you illustrated here, and I was not very comfortable using classes at the time.
I ended up changing my mind: instead of relying on the Pandas built-in functionality I had originally planned to use, I decided to dive into using classes to get out of my comfort zone for this project.
I really do appreciate your response and taking the time to look at my project! Thanks again!