Hi, I’ve just finished this project and would appreciate any advice or suggestions on how it could be improved. Thanks in advance!
- This project was just right for my level
- It probably took me about 4 hours to complete
- My code repo can be found here
Some things to consider:
- Add your conclusions at the end of the notebook.
- Rather than looking at the total amount of the charges, perhaps look at the median or mean. The median is probably more useful here, as there are some outliers in the data set that pull the mean upward. Related: look at the median costs for smokers v. non-smokers.
# I'm using pandas here.
df['charges'].describe().round(2)
count     1338.00
mean     13270.42
std      12110.01
min       1121.87
25%       4740.29
50%       9382.03
75%      16639.91
max      63770.43
This is interesting:
df.groupby(['region', 'sex', 'smoker'])['charges'].median().round(2)
region     sex     smoker
northeast  female  no         8681.14
                   yes       22331.57
           male    no         8334.46
                   yes       33993.37
northwest  female  no         7731.86
                   yes       28950.47
           male    no         6687.44
                   yes       26109.33
southeast  female  no         7046.72
                   yes       35017.72
           male    no         6395.95
                   yes       38282.75
southwest  female  no         7348.14
                   yes       34166.27
           male    no         7318.96
                   yes       35585.58
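To make the mean v. median point concrete, here is a minimal sketch on a tiny made-up table. The values are hypothetical and are not taken from the project data; only the column names `smoker` and `charges` come from the outputs above. It shows how a couple of large charges drag the mean up while the median stays put, and how to get the median per smoker group.

```python
import pandas as pd

# Hypothetical toy data, NOT the project's data set. Only the column names
# 'smoker' and 'charges' are borrowed from the outputs shown above.
toy = pd.DataFrame({
    "smoker":  ["no", "no", "no", "no", "yes", "yes"],
    "charges": [2000.0, 3000.0, 4000.0, 5000.0, 30000.0, 60000.0],
})

# Two very large values pull the mean well above the median.
overall_mean = toy["charges"].mean()
overall_median = toy["charges"].median()
print(f"mean:   {overall_mean:.2f}")
print(f"median: {overall_median:.2f}")

# Median charges for smokers v. non-smokers.
median_by_smoker = toy.groupby("smoker")["charges"].median()
print(median_by_smoker)
```

On the real data you would run the same two steps on your own `df` instead of `toy`.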
- Other things to look at: How many total records are in the data? Were there any NULLs? How many women v. men are in the data? Smokers v. non-smokers? What were the median charges for the different groups of children? Which region of the country had the highest median charges? Etc.
- I would ignore BMI (for reasons mentioned in the project description), along with any subjective language surrounding it, because it’s not an accurate measure of one’s health. It’s a number insurance companies use to charge people higher premiums.
- The last part, with all the functions, is a little unclear: what do you want people to glean from the output?
- I’m not sure what course you’re taking, or whether EDA (exploratory data analysis) has been mentioned yet. If not, I recommend reading up on it as a guide for first steps with any data set.
A good start!
Thanks a lot for the feedback, you really are a super user!