US Medical Insurance Project - Please Review!

Hello, and thank you for reviewing my code!

I started the Data Science Path because we did a half-semester lesson on Python last year in school. I was super interested in it, so I decided to learn more. Even with that brief background, this career path has always challenged me! And this project was no exception.

This portfolio project took me about 4 days to do, working on the project on and off for about 1hr 30min each day (I get distracted sometimes lol).

I didn’t have too much trouble coding it, but I was kind of at a loss for how to analyze what my code was saying. It’s much different from my other classes. Tips on how I could have improved on that would be super helpful!

Thanks again,

mews_mochi

Do you have a .py or .ipynb file to view?

Sorry, I was trying to figure out how to share files through GitHub but I guess I messed it up.

http://localhost:8888/files/US%20Medical%20Insurance%20Proj?_xsrf=2|c37ddddb|3cef9a06bceab113e381beb8c5d2719f|1688766539

Tell me if this one works!

No, that one won’t work b/c it’s local on your computer.

In the repo on GH, you can select “add file” and then locate the .py file or the Jupyter Notebook and upload it that way.

What about this one?

Yep, that one worked.

Congrats on finishing the project. This is one that you’ll return to as you accumulate more Python skills.

Some thoughts:

  • Good job on writing the functions and using the csv module.

  • That said, there are only 1338 rows in the csv file, so I think you need to double-check the counts for men & women in the dataset. Or maybe this is an issue of re-running the code cell in the notebook. (You have: Count for female: 1324, Count for male: 1352.)

  • Good use of comments and describing your thought processes as you sift through the data.

  • B/c there are outliers in the data, you might want to use the median rather than the mean when looking at the charges column.

  • You also might not want to print out the (lengthy) results of the columns, b/c it’s a lot to scroll through. Maybe just return the results for your own use but not show them in the notebook?

See:

df['sex'].value_counts()

male      676
female    662


df['charges'].mean().round(2)
13270.42

vs:

df['charges'].median().round(2)
9382.03

Or:
df.describe()
               age          bmi     children       charges
count  1338.000000  1338.000000  1338.000000   1338.000000
mean     39.207025    30.663397     1.094918  13270.422265
std      14.049960     6.098187     1.205493  12110.011237
min      18.000000    15.960000     0.000000   1121.873900
25%      27.000000    26.296250     0.000000   4740.287150
50%      39.000000    30.400000     1.000000   9382.033000
75%      51.000000    34.693750     2.000000  16639.912515
max      64.000000    53.130000     5.000000  63770.428010
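If you'd rather stay with the csv module for now, the same checks can be roughed out with the standard library. This is only a sketch: the sample rows are made up (they are not the real dataset), `summarize` is a hypothetical helper name, and the `insurance.csv` filename mentioned in the comment is an assumption.

```python
import csv
import io
import statistics
from collections import Counter

# Tiny made-up sample in the same column layout as the insurance data.
# With the real file you would pass open("insurance.csv", newline="")
# instead (that filename is an assumption).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16000.00
18,male,33.8,1,no,southeast,1700.00
28,male,33.0,3,no,southeast,4400.00
33,male,22.7,0,no,northwest,2200.00
32,female,28.9,0,no,northwest,3900.00
"""


def summarize(csv_file):
    """Return (row count, counts per sex, mean charge, median charge)."""
    sex_counts = Counter()
    charges = []
    for row in csv.DictReader(csv_file):
        sex_counts[row["sex"]] += 1
        charges.append(float(row["charges"]))
    return (
        len(charges),
        sex_counts,
        statistics.mean(charges),
        statistics.median(charges),
    )


n_rows, sex_counts, mean_charge, median_charge = summarize(io.StringIO(SAMPLE_CSV))

# Sanity check: the per-group counts must add up to the number of rows.
assert sum(sex_counts.values()) == n_rows

print(n_rows, dict(sex_counts))
print(round(mean_charge, 2), round(median_charge, 2))
```

In this toy sample the single large charge pulls the mean well above the median, which is the same effect the outliers have on the real charges column.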

An aside, if you don’t want to use Jupyter Notebook, look into Google Colab. It’s built like Jupyter, but your files are in your Drive. There’s a menu option in the dropdown File menu to push a copy of the notebook directly to a GH repo (you just have to select the correct repo when doing so).

Good work! Keep at it. 🧑‍💻

Hi! Thank you so much for your input,

I just have a question about the values in the male and female datasets. When I use len() to look at the length of the two dictionaries, it adds up. Maybe I’m looking at a different part than you?

Also, for the bottom example code, are you using pandas there? I just started using pandas so I don’t know a whole lot about it, but I’m going through that part of the career course now.

You are a lifesaver,

mews_mochi

But if there are only 1338 records/rows in the data set, you can’t have 1324 women and 1352 men. It doesn’t add up. Your function is doubling the numbers (1324 = 2 × 662 and 1352 = 2 × 676). So, go back and check that (indentation).

df.info()

>><class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)

df["sex"].value_counts()

>>male      676
female    662

Yep, that’s pandas. 🐼
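The original function isn't shown in this thread, so the following is only a hypothetical illustration of one common way counts end up exactly doubled: a tally that lives outside the function and is never reset, so calling the function twice (or re-running the notebook cell) adds to the old totals. The names `rows`, `shared_counts`, `count_sex_shared`, and `count_sex` are invented for the example.

```python
# Hypothetical rows standing in for the parsed CSV; not the real data.
rows = [{"sex": "female"}, {"sex": "male"}, {"sex": "male"}]

# Buggy pattern: the tally lives outside the function, so every call
# (or every re-run of the notebook cell) adds to the previous totals.
shared_counts = {"female": 0, "male": 0}


def count_sex_shared(rows):
    for row in rows:
        shared_counts[row["sex"]] += 1
    return shared_counts


count_sex_shared(rows)
doubled = dict(count_sex_shared(rows))  # second call doubles every count


# Safer pattern: start from a fresh tally inside the function.
def count_sex(rows):
    counts = {"female": 0, "male": 0}
    for row in rows:
        counts[row["sex"]] += 1
    return counts


count_sex(rows)
fresh = count_sex(rows)  # same answer no matter how often it runs

print(doubled)  # {'female': 2, 'male': 4}
print(fresh)    # {'female': 1, 'male': 2}

# Cheap sanity check worth keeping: group counts must sum to the row count.
assert sum(fresh.values()) == len(rows)
```

A check like the final assert would have flagged 1324 + 1352 ≠ 1338 right away, whatever the actual cause turns out to be.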