Not sure if findings/code is right

Hello and welcome!

I’m a Codecademy student, currently on the ‘Data Analysis’ Career Path. For the course I finished this project, called “US Medical Insurance Costs”, in which I made some calculations and findings.

You can open the project here

The difficulty of the project was just about right: not too hard and not too easy.

I’m happy about any feedback I get! To be more precise, here are some questions:

Is my finding about the age of people with at least 1 child correct?
Any recommendations for learning statistics/math? (I’m pretty bad at it.)
Thank you in advance for everything!

PS: Sorry for my English, I’m Swiss.

Congratulations on completing the project! :partying_face:

As you found out, there’s a lot one can do with this data set.
If you wanted to, there are some pandas methods you could use rather than creating functions to explore the data (though functions are totally fine! But I figure why not use methods if they’re already available?)

insurance.describe()
will give you the count, mean, std, min, 25%, 50%, 75% and max values for all numeric columns in the df (text columns such as region are skipped by default). Or, you could use that on any one column too:
insurance['bmi'].describe()
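
To make the two calls above concrete, here is a minimal sketch. The small DataFrame is a made-up stand-in for the project's insurance data (the real file has more rows and columns), so the numbers are purely illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the insurance DataFrame; values are made up.
insurance = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88, 25.74],
    "children": [0, 1, 3, 0, 0, 0],
    "region": ["southwest", "southeast", "southeast",
               "northwest", "northwest", "southeast"],
})

# Summary statistics for the numeric columns only (the default behaviour).
summary = insurance.describe()

# Summary statistics for a single column.
bmi_summary = insurance["bmi"].describe()

# include="all" also summarises text columns (count, unique, top, freq).
all_summary = insurance.describe(include="all")

print(summary)
print(bmi_summary)
```

Passing include="all" is optional; it is only useful if you also want a quick look at columns like region.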

If you wanted to delve into the regions separately…you could also separate out the regions with something like this:
southwest = insurance.iloc[(insurance['region'] == 'southwest').values]

This uses .iloc together with .values to select all the rows where region is “southwest”, or whatever value you specify.
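
As a rough sketch of what that row selection does, the example below builds the boolean mask first and then applies it in three ways that should give the same rows. The tiny DataFrame is a hypothetical stand-in for the real insurance data.

```python
import pandas as pd

# Hypothetical stand-in for the insurance DataFrame; values are made up.
insurance = pd.DataFrame({
    "age": [19, 18, 28, 33, 60],
    "region": ["southwest", "southeast", "southeast",
               "northwest", "southwest"],
})

# One True/False value per row: True where the region matches.
mask = insurance["region"] == "southwest"

# .iloc is position-based, so it needs the raw boolean array (.values).
southwest_iloc = insurance.iloc[mask.values]

# Plain boolean indexing and .loc accept the boolean Series directly.
southwest_mask = insurance[mask]
southwest_loc = insurance.loc[mask]

print(southwest_mask)
```

All three select the same rows; plain boolean indexing is usually the shortest to write.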

You could separate out the data by any values, really. By number of children:

one_child = insurance[insurance['children'] == 1]

then use one_child.describe()
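
The same filtering idea can be applied to the question about people with at least one child. This is only a sketch on a made-up stand-in for the insurance data, so the ages here are illustrative and not the project's actual result.

```python
import pandas as pd

# Hypothetical stand-in for the insurance DataFrame; values are made up.
insurance = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31, 46, 37],
    "children": [0, 1, 3, 0, 0, 0, 1, 3],
})

# Keep only the rows with one or more children.
with_children = insurance[insurance["children"] >= 1]
mean_age_with_children = with_children["age"].mean()

# Or compare every group at once: mean age per number of children.
mean_age_by_children = insurance.groupby("children")["age"].mean()

print(mean_age_with_children)
print(mean_age_by_children)
```

Using groupby avoids creating one variable per group when you want to compare all of them.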

Pandas documentation here: https://pandas.pydata.org/docs/user_guide/index.html

I like this explanation of EDA. It’s pretty extensive.
https://towardsdatascience.com/an-extensive-guide-to-exploratory-data-analysis-ddd99a03199e

Happy coding!


Hey, thank you very much for your detailed answer! I had a feeling that I was doing something too complicated.

I had a hard time getting the values out of the DataFrame and/or Series. I tried .iloc, but that didn’t work, and neither did .get_value() or ._get_value().
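
For reference, a minimal sketch of pulling plain values out of a DataFrame or Series. The public .get_value() method was removed in pandas 1.0 in favour of .at and .iat, which would explain why it fails on a recent version. The DataFrame below is a made-up stand-in with a default 0, 1, 2 index.

```python
import pandas as pd

# Hypothetical stand-in for the insurance DataFrame; values are made up.
insurance = pd.DataFrame({
    "age": [19, 18, 28],
    "bmi": [27.9, 33.77, 33.0],
    "region": ["southwest", "southeast", "southeast"],
})

# Single value by position within one column (a Series).
first_age = insurance["age"].iloc[0]

# Single value by row label and column name.
first_bmi = insurance.at[0, "bmi"]

# Single value by row position and column position.
second_region = insurance.iat[1, 2]

# A whole column as a plain Python list.
bmi_list = insurance["bmi"].tolist()

print(first_age, first_bmi, second_region, bmi_list)
```

Note that .iloc, .at and .iat take square brackets, not parentheses, which is an easy thing to trip over.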

And I’m gonna dig deeper into the pandas documentation :slight_smile:
