U.S. Medical Insurance Costs Project (Need feedback!)

This is my first project. I would love some feedback on what went wrong with my majority info, as well as possible solutions to the extra # I have at the bottom. Thanks in advance!
This file link is a local file on your computer…which we don’t have access to.

You can push the Jupyter Notebook to your GitHub repo and then share that link here.


Hopefully this one will work, thanks!

Congrats on completing the project. :partying_face:

Seems like you understand functions and some Pandas methods. At what point in the path is this project? Before or after Pandas? (I forget, b/c there are several that use this data set. I ask b/c you started w/Pandas and then went on to define functions to analyze.)

Pandas has a ton of built-in methods for EDA (exploratory data analysis). (But use whatever you’re comfortable with!)

Just some ideas…

To get the number of people in each region (as opposed to your ‘where_majority_from’ function) you could use Pandas methods:

insurance_regions = insurance['region'].value_counts()
#which results in:
southeast    364
southwest    325
northwest    325
northeast    324
Name: region, dtype: int64
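
If the goal is just "which region has the most people", the counts above can be reduced to a single answer. Here is a minimal sketch, assuming a DataFrame named insurance with a 'region' column; the rows below are made-up sample data, not the real dataset, and the names majority_region, majority_count and region_share are only illustrative:

```python
import pandas as pd

# Made-up sample rows standing in for the real insurance DataFrame.
insurance = pd.DataFrame({
    "region": ["southeast", "southwest", "southeast",
               "northeast", "northwest", "southeast"],
    "charges": [1725.0, 16884.0, 4449.0, 6406.0, 21984.0, 3866.0],
})

# Count rows per region, sorted from most to least common.
region_counts = insurance["region"].value_counts()

# idxmax() returns the index label (the region name) with the largest count.
majority_region = region_counts.idxmax()
majority_count = int(region_counts.max())

# normalize=True gives each region's share of the rows instead of raw counts.
region_share = insurance["region"].value_counts(normalize=True)

print(majority_region, majority_count)
```

On the real data the same calls should point to southeast, since it has the largest count in the output above.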

You could also break out the data by region and analyze it from there using .iloc & .values like:

northeast = insurance.iloc[(insurance['region']=='northeast').values]
#which results in this:
    age     sex     bmi  children smoker     region  charges
8    37    male  29.830         2     no  northeast     6406
10   25    male  26.220         0     no  northeast     2721
16   52  female  30.780         1     no  northeast    10797
17   23    male  23.845         0     no  northeast     2395
20   60  female  36.005         0     no  northeast    13228
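
A boolean mask with .loc should select the same rows as the .iloc & .values version, and groupby can summarize every region at once instead of building one DataFrame per region. This is only a rough sketch: the sample rows are illustrative (three borrowed from the output above, two made up), and the names mask, northeast_iloc and mean_by_region are my own:

```python
import pandas as pd

# Small illustrative sample; the real insurance DataFrame has more columns.
insurance = pd.DataFrame({
    "age": [37, 25, 52, 19, 33],
    "region": ["northeast", "northeast", "northeast",
               "southwest", "northwest"],
    "charges": [6406.0, 2721.0, 10797.0, 16884.0, 21984.0],
})

# Boolean Series: True where the row belongs to the northeast region.
mask = insurance["region"] == "northeast"

# Label-based selection with the mask.
northeast = insurance.loc[mask]

# Position-based selection needs the underlying NumPy boolean array.
northeast_iloc = insurance.iloc[mask.values]

# Average charges for every region in one call.
mean_by_region = insurance.groupby("region")["charges"].mean()

print(northeast)
print(mean_by_region)
```

Either selection style works; .loc with a mask is usually the shorter one to read.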

Happy coding!