U.S. Medical Insurance Costs Project (Need feedback!)

This is my first project. I would love some feedback on what went wrong with my majority info, as well as possible solutions to the extra # I have at the bottom. Thanks in advance!
Here is the url:
https://github.com/britneylaz/U.S.-Medical-Insurance-Costs1

This link points to a local file on your computer, which we don’t have access to.

You can push the Jupyter Notebook to your GitHub repo and then share that link here.

https://github.com/britneylaz/U.S.-Medical-Insurance-Costs1

Hopefully this one will work, thanks!

Congrats on completing the project! 🥳

Seems like you understand functions and some Pandas methods. At what point in the path is this project? Before or after Pandas? (I forget, because several projects use this data set. I ask because you started with Pandas and then went on to define your own functions for the analysis.)

Pandas has a ton of built-in methods for EDA (exploratory data analysis). (But use whatever you’re comfortable with!)

Just some ideas…

To get the number of people in each region (as opposed to your ‘where_majority_from’ function) you could use Pandas methods:

insurance_regions = insurance['region'].value_counts()
print(insurance_regions)
# which results in:
southeast    364
southwest    325
northwest    325
northeast    324
Name: region, dtype: int64
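If the goal of ‘where_majority_from’ is only the single most common region, value_counts() can be followed by idxmax(). Here is a minimal sketch of that idea. It assumes a small made-up DataFrame called toy in place of the real insurance data, so the rows and the variable names are illustrative only:

```python
import pandas as pd

# Toy stand-in for the real `insurance` DataFrame (values are made up).
toy = pd.DataFrame({
    "region": ["southeast", "southwest", "southeast",
               "northwest", "northeast", "southeast"],
})

# Number of rows per region, sorted from most to least common.
region_counts = toy["region"].value_counts()

# Label of the region with the highest count.
majority_region = region_counts.idxmax()

# Fraction of all rows that fall in that region.
majority_share = region_counts.max() / len(toy)

print(majority_region, round(majority_share, 2))
```

On the real data you would swap toy for insurance. Note that idxmax() returns only one label, so a tie between regions would need separate handling.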

You could also break out the data by region and analyze it from there, using .iloc with a boolean mask (.values converts the mask to a NumPy array, which .iloc needs):

northeast = insurance.iloc[(insurance['region']=='northeast').values]
northeast.head()
# which results in this:
    age     sex     bmi  children smoker     region  charges
8    37    male  29.830         2     no  northeast     6406
10   25    male  26.220         0     no  northeast     2721
16   52  female  30.780         1     no  northeast    10797
17   23    male  23.845         0     no  northeast     2395
20   60  female  36.005         0     no  northeast    13228
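Two related options might be worth a look once the data is split by region: plain boolean indexing (which skips .iloc and .values), and groupby for a per-region summary without building one DataFrame per region. This is just a sketch on a made-up toy DataFrame; the rows and the names toy and avg_charges are assumptions for illustration, not values from the real data set:

```python
import pandas as pd

# Toy stand-in for the real `insurance` DataFrame (values are made up).
toy = pd.DataFrame({
    "region": ["northeast", "southeast", "northeast", "southeast"],
    "smoker": ["no", "yes", "no", "no"],
    "charges": [6000.0, 20000.0, 4000.0, 8000.0],
})

# Boolean indexing: keep only the rows where the mask is True.
northeast = toy[toy["region"] == "northeast"]

# Mean charges for every region in one call.
avg_charges = toy.groupby("region")["charges"].mean()

print(northeast)
print(avg_charges)
```

With the real data, replacing toy with insurance should give the same kind of result, and other aggregations such as .median() or .describe() work the same way after the groupby.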

Happy coding!