This is my first project. I would love some feedback on what went wrong with my majority info, as well as possible solutions for the extra # I have at the bottom. Thanks in advance!
Here is the url:
https://github.com/britneylaz/U.S.-Medical-Insurance-Costs1
This link points to a local file on your computer, which we don’t have access to.
You can push the Jupyter Notebook to your GitHub repo and then share that link here.
Congrats on completing the project.
Seems like you understand functions and some Pandas methods. At what point in the path is this project? Before or after Pandas? (I forget, because several projects use this data set. I ask because you started with Pandas and then went on to define functions to analyze the data.)
Pandas has a ton of built-in methods for exploratory data analysis (EDA). (But use whatever you’re comfortable with!)
Just some ideas…
To get the number of people in each region (as opposed to your ‘where_majority_from’ function) you could use Pandas methods:
insurance_regions = insurance['region'].value_counts()
print(insurance_regions)
#which results in:
southeast 364
southwest 325
northwest 325
northeast 324
Name: region, dtype: int64
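If you want to try this without the CSV handy, here is a small self-contained sketch of the same idea. The rows in this toy DataFrame are made up for illustration (they are not the real insurance data), and the names region_counts, majority_region, and region_shares are just ones I picked. It also shows idxmax() as one possible way to pull out the "majority" region directly.

```python
import pandas as pd

# Toy stand-in for the insurance data; these rows are invented for illustration.
insurance = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31],
    "region": ["southwest", "southeast", "southeast",
               "northwest", "northwest", "southeast"],
    "charges": [16884.92, 1725.55, 4449.46, 21984.47, 3866.86, 3756.62],
})

# Count rows per region, sorted from most to least common.
region_counts = insurance["region"].value_counts()
print(region_counts)

# The region with the highest count.
majority_region = region_counts.idxmax()
print(majority_region)

# Share of rows per region instead of raw counts.
region_shares = insurance["region"].value_counts(normalize=True)
print(region_shares)
```

On the real data you would skip the toy DataFrame and call the same methods on whatever you loaded with pd.read_csv.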
You could also break out the data by region and analyze it from there using .iloc & .values, like:
northeast = insurance.iloc[(insurance['region']=='northeast').values]
northeast.head()
#which results in this:
    age     sex     bmi  children smoker     region  charges
8    37    male  29.830         2     no  northeast     6406
10   25    male  26.220         0     no  northeast     2721
16   52  female  30.780         1     no  northeast    10797
17   23    male  23.845         0     no  northeast     2395
20   60  female  36.005         0     no  northeast    13228
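The .iloc version above works, but a boolean mask with .loc is probably the more common way to write the same filter, and groupby can give per-region numbers without splitting the data by hand. Here is a rough sketch of both, again on a made-up toy DataFrame rather than the real data set (is_northeast and mean_charges are names I chose for the example):

```python
import pandas as pd

# Toy stand-in for the insurance data; these rows are invented for illustration.
insurance = pd.DataFrame({
    "age": [37, 25, 19, 52, 28],
    "smoker": ["no", "no", "yes", "no", "no"],
    "region": ["northeast", "northeast", "southwest", "northeast", "southeast"],
    "charges": [6400.0, 2700.0, 16900.0, 10800.0, 4400.0],
})

# Boolean mask: True for rows whose region is 'northeast'.
is_northeast = insurance["region"] == "northeast"

# .loc accepts the boolean Series directly, so .values is not needed here.
northeast = insurance.loc[is_northeast]
print(northeast)

# Average charges for every region in one step.
mean_charges = insurance.groupby("region")["charges"].mean()
print(mean_charges)
```

Filtering keeps the original row labels, which is why the index in the output above jumps around (8, 10, 16, ...) instead of starting at 0.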
Happy coding!