Here is my portfolio project. It is fairly basic and nothing too challenging came up, but it was a good chance to brush up on some material. I would appreciate feedback on both the code and the analysis.
I couldn’t get the Jupyter notebook to save as a PDF, so I have included my code below.
import numpy as np
import pandas as pd

# Load the dataset and preview the first few rows
insurance = pd.read_csv("insurance.csv")
print(insurance.head())

# Average of each numeric column
ave_age = np.average(insurance.age)
ave_bmi = np.average(insurance.bmi)
ave_children = np.average(insurance.children)
ave_charge = np.average(insurance.charges)
print(ave_age)
print(ave_bmi)
print(ave_children)
print(ave_charge)
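# Mean and standard deviation of charges by sex
# (a list rather than a set for aggfunc keeps the column order predictable)
pivot_sex = insurance.pivot_table(index=['sex'],
                                  values=['charges'],
                                  aggfunc=['mean', 'std'])
print(pivot_sex)

# Mean and standard deviation of charges by region
pivot_region = insurance.pivot_table(index=['region'],
                                     values=['charges'],
                                     aggfunc=['mean', 'std'])
print(pivot_region)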
# Label each row with a ten-year age range
insurance['age_range'] = [
    '0-19' if age < 20 else
    '20-29' if age < 30 else
    '30-39' if age < 40 else
    '40-49' if age < 50 else
    '50-59' if age < 60 else
    '60-69' if age < 70 else
    '70+'
    for age in insurance.age
]
print(insurance.head())
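# Mean, standard deviation, and row count of charges by age range
# (a list rather than a set for aggfunc keeps the column order predictable)
pivot_age = insurance.pivot_table(index=['age_range'],
                                  values=['charges'],
                                  aggfunc=['mean', 'std', 'count'])
print(pivot_age)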
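from matplotlib import pyplot as plt

# Charges against age, with the correlation matrix
plt.scatter(insurance.age, insurance.charges)
plt.xlabel('age')
plt.ylabel('charges')
plt.show()
# np.corrcoef returns a 2x2 matrix; the off-diagonal entries are the Pearson correlation
print(np.corrcoef(insurance.age, insurance.charges))

# Charges against BMI, with the correlation matrix
plt.scatter(insurance.bmi, insurance.charges)
plt.xlabel('bmi')
plt.ylabel('charges')
plt.show()
print(np.corrcoef(insurance.bmi, insurance.charges))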
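
One possible alternative to the chained conditional for age_range is pandas.cut, which assigns bin labels from a list of edges. The sketch below is only an illustration under stated assumptions: the `sample` DataFrame and its values are made up (they are not taken from insurance.csv), and `groupby(...).agg(...)` stands in for the pivot table.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not taken from insurance.csv.
sample = pd.DataFrame({
    "age": [18, 19, 25, 34, 47, 52, 64],
    "charges": [1700.0, 2200.0, 3100.0, 4800.0, 8600.0, 11000.0, 14500.0],
})

# Left-closed bins: [0, 20), [20, 30), ..., [70, inf)
bins = [0, 20, 30, 40, 50, 60, 70, np.inf]
labels = ["0-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]

# right=False makes each bin include its lower edge, so age 20 falls in "20-29".
# astype(str) turns the categorical result back into plain strings.
sample["age_range"] = pd.cut(
    sample["age"], bins=bins, right=False, labels=labels
).astype(str)

# Mean, standard deviation, and count of charges per age range
summary = sample.groupby("age_range")["charges"].agg(["mean", "std", "count"])
print(summary)
```

Whether this reads better than the list comprehension is a matter of taste; the main gain is that the bin edges and labels sit in two lists that are easy to change.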