I’m currently working on the Python Portfolio Project. Besides the recommended tasks, I tried to implement a simple linear regression to find out how BMI values relate to insurance charges, where BMI is the independent variable (x) and insurance charges are the dependent variable (y).
First, I tried to write a function that calculates the slope of the estimated regression line (the 𝑏₁ coefficient) using the following formula:

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Here is my code:
import numpy as np

bmi = np.array(bmis).reshape(-1, 1)
charges = charges_array

# Function to find parameter b1
def find_b1(bmi, charges):
    residuals_sum = 0
    sum_squared = 0
    for x in range(len(bmi)):
        residual_x = bmi[x] - np.mean(bmi)
    for y in range(len(charges)):
        residual_y = charges[y] - np.mean(charges)
    residuals_sum += (residual_x * residual_y)
    sum_squared += np.square(residual_x)
    b1 = residuals_sum / sum_squared
    return b1

print(find_b1(bmi, charges))
# Output: [-9960.4426389]

# Check with scikit-learn whether the results match
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(bmi, charges)
b1 = model.coef_
b1
# Output: array([393.8730308])
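For comparison, here is a minimal sketch of the textbook least-squares slope written with NumPy arrays instead of explicit loops. In that formula each x value is paired with the y value at the same index, and both the products and the squared deviations are summed over all observations. The function name `slope_b1` and the demo values are made up for illustration and are not the insurance data; `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np


def slope_b1(x, y):
    """Least-squares slope: sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)**2)."""
    # Flatten to 1-D so a column vector of shape (n, 1) is handled as well.
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    dx = x - x.mean()  # deviation of every x_i from the mean of x
    dy = y - y.mean()  # deviation of every y_i from the mean of y
    # Element-wise product pairs x_i with y_i at the same index.
    return float(np.sum(dx * dy) / np.sum(dx ** 2))


# Hypothetical demo values, NOT taken from the insurance dataset.
x_demo = np.array([18.5, 22.0, 27.3, 31.1, 35.8])
y_demo = np.array([2100.0, 3900.0, 6400.0, 8800.0, 11500.0])

b1_demo = slope_b1(x_demo, y_demo)
# Cross-check: the first coefficient of a degree-1 polynomial fit is the slope.
b1_polyfit = np.polyfit(x_demo, y_demo, deg=1)[0]
print(b1_demo, b1_polyfit)
```

On the real data, a call such as `slope_b1(bmi, charges)` would be expected to agree with `model.coef_` up to floating-point rounding, assuming `bmi` and `charges` have the same length and row order.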
If you have any idea why the results differ or where I made a mistake, please give me a clue.
Thanks in advance!