I learned how to build a linear regression model by hand in the “Linear Regression” lesson of the “Supervised Learning: Regression” module, part of the “Build a Machine Learning Model” skill path (https://www.codecademy.com/paths/machine-learning/tracks/regression-skill-path/modules/linear-regression-skill-path/lessons/linear-regression/exercises/introduction).
Using the information I learned in the lesson, I created the following code chunk:
import matplotlib.pyplot as plt
def get_gradient_at_b(x, y, b, m):
    N = len(x)
    diff = 0
    for i in range(N):
        x_val = x[i]
        y_val = y[i]
        diff += (y_val - ((m * x_val) + b))
    b_gradient = -(2/N) * diff
    return b_gradient

def get_gradient_at_m(x, y, b, m):
    N = len(x)
    diff = 0
    for i in range(N):
        x_val = x[i]
        y_val = y[i]
        diff += x_val * (y_val - ((m * x_val) + b))
    m_gradient = -(2/N) * diff
    return m_gradient

#Your step_gradient function here
def step_gradient(b_current, m_current, x, y, learning_rate):
    b_gradient = get_gradient_at_b(x, y, b_current, m_current)
    m_gradient = get_gradient_at_m(x, y, b_current, m_current)
    b = b_current - (learning_rate * b_gradient)
    m = m_current - (learning_rate * m_gradient)
    return [b, m]

#Your gradient_descent function here:
def gradient_descent(x, y, learning_rate, num_iterations):
    b = 0
    m = 0
    for i in range(num_iterations):
        b,m = step_gradient(b, m, x, y, learning_rate)
    return [b,m]
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]
#Uncomment the line below to run your gradient_descent function
b, m = gradient_descent(months, revenue, 0.01, 1000)
#Uncomment the lines below to see the line you've settled upon!
y = [m*x + b for x in months]
plt.plot(months, revenue, "o")
plt.plot(months, y)
plt.show()
This code works well and produces a nice-looking scatterplot with a line of best fit.
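As a rough sanity check on the lesson result, the same update rule can be compared against a closed-form least-squares fit. This is only a sketch: `gradient_descent_np` is a hypothetical NumPy rewrite of the lesson's loop (not Codecademy's code), and `np.polyfit` is swapped in as the reference fit. After 1000 iterations at a learning rate of 0.01 the two should be close but not identical, because the intercept converges slowly.

```python
import numpy as np

def gradient_descent_np(x, y, learning_rate, num_iterations):
    """Hypothetical vectorized version of the lesson's gradient descent."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, m = 0.0, 0.0
    for _ in range(num_iterations):
        residual = y - (m * x + b)                  # y_i - (m * x_i + b)
        b_gradient = -2.0 * residual.mean()         # -(2/N) * sum(residual)
        m_gradient = -2.0 * (x * residual).mean()   # -(2/N) * sum(x * residual)
        b -= learning_rate * b_gradient
        m -= learning_rate * m_gradient
    return b, m

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

b_gd, m_gd = gradient_descent_np(months, revenue, 0.01, 1000)

# np.polyfit returns coefficients from highest degree down: [slope, intercept].
m_ls, b_ls = np.polyfit(months, revenue, 1)

print("gradient descent:", b_gd, m_gd)
print("least squares:   ", b_ls, m_ls)
```

If the two pairs of numbers differ by a lot rather than a little, that usually points to a learning rate or iteration count that does not suit the data.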
My problem arises in the project that goes with this lesson. The project advises you to use a prewritten regression model from the scikit-learn (sklearn) library, which is fine, but I was curious whether the model from the previous lesson would work on this problem too. So I tried the one Codecademy had taught me to build myself, but no line was drawn, and b and m both came out as nan (not a number). Why is this the case? I’ve been troubleshooting for a few hours, and I believe I correctly applied the model I was taught to the data from this project.
Here is the code. Does anyone have an idea why it doesn’t produce a line of best fit?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
def get_gradient_at_b(x, y, b, m):
    N = len(x)
    diff = 0
    for i in range(N):
        x_val = x[i]
        y_val = y[i]
        diff += (y_val - ((m * x_val) + b))
    b_gradient = -(2/N) * diff
    return b_gradient

def get_gradient_at_m(x, y, b, m):
    N = len(x)
    diff = 0
    for i in range(N):
        x_val = x[i]
        y_val = y[i]
        diff += x_val * (y_val - ((m * x_val) + b))
    m_gradient = -(2/N) * diff
    return m_gradient

def step_gradient(b_current, m_current, x, y, learning_rate):
    b_gradient = get_gradient_at_b(x, y, b_current, m_current)
    m_gradient = get_gradient_at_m(x, y, b_current, m_current)
    b = b_current - (learning_rate * b_gradient)
    m = m_current - (learning_rate * m_gradient)
    return [b, m]

def gradient_descent(x, y, learning_rate, num_iterations):
    b = 0
    m = 0
    for i in range(num_iterations):
        b,m = step_gradient(b, m, x, y, learning_rate)
    return [b,m]
df = pd.read_csv('honeyproduction.csv')
prod_per_year = df.groupby('year').mean().reset_index()
X = prod_per_year['year']
X = X.values.reshape(-1, 1)
y = prod_per_year['totalprod']
b, m = gradient_descent(X, y, 0.01, 1000)
y_predictions = [m*x + b for x in X]
plt.plot(X,y,'o')
plt.plot(X,y_predictions)
plt.xlabel("Years")
plt.ylabel("HoneyBee Production Per Year (Lbs)")
plt.show()
It just spits the scatterplot out with no line.
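One likely cause worth checking is the scale of the inputs. The lesson data uses x values from 1 to 12, while the years here are around 2000. With a learning rate of 0.01, each update then overshoots by a growing amount until the numbers overflow and turn into nan. The sketch below reproduces that behavior and shows one common remedy: standardizing x before running gradient descent. It rests on assumptions. The `years` and `totalprod` arrays are made-up stand-ins and not the real contents of honeyproduction.csv, and `gradient_descent` is a vectorized rewrite of the lesson's update rule. Separately, `X.values.reshape(-1, 1)` makes every `x[i]` a one-element array, so passing the flat `X.values` may be simpler for this hand-written model.

```python
import numpy as np

def gradient_descent(x, y, learning_rate, num_iterations):
    """Same update rule as the lesson code, vectorized with NumPy."""
    b, m = 0.0, 0.0
    for _ in range(num_iterations):
        residual = y - (m * x + b)
        b_gradient = -2.0 * residual.mean()
        m_gradient = -2.0 * (x * residual).mean()
        b -= learning_rate * b_gradient
        m -= learning_rate * m_gradient
    return b, m

# Made-up stand-in for the per-year means (NOT the real honey data):
# years on the same scale as the project, values on an exact straight line.
years = np.arange(1998, 2013, dtype=float)
totalprod = 5_000_000.0 - 90_000.0 * (years - 1998)

# 1) Raw years: every step overshoots, values overflow to inf, then become nan.
with np.errstate(over="ignore", invalid="ignore"):
    b_raw, m_raw = gradient_descent(years, totalprod, 0.01, 1000)
print("raw years:", b_raw, m_raw)

# 2) Standardized years (mean 0, standard deviation 1): same settings converge.
x_mean, x_std = years.mean(), years.std()
x_scaled = (years - x_mean) / x_std
b_s, m_s = gradient_descent(x_scaled, totalprod, 0.01, 1000)

# Convert back to the original "year" units so that y = m_fit * year + b_fit.
m_fit = m_s / x_std
b_fit = b_s - m_s * x_mean / x_std
y_predictions = m_fit * years + b_fit
print("standardized fit:", b_fit, m_fit)
```

Keeping the raw years and only shrinking the learning rate is another option, but it would have to be far smaller than 0.01, and the intercept would then converge very slowly, so rescaling x is usually the easier route.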