FAQ: Linear Regression - Loss

This community-built FAQ covers the “Loss” exercise from the lesson “Linear Regression”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Machine Learning

FAQs on the exercise Loss


Why is Loss measured against the X-axis and not the actual line?


Hi, can anyone help why my list comprehension is invalid?

total loss1 = sum([(y_value -y_predicted) ** 2 for y_value in y for y_predicted in y_predicted1])

Is there a space or an underscore in total_loss1? I don’t think spaces are allowed in variable names. Try an underscore?


Thanks, you’re right. However, it wouldn’t be correct even with the underscore, because my list comprehension acts like two for loops nested inside each other, which is not what we need here.

My first thought was also nested loops.

Remember that both lists have the same number of elements. This conveniently lets us use a single for loop over both lists at once.
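A minimal sketch of the fix, using hypothetical lists (substitute the exercise’s actual y and y_predicted1): pairing the values with zip gives one loop over matched pairs instead of two nested loops.

```python
# Hypothetical data; the exercise's lists may differ.
y = [5, 1, 3]
y_predicted1 = [1, 2, 3]

# One loop over paired values, not two nested loops:
total_loss1 = sum((y_value - y_predicted) ** 2
                  for y_value, y_predicted in zip(y, y_predicted1))
print(total_loss1)  # 17
```

Note that the double-`for` comprehension would have produced every combination of the two lists (9 terms here), not the 3 matched pairs.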

Hi guys, I had a question about number 2, if you could help me please.

2.

Find the y values that the line with weights m2 and b2 would predict for the x-values given. Store these in a list called y_predicted2

thx guys :grinning: :100:
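One possible sketch for step 2, assuming hypothetical x-values and weights (use the x, m2, and b2 already defined in your exercise): apply the line equation y = m*x + b to each x-value.

```python
# Hypothetical x-values and weights; use the ones defined in the exercise.
x = [1, 2, 3]
m2, b2 = 0.5, 1

# Predict a y value for each x value with y = m*x + b.
y_predicted2 = [m2 * x_value + b2 for x_value in x]
print(y_predicted2)  # [1.5, 2.0, 2.5]
```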

There is a simple solution. Use a zip in the for loop:
for value1, value2 in zip(value_list1, value_list2):

With this you can get the value from list2 that corresponds to the value from list1 (matching by index).
NOTE: zip stops at the end of the shorter list, so it only pairs every element when both lists are the same length.

OR
you can use index loop:
for i in range(len(value_list1)):

With this you can use the same index i to look up the corresponding value in each list.
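Both patterns side by side, with hypothetical lists (the exercise’s actual values may differ):

```python
y = [5, 1, 3]             # hypothetical actual values
y_predicted1 = [1, 2, 3]  # hypothetical predictions

# zip loop: iterate over paired values directly
loss_zip = 0
for y_value, y_pred in zip(y, y_predicted1):
    loss_zip += (y_value - y_pred) ** 2

# index loop: use one shared index into both lists
loss_index = 0
for i in range(len(y)):
    loss_index += (y[i] - y_predicted1[i]) ** 2

print(loss_zip, loss_index)  # 17 17
```

The two loops compute the same total; zip just removes the explicit indexing.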


Let’s consider why we want to calculate loss along the y-axis instead of the x-axis. Measuring the loss along the x-axis gives the opposite result from this exercise: line 1 then fits the data better.

# Data and weights from the exercise (these values reproduce the
# 17.0 / 54.0 output below):
x = [1, 2, 3]
y = [5, 1, 3]
m1, b1 = 1, 0
m2, b2 = 0.5, 1

# Invert y = m*x + b to x = y/m - b/m, then measure loss along the x-axis.
x_predicted1 = [(1 / m1) * y_value - b1 / m1 for y_value in y]
x_total_loss1 = 0
for i in range(len(x)):
  x_total_loss1 += (x[i] - x_predicted1[i]) ** 2

x_predicted2 = [(1 / m2) * y_value - b2 / m2 for y_value in y]
x_total_loss2 = 0
for i in range(len(x)):
  x_total_loss2 += (x[i] - x_predicted2[i]) ** 2

print(x_total_loss1, x_total_loss2)  # 17.0 54.0

If there were no qualitative difference between the x-axis and the y-axis of the data, then both results should be treated equally. However, in linear regression the two axes usually play qualitatively different roles, such as explanatory and response variables.

When there is such a qualitative difference, I think we need to consider which axis’s data may contain errors. For example, if we examine the correlation between body weight and blood cholesterol level, suppose we assign weight to the x-axis and cholesterol level to the y-axis. In this case, the y-values are the ones likely to contain measurement error, so we want to minimize the loss along the y-axis.
If there is a qualitative difference between the x-axis and the y-axis, I think we need to consider which axis’s data may contain errors. For example, if we examine a correlation between body weight and blood cholesterol level, suppose we assign weight to the x-axis and blood cholesterol level to the y-axis. In this case, the y-axis is likely to contain errors, so we will want to minimize the loss along the y-axis.