Thanks, you’re right. However, it wouldn’t be correct even with the underscore, because my list comprehension behaves like two nested for loops, which is not what we need here.

There is a simple solution. Use a zip in the for loop:
for value1, value2 in zip(value_list1, value_list2):
    …

With this you can get the value from value_list2 that corresponds to each value from value_list1 (matched by index).
NOTE: zip stops at the end of the shorter list, so this only pairs every element when both lists have the same length.

OR
you can use index loop:
for i in range(len(value_list1)):

With this you can use the index of each value in the first list to look up the corresponding value in the other list.
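Both approaches give the same result. A minimal sketch with made-up lists (the names and values here are hypothetical, not from the exercise):

```python
# Hypothetical sample data: any two equal-length lists work
y_actual = [1, 3, 5]
y_predicted = [2, 3, 4]

# Option 1: zip pairs elements by position
loss_zip = 0
for actual, predicted in zip(y_actual, y_predicted):
    loss_zip += (actual - predicted) ** 2

# Option 2: index loop over range(len(...))
loss_index = 0
for i in range(len(y_actual)):
    loss_index += (y_actual[i] - y_predicted[i]) ** 2

print(loss_zip, loss_index)  # 2 2
```

The zip version is usually considered more idiomatic in Python, since it avoids manual indexing.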

Let’s consider why we want to calculate loss along the y-axis instead of the x-axis. Measuring the loss along the x-axis gives the opposite result to this exercise: line 1 fits the data better instead.

x_predicted1 = [(1 / m1) * y_value - b1 / m1 for y_value in y]  # invert y = m1 * x + b1
x_total_loss1 = 0
for i in range(len(x)):
    x_total_loss1 += (x[i] - x_predicted1[i]) ** 2

x_predicted2 = [(1 / m2) * y_value - b2 / m2 for y_value in y]  # invert y = m2 * x + b2
x_total_loss2 = 0
for i in range(len(x)):
    x_total_loss2 += (x[i] - x_predicted2[i]) ** 2

print(x_total_loss1, x_total_loss2)  # 17.0 54.0
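For contrast, the usual y-axis loss is computed directly from the line equation, with no inversion needed. A minimal sketch with hypothetical data and line parameters (x, y, m, and b here are made up, not the exercise's values):

```python
# Hypothetical data and a line y = m * x + b
x = [1, 2, 3]
y = [2, 5, 9]
m, b = 2, 1

# Predict y from x, then sum squared vertical residuals
y_predicted = [m * x_value + b for x_value in x]  # [3, 5, 7]
y_total_loss = 0
for i in range(len(y)):
    y_total_loss += (y[i] - y_predicted[i]) ** 2

print(y_total_loss)  # 5
```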

If there is no qualitative difference between the x-axis and y-axis of a given dataset, then both results should be treated equally. However, in linear regression the axes usually differ qualitatively: one holds the explanatory variable and the other the objective (response) variable.

If there is a qualitative difference between the x-axis and the y-axis, I think we need to consider which axis’s data is more likely to contain errors. For example, if we examine the correlation between body weight and blood cholesterol level, suppose we assign weight to the x-axis and cholesterol level to the y-axis. In this case, the y-axis measurements are the ones likely to contain error, so we want to minimize the loss along the y-axis.