Reggie's Linear Regression - What data to use?

For the list comprehension I used two approaches to generate the values. One uses numpy’s arange and the other is the way the solution did it. Both produce almost exactly the same values; the difference is minuscule. However, when I plug them into the same function, they produce almost the same error but very different ‘m’ and ‘b’ values. I made the step size smaller to try to increase the accuracy, but the results still differ. Why does this happen? In a real-world situation, which is correct?

import numpy as np
possible_ms1 = [x for x in np.arange(-10, 10.1, .01)]
possible_ms2 = [m * 0.01 for m in range(-1000, 1001)]

possible_bs1 = [x for x in np.arange(-20, 20.1, .01)] #your list comprehension here
possible_bs2 = [b * 0.01 for b in range(-2000, 2001)]

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
smallest_error = float('inf')

for m in possible_ms1:
    for b in possible_bs1:
        error = calculate_all_error(m, b, datapoints)
        if error < smallest_error:
            smallest_error = error
            best_m = m
            best_b = b

print(smallest_error, best_m, best_b)

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
smallest_error = float('inf')

for m in possible_ms2:
    for b in possible_bs2:
        error = calculate_all_error(m, b, datapoints)
        if error < smallest_error:
            smallest_error = error
            best_m = m
            best_b = b

print(smallest_error, best_m, best_b)

5.000000000003109 0.6599999999997728 1.3400000000033359
4.999999999999999 0.26 1.74

It just seems that those sets of m and b values are “equally” good, and several others are too.
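To see how many candidates are tied, here is a minimal sketch that keeps every (m, b) pair whose error is within a small tolerance of the minimum, instead of only the first one that wins a strict `<` comparison. Two assumptions: `calculate_all_error` is not shown in this thread, so the version below (sum of absolute vertical distances) is my reconstruction, and the grid uses a step of 0.05 rather than 0.01 just so it runs quickly.

```python
import math

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]

def calculate_all_error(m, b, points):
    # Assumed definition: total absolute vertical distance from the line.
    return sum(abs(y - (m * x + b)) for x, y in points)

# Coarser grid than in the question (step 0.05), chosen only for speed.
possible_ms = [n / 20 for n in range(-200, 201)]  # -10 to 10
possible_bs = [n / 20 for n in range(-400, 401)]  # -20 to 20

# Evaluate every candidate once and remember its error.
results = [(calculate_all_error(m, b, datapoints), m, b)
           for m in possible_ms for b in possible_bs]
smallest_error = min(error for error, _, _ in results)

# Keep every candidate that is "equal" to the minimum up to rounding noise.
ties = [(m, b) for error, m, b in results
        if math.isclose(error, smallest_error, abs_tol=1e-9)]

print("smallest error:", smallest_error)
for m, b in ties:
    print(f"y = {m}x + {b}")
```

Every tied line satisfies m + b = 2, meaning it passes through the point (1, 2). Which of them a strict `<` search reports depends only on which one's rounding noise happens to come out lowest.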

Here’s another version of those list comprehensions:

possible_ms = [n/10 for n in range(-100, 101)]
possible_bs = [n/10 for n in range(-200, 201, 1)]

And using that you have:

model: y = 0.3x + 1.7 has error = 5.000000000000001
model: y = 0.4x + 1.6 has error = 5.0
model: y = 0.5x + 1.5 has error = 5.0
model: y = 0.6x + 1.4 has error = 5.0

(All of these actually have an error of exactly 5. The tiny discrepancies come from floating-point rounding: most decimal fractions, such as 0.1, have no exact binary representation, so computed results can be off in the last few digits.)
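One way to check that claim is to redo the arithmetic with exact rational numbers from Python's `fractions` module and compare against ordinary floats. This is only a sketch: `calculate_all_error` is my assumed version (sum of absolute vertical distances), since the original isn't shown here, and the list of models is taken from the outputs posted above.

```python
from fractions import Fraction

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]

def calculate_all_error(m, b, points):
    # Assumed definition: total absolute vertical distance from the line.
    return sum(abs(y - (m * x + b)) for x, y in points)

# (m, b) pairs reported in this thread, written as decimal strings so that
# Fraction can represent them exactly.
models = [("0.26", "1.74"), ("0.3", "1.7"), ("0.4", "1.6"),
          ("0.5", "1.5"), ("0.6", "1.4"), ("0.66", "1.34")]

exact_errors = []
float_errors = []
for m_text, b_text in models:
    exact = calculate_all_error(Fraction(m_text), Fraction(b_text), datapoints)
    approx = calculate_all_error(float(m_text), float(b_text), datapoints)
    exact_errors.append(exact)
    float_errors.append(approx)
    print(f"y = {m_text}x + {b_text}: exact error = {exact}, float error = {approx}")
```

With exact arithmetic all six models come out to the same error, so none of them is "more correct" than the others for this error measure. The float column only shows which one got lucky with rounding.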

The solution uses

possible_ms = [m * 0.1 for m in range(-100, 101)]
possible_bs = [b * 0.1 for b in range(-200, 201)]
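The two ways of building the lists (`n/10` versus `m * 0.1`) look equivalent, but they don't always produce the same floats, which is enough to tip a strict `<` comparison between tied candidates. A small sketch comparing them entry by entry (the names `ms_div`, `ms_mul` and `mismatches` are just for illustration):

```python
# Same grid built two ways: dividing an integer by 10, or multiplying by 0.1.
ms_div = [n / 10 for n in range(-100, 101)]
ms_mul = [n * 0.1 for n in range(-100, 101)]

# Pairs where the two constructions disagree in the last few bits.
mismatches = [(d, p) for d, p in zip(ms_div, ms_mul) if d != p]

print(len(mismatches), "of", len(ms_div), "entries differ")
print(3 / 10, 3 * 0.1)  # prints: 0.3 0.30000000000000004
```

Dividing by 10 rounds only once, so `n / 10` gives the same float as writing the decimal literal. Multiplying by 0.1 starts from an already-rounded 0.1 and rounds again, so some entries land one step away. Neither is wrong; they are just slightly different approximations of the same grid.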