Dividing by Int Not the Same as Multiplying by Float?

Currently going through the Analyze Financial Data with Python course and noticed a small difference in my code for an offline project vs the CC code that made a pretty big difference in the end result.

The focus of the project (Reggie’s Linear Regression) is running a linear regression on the datapoints list. Here’s my code:

    def get_y(m, b, x):
        return (m*x + b)

    def calculate_error(m, b, point_tuple):
        x_point, y_point = point_tuple
        est_y = get_y(m, b, x_point)
        return abs(est_y - y_point)

    def calculate_all_error(m, b, points):
        error = 0
        for point in points:
            error += calculate_error(m, b, point)
        return error

    possible_ms = [i/10 for i in range(-100, 101)]
    possible_bs = [i/10 for i in range(-200, 201)]
    datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]

    smallest_error = float("inf")
    best_m = 0
    best_b = 0

    for m in possible_ms:
        for b in possible_bs:
            error = calculate_all_error(m, b, datapoints)
            if error < smallest_error:
                smallest_error = error
                best_m = m
                best_b = b
            # if abs(error - 5) < 0.001:
            #     print(m, b, error)

    print(best_m, best_b, smallest_error)

This produces a best intercept of 1.6, a best slope of 0.4, and a smallest error of 5.

The only difference in the CC code is with the list comprehensions. Where I divided i by 10, they multiplied by 0.1.
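To see the difference between the two comprehensions directly, here is a small sketch that builds the slope list both ways and compares the entries. The names `divided`, `multiplied` and `mismatches` are just for illustration. The underlying reason is that 0.1 has no exact binary representation, so `i * 0.1` multiplies by an already-rounded value and then rounds again, whereas `i / 10` rounds the exact quotient only once.

```python
# Build the slope grid both ways and compare element by element.
divided = [i / 10 for i in range(-100, 101)]
multiplied = [i * 0.1 for i in range(-100, 101)]

# Collect every index where the two floats are not bit-for-bit equal.
mismatches = [
    (i, d, m)
    for i, d, m in zip(range(-100, 101), divided, multiplied)
    if d != m
]

print(len(mismatches), "of", len(divided), "values differ")
for i, d, m in mismatches[:5]:
    # repr() shows enough digits to tell the two floats apart.
    print(i, repr(d), repr(m))
```

For example, `3 / 10` gives `0.3` while `3 * 0.1` gives `0.30000000000000004`.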

This changed their final results to produce a best intercept of 1.7, a slope of 0.3, and a smallest error of 4.9 repeating.

So…why is this and how should I deal with this in the future?

As a side note, if you uncomment my little debugging block, you’ll see that 1.7 and 0.3 actually produce an error of 5.00000000001 in my version.
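That debugging check can be turned into a general "show me the near-ties" step. The sketch below is one possible approach, not the course's method: it recomputes the error for every (m, b) pair on the same grid, then uses `math.isclose` with an absolute tolerance to list every pair whose error is within rounding distance of the minimum. The helper `total_error`, the name `near_ties` and the tolerance `1e-9` are my own assumptions.

```python
import math

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]

def total_error(m, b, points):
    # Sum of absolute vertical distances, same metric as calculate_all_error.
    return sum(abs(m * x + b - y) for x, y in points)

possible_ms = [i / 10 for i in range(-100, 101)]
possible_bs = [i / 10 for i in range(-200, 201)]

# Error for every candidate line on the grid.
errors = {
    (m, b): total_error(m, b, datapoints)
    for m in possible_ms
    for b in possible_bs
}

smallest = min(errors.values())

# Treat errors within 1e-9 of the minimum as ties (the tolerance is an assumption).
near_ties = [
    mb for mb, err in errors.items()
    if math.isclose(err, smallest, abs_tol=1e-9)
]

for m, b in near_ties:
    print(m, b, repr(errors[(m, b)]))
```

If more than one pair is printed, the "best" fit reported by a strict `<` comparison is decided by rounding noise rather than by the data.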

This whole thread might be of interest to you-

There are two major issues here: one is floating point accuracy, and the second is that the dataset is far too small and scattered for a good linear regression fit. Both fits are likely equally wrong in terms of error (at least within the given level of accuracy), if that helps :slightly_smiling_face:.
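One way to check the "equally wrong" point without floating point getting in the way is to redo the error calculation with exact rationals. This is only a sketch using the standard library's `fractions.Fraction`. The `total_error` helper is a hypothetical stand-in for `calculate_all_error`, and the two fits are the ones reported in the question.

```python
from fractions import Fraction

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]

def total_error(m, b, points):
    # Sum of absolute vertical distances, computed exactly with Fractions.
    return sum(abs(m * x + b - y) for x, y in points)

# Fit from the i/10 version: slope 0.4, intercept 1.6.
fit_division = total_error(Fraction(4, 10), Fraction(16, 10), datapoints)

# Fit from the i*0.1 version: slope 0.3, intercept 1.7.
fit_multiplication = total_error(Fraction(3, 10), Fraction(17, 10), datapoints)

print(fit_division, fit_multiplication)  # prints: 5 5
```

In exact arithmetic both lines have a total error of exactly 5, so which one "wins" in the float version comes down to rounding in the last digit.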
