Python 3 list comprehension question

Fellow Python gurus :wink:

I’ve been doing a project in the Python 3 course and was getting slightly unexpected results. After careful comparison with the solution notebook, I’ve tracked down where the issue is coming from, but I can’t understand why such a slight difference in code would generate different outcomes. It all comes down to the following list comprehension:
My code:
possible_ms = [m / 10 for m in range(-100, 101)]
Solution’s code:
possible_ms = [m * 0.1 for m in range(-100, 101)]

The lists generated by these two lines look the same, but when they’re used in subsequent calculations they seem to produce different results. I must be missing something simple here! Any ideas?

5 Likes

Hello @harveymanfrenjensend :slightly_smiling_face: ,

From what I know, Codecademy only has one accepted answer, even though yours may also be correct. That would be my guess.

5 Likes

I think you’re right, but when the list output is used in subsequent calculations it produces slightly different results! I’m wondering if it’s something to do with Python 3’s handling of integer vs. floating-point arithmetic.

7 Likes

Do they get the same answer anyway?

1 Like

No. And I can replicate their answer in my notebook if I use their suggested
possible_ms = [m * 0.1 for m in range(-100, 101)]

but not if I use my m/10 method.

5 Likes

Which means that you got a wrong answer in the end. At least you found out why :slightly_smiling_face:

5 Likes

haha. Yes, some kind of answer is better than no answer. :smile:

5 Likes

Floats are approximations, so it is incorrect to expect an exact result.
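
A quick spot check of this (mine, not from the notebook):

print(7 / 10)   # 0.7
print(7 * 0.1)  # 0.7000000000000001, because 0.1 has no exact binary representation
print([m / 10 for m in range(-100, 101)] == [m * 0.1 for m in range(-100, 101)])  # False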

6 Likes

Actually, looking at the notebook, the results appear significantly different, but if there’s no bug then they both fit the data points approximately equally well.

If exact results were expected then floats shouldn’t have been involved anywhere.

6 Likes

Could you post a link to the exercise in question?

4 Likes

It’s a file really, a Jupyter notebook:
https://www.codecademy.com/courses/learn-python-3/informationals/python3-reggies-linear-regression

…first thing I did was convert it to plain code, because… what the…? Why would people use notebooks? I can’t…

$ jupyter nbconvert --to script Reggie_Linear_Regression_Solution.ipynb

4 Likes

Good point. I suppose I’m just trying to understand why the small difference in code consistently produces different results. I just find it intriguing.

4 Likes

Actually, the result is the same as far as you’re supposed to care; there isn’t a meaningful difference. If you think the difference matters, don’t use floats.

Where you get a “different” result, it’s because the difference between the two options was so small that either meets the requirement.

You could view it as there being multiple solutions. And/or there may be more solutions if you only look for something that is approximately correct.

If for any reason it matters which approximate solution you get, then that’s a bug. Caring about that difference is saying floats aren’t supposed to be used.

Think of it like colors. Two very slightly different shades of red. Which one is more red? You can’t tell. Do you care which one you use? If one of them is fair trade then maybe you should write that on the lid of the paint bucket rather than demanding that the brighter one be used.

If this sounds at all error-prone (it should), then you should also actively avoid floats whenever you don’t actually have a need for them.
As an example, you could simply not divide. Say that a whole number represents a tenth instead, and adjust the data points to match. You’ve now got exact operations throughout (there’s a sketch of this at the end of this post).
Or use fractions: a Fraction represents tenths exactly, and you never actually divide through floats.

from fractions import Fraction
possible_ms = [Fraction(m, 10) for m in range(-100, 101)]

result (best_m, best_b, smallest_error):
3/10 17/10 5
(you would probably find the other result too, if you were to keep all results tied for the smallest difference to the data points)
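
To make the whole-numbers-for-tenths idea concrete, here’s a minimal sketch. The data points are quoted from the notebook as I remember them, and the names are mine, so treat both as assumptions:

# m = M/10 and b = B/10; scaling each residual by 10 keeps every operation in integers
datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]  # assumed from the notebook

def scaled_error(M, B):
    # 10 * |m*x + b - y|  ==  |M*x + B - 10*y|, computed exactly
    return sum(abs(M * x + B - 10 * y) for x, y in datapoints)

best = min(
    ((M, B, scaled_error(M, B)) for M in range(-100, 101) for B in range(-200, 201)),
    key=lambda t: t[2],
)
print(best)  # (3, 17, 50), i.e. m = 0.3, b = 1.7, error = 5.0 (one of several ties)

Only at the very end do you divide M, B, and the error by 10 for display, and by then it no longer matters that that division isn’t exact in binary.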

4 Likes

Thanks for the detailed explanation. Very helpful and a warning to me not to flagrantly abuse floats. I like your Fractions approach…

3 Likes

This exercise is an eye opener.

(We’re trying to fit data to a line, y = m * x + b, here.)

Using m / 10 yields the “results” tuple several calculations later:
best_m, best_b, smallest_error = 0.4, 1.6, 5.0
… while using m * 0.1 gives:
best_m, best_b, smallest_error = 0.30000000000000004, 1.7000000000000002, 4.999999999999999

Then, taking these results down to the output, the m * 0.1 “solution” ultimately leads to a prediction that a 6 cm ball will bounce 3.5 cm, while the m / 10 calculation leads to a calculated bounce of 4.0 cm: close to a 15% difference resulting from the choice of m / 10 vs m * 0.1!
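
Reproducing just that last step (plugging x = 6 into y = m * x + b for each result; rounding only for display):

print(round(0.30000000000000004 * 6 + 1.7000000000000002, 4))  # 3.5
print(round(0.4 * 6 + 1.6, 4))                                 # 4.0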

Bottom line: Everything @ionatan said, squared (at least). When you enter the realm of floating-point arithmetic, beware, for there be dragons!

3 Likes

Isn’t it misleading to say there’s a 15% difference? They point in different directions, but they both miss the target by just as much. So it’s more like they’re each 7.5% wrong, and equally wrong, so it doesn’t matter which is picked; the difference between them is zero. Both results can be found when using exact math, so the difference doesn’t come from using floats.

5 best:

[(Fraction(3, 10), Fraction(17, 10), Fraction(5, 1)),
 (Fraction(2, 5), Fraction(8, 5), Fraction(5, 1)),
 (Fraction(1, 2), Fraction(3, 2), Fraction(5, 1)),
 (Fraction(3, 5), Fraction(7, 5), Fraction(5, 1)),
 (Fraction(3, 10), Fraction(8, 5), Fraction(51, 10))
] 
5 Likes

I don’t follow: What things point in different directions?

1 Like

The values for m and b

5 Likes

OK, thanks. It’s not immediately clear that those would compensate. I’ll ruminate on this one a bit.

1 Like

I mean, both of these are equally good solutions:

(Fraction(3, 10), Fraction(17, 10), Fraction(5, 1))
(Fraction(2, 5), Fraction(8, 5), Fraction(5, 1))

But neither fully matches the data points.
As predictions, they are both wrong. Equally wrong.
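
A quick exact check of that, reusing the data points as I remember them from the notebook (so the literals are an assumption):

from fractions import Fraction

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]  # assumed

def total_error(m, b):
    # sum of absolute vertical misses of y = m*x + b, computed exactly
    return sum(abs(m * x + b - y) for x, y in datapoints)

print(total_error(Fraction(3, 10), Fraction(17, 10)))  # 5
print(total_error(Fraction(2, 5), Fraction(8, 5)))     # 5, the same total miss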

So by picking one over the other, it wouldn’t be 15% wrong as a result of that choice.
Regardless of which is picked, it would be wrong by the same amount as the other.

4 Likes