(x * 0.1) vs (x / 10)

Hi,

I’m doing the loops section of the Python 3 course and I’m on the last part, which is the Jupyter notebook “Reggie_linear_regression_solution”.

My work is on the right and the solution is on the left.
As you can see, I have a different answer because I used (x/10) to create my list in increments of 0.1. Is there a reason this is wrong, since this way you don’t get the recurring 0s like 3.00…4?

Thanks, Chase.

Are you saying it is or asking whether it is?

You might want to represent your data as integers instead of approximating them with floats, and you might also want to look at whether there were other candidates with similar rates of error.

Ignore this post if you’re using Python 3.x. I had read a site referencing an older version of Python; I checked on my computer and 15/10 = 1.5, just like you’d expect.


In Python (versions before 3.x), if you divide integers, it will round down to the nearest integer, so 15/10 = 1.

However, if you multiply an integer by a decimal, you will get a float, so 15*0.1 = 1.5.

You could also force a float with division by making 10 a float, so x/10.0 should work the same as x*0.1.
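All three behaviours can be seen from Python 3, where `//` reproduces the old integer division — a quick sketch:

```python
# Python 3: `/` is always true division; `//` floors the result,
# which is what `/` on two integers did in Python 2.
print(15 // 10)   # floor division: 1
print(15 / 10)    # true division: 1.5
print(15 * 0.1)   # float multiplication: 1.5
print(15 / 10.0)  # making one operand a float also gives 1.5
```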

In Python 3 those are both floating-point operations, but they don’t do exactly the same thing. They do nearly the same thing, and if one considers that to be a problem, then float is the wrong type to be using.

@ionatan Will you elaborate on the difference?

That they don’t reduce to the same instruction, and since floats are approximations, doing two slightly different things that would be equivalent in exact math should not be expected to give the same result, only approximately the same result.

It’s not the difference between two operations that matters, but the difference between any one instruction and what would happen if using exact operations. If doing approximations, expect an approximate result, right.

If you go to a paint shop and ask for blue, you’ll get blue. Great.
If you go to two different paint shops and ask each one for blue, you’ll get two blues. They’re both blue. They’re probably not the same blue. Saying “the same thing” but doing it through different routes led to two things that can be told apart from one another.
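A concrete case of the two paint shops in Python 3, picking x = 3 as an example where the two routes to “the same” number visibly disagree:

```python
x = 3
# Two routes to "0.3": each rounds differently along the way.
print(x * 0.1)            # 0.30000000000000004
print(x / 10)             # 0.3
print(x * 0.1 == x / 10)  # False: two blues, not the same blue
```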

I was more wondering why in the solution they used x*0.1 if they wanted it to go up in steps of 0.1? Since dividing (x/10) is more accurate for what is being asked (make a list from -10 to 10 in increments of 0.1), I thought maybe there was a special reason. Either way I end up with a different answer at the bottom for best_x and best_y. If you compare the pics, the solution is on the left and my answer is on the right.

If you measure the speed of two cars, you will get two numbers. You have to approximate those numbers, because your measuring instruments are not perfect.

If the cars are going at speeds 5.000000000004 and 5.000000000006
then you should not be saying that the second car is moving faster; you do not know that, it is beyond the accuracy of your measuring instruments.

You used operations involving approximations.
You’ll need to expect approximate results.
Look at the errors. They are both approximately 5. It would be wrong to say that one is a different error from the other, same as with the cars.
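One way to act on that in code: compare errors with a tolerance rather than with exact equality. `math.isclose` from the standard library does this; the two error values below are taken from the results further down the thread.

```python
import math

err_a = 4.999999999999999   # from the multiply-by-0.1 run
err_b = 5.0                 # from the divide-by-10 run

# Naive equality says they differ...
print(err_a == err_b)              # False
# ...but within any sensible tolerance they are the same error.
print(math.isclose(err_a, err_b))  # True
```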

If you expected exact results, then you would need to do away with the approximations. You would need to not use floats.

They’re both wrong, their errors are not 0. What you have is multiple wrong predictions. Pick one among the least wrong ones.

The core misconception here is that a float’s purpose is to represent decimal numbers.
It’s not.
Its purpose is to approximate numbers, quickly, in limited space.

If you want to represent decimal numbers exactly, you should still be using integers or fractions. If you don’t care about it being exact, or you’re unable to represent it exactly, then you want an approximation, and that’s when you use float.
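A sketch of that distinction with the standard library: `fractions.Fraction` represents one tenth exactly, while the float literal 0.1 stores the nearest binary approximation.

```python
from fractions import Fraction

tenth = Fraction(1, 10)                # exactly 1/10, no approximation
print(tenth * 3 == Fraction(3, 10))    # True: exact arithmetic

# Fraction(0.1) converts the float's *stored* value, which is not 1/10:
print(Fraction(0.1) == Fraction(1, 10))  # False
print(Fraction(0.1))  # the exact binary value the float actually holds
```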

Which version of Python are you using? If you are using 2.7 (which I believe the Codecademy article suggests you use), then my answer above applies, and 15/10 = 1. However, if you’re using version 3.x, then it shouldn’t make a difference whether you write x/10 or x*0.1.

Maybe the Reggie’s Linear Regression exercise was written for the old version, so it was important not to use integer division.

It makes a difference.

That difference is so small that making a decision based on it is the wrong thing to do.

The difference comes from doing something that isn’t exact, and if one is doing something that is not exact then one must not expect the result to be exact.

I don’t know how big the differences are between the answers chasenz is getting and the answers in the exercise.

I’m not arguing that there is no theoretical difference between the two methods (I’ll defer to you on that). I’m just saying that in this particular exercise, when we’re not dealing with very large or very small numbers, it probably won’t make a practical difference if they’re using Python 3.x.

It does make a practical difference if you then look at that difference, no matter how small it is.
When approximating, one needs to not act on differences that are smaller than the accuracy the operations are carried out with. (Or better yet, don’t approximate when you don’t need to, i.e. don’t use floats.)

If making a big decision based on a small difference, then the difference becomes big.

This shows the five results with lowest error, obtained by multiplying by 0.1, by dividing by 10, and by using fractions to avoid the approximations altogether.

From the exact results we can tell that there are four equally wrong candidates all having an error of exactly 5.

The floating point versions also find the same candidates, though with some distortion to error values and/or input values due to using a questionable data type to represent them.

On the flip side, the exact version is much slower. Floats have hardware support, they are fast.
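A rough, hypothetical micro-benchmark of that trade-off (absolute numbers will vary by machine; only the large ratio is the point):

```python
from fractions import Fraction
from timeit import timeit

# Sum 10,000 steps of one tenth, as hardware floats vs exact Fractions.
float_time = timeit(lambda: sum(i * 0.1 for i in range(10_000)), number=10)
exact_time = timeit(lambda: sum(Fraction(i, 10) for i in range(10_000)), number=10)

print(f"float:    {float_time:.4f}s")
print(f"Fraction: {exact_time:.4f}s")  # typically far slower: every addition
                                       # normalizes a rational number in software
```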

multiply by 0.1
[res(err=4.999999999999999, m=0.30000000000000004, b=1.7000000000000002),
 res(err=5.0, m=0.4, b=1.6),
 res(err=5.0, m=0.5, b=1.5),
 res(err=5.0, m=0.6000000000000001, b=1.4000000000000001),
 res(err=5.1, m=0.4, b=1.5)]
divide by 10
[res(err=5.0, m=0.4, b=1.6),
 res(err=5.0, m=0.5, b=1.5),
 res(err=5.0, m=0.6, b=1.4),
 res(err=5.000000000000001, m=0.3, b=1.7),
 res(err=5.1, m=0.3, b=1.6)]
exact math
[res(err=Fraction(5, 1), m=Fraction(3, 10), b=Fraction(17, 10)),
 res(err=Fraction(5, 1), m=Fraction(2, 5), b=Fraction(8, 5)),
 res(err=Fraction(5, 1), m=Fraction(1, 2), b=Fraction(3, 2)),
 res(err=Fraction(5, 1), m=Fraction(3, 5), b=Fraction(7, 5)),
 res(err=Fraction(51, 10), m=Fraction(3, 10), b=Fraction(8, 5))]
from collections import namedtuple
from fractions import Fraction
from functools import partial
from operator import mul, truediv as div
from pprint import pformat


def flip(f):
    # Return f with its two arguments swapped, so flip(div)(a, b) == b / a.
    def flipped(a, b):
        return f(b, a)
    return flipped


def calculate_all_error(m, b, points):
    # Sum of absolute errors of the line y = m*x + b over all points.
    def error(point):
        x, y = point
        prediction = m * x + b
        return abs(y - prediction)
    return sum(map(error, points))


def run(strategy):
    # `for m in [strategy(m)]` is a one-element loop used as an assignment:
    # it rebinds m (and b) to strategy's scaled value inside the comprehension.
    return [
        res(calculate_all_error(m, b, datapoints), m, b)
        for m in range(-100, 101)
        for m in [strategy(m)]
        for b in range(-200, 201)
        for b in [strategy(b)]
    ]


datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
res = namedtuple('res', ['err', 'm', 'b'])
# Three ways of turning the integer grid into steps of 0.1:
methods = [('multiply by 0.1', partial(mul, 0.1)),
           ('divide by 10', partial(flip(div), 10)),
           ('exact math', partial(flip(Fraction), 10))]

# Print the five candidates with the lowest error for each method.
list(map(print, (
  desc + '\n' + pformat(sorted(run(strat))[:5])
  for desc, strat in methods
)))