Reggie's linear regression project

Hi folks,

Firstly, spoilers ahead!
This post contains solution code to part 2 of Reggie’s Linear Regression project so if you don’t want to see the solution please scroll no further!

After some head-scratching and trial and error, I finally managed to write some code which produced the correct answers for part 2 of Reggie’s Linear Regression project.

However, upon comparing my code to the solution code, there is one simple aspect that I don’t understand.

My code:

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
best_m = 0
best_b = 0
def try_slopes_and_intercepts(possible_ms, possible_bs):
    smallest_error = float("inf")
    for m in possible_ms:
        for b in possible_bs:
            if calculate_all_error(m, b, datapoints) < smallest_error:
                best_m = m
                best_b = b
                smallest_error = calculate_all_error(m, b, datapoints)
    print(best_m, best_b, smallest_error)

try_slopes_and_intercepts(possible_ms, possible_bs)

Solution code:

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
smallest_error = float("inf")
best_m = 0
best_b = 0

for m in possible_ms:
    for b in possible_bs:
        error = calculate_all_error(m, b, datapoints)
        if error < smallest_error:
            best_m = m
            best_b = b
            smallest_error = error

print(best_m, best_b, smallest_error)

OK, I overcomplicated the problem by writing a function. However, the thing bugging me is that I MUST define smallest_error as float("inf") INSIDE my function. Why?
For example, in the code below:

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
best_m = 0
best_b = 0
smallest_error = float("inf")
def try_slopes_and_intercepts(possible_ms, possible_bs):
    for m in possible_ms:
        for b in possible_bs:
            if calculate_all_error(m, b, datapoints) < smallest_error:
                best_m = m
                best_b = b
                smallest_error = calculate_all_error(m, b, datapoints)
    print(best_m, best_b, smallest_error)

try_slopes_and_intercepts(possible_ms, possible_bs)

this results in

UnboundLocalError: local variable 'smallest_error' referenced before assignment

Why am I not able to define smallest_error before I define my function?

Hi Dan,

This happens because of variable scope. Once you assign to the variable anywhere inside your function, Python treats it as a local variable, separate from the global variable which you defined earlier, even though it has the same name.

Then, when you call the function, Python sees that you referenced the local variable before you assigned it a value, so it gives you an error.

You can use global smallest_error to declare the global variable inside the function before you reference it.
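As a rough sketch of what that could look like (the calculate_all_error helper and the small possible_ms / possible_bs grids below are stand-ins I made up so the snippet runs on its own, not the project's real ones):

```python
def calculate_all_error(m, b, points):
    # Hypothetical stand-in for the project's helper:
    # sum of absolute vertical distances from each point to the line y = m*x + b
    return sum(abs(y - (m * x + b)) for x, y in points)

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
possible_ms = [0, 0.5, 1]  # small illustrative grid, not the project's range
possible_bs = [0, 1, 2]    # small illustrative grid, not the project's range

best_m = 0
best_b = 0
smallest_error = float("inf")

def try_slopes_and_intercepts(possible_ms, possible_bs):
    # Declare that these names refer to the module-level variables,
    # so the assignments below do not create new local variables
    global best_m, best_b, smallest_error
    for m in possible_ms:
        for b in possible_bs:
            error = calculate_all_error(m, b, datapoints)
            if error < smallest_error:
                best_m = m
                best_b = b
                smallest_error = error

try_slopes_and_intercepts(possible_ms, possible_bs)
print(best_m, best_b, smallest_error)
```

Note that best_m and best_b need the same declaration if you want the module-level values to change; without it, the function would only update its own local copies.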

You can read more about scopes here.


Whilst you could use the global keyword to solve this, I’d be careful about using it, since its purpose is to allow changes to what that name references. In the given example a second call to that function wouldn’t even start with the same value (smallest_error would’ve been modified), which risks some unexpected behaviour. If, perhaps, you had extra code to further minimise smallest_error, there’s an argument for maintaining its value, but I’m not sure if using globals would be the best way to handle that.

As for the original query on why it seemed like it had to be added to the function… It’s because you have an assignment statement to a variable with the same name within that function. Without the assignment you could use smallest_error inside the function and it’d read the value from the outer scope instead; this is standard behaviour. However, this behaviour changes because you have a statement assigning to smallest_error within the function, which makes that name local for the whole function body, not just from the assignment onwards. The earlier expression with stuff < smallest_error therefore tries to use the local variable before it has a value, and throws the error you see.
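A stripped-down illustration of the difference, using made-up names (threshold, read_only and read_then_assign are purely for demonstration):

```python
threshold = 10  # module-level name, hypothetical

def read_only(value):
    # No assignment to threshold in this function,
    # so the name is looked up in the outer (global) scope
    return value < threshold

def read_then_assign(value):
    # The assignment two lines down makes threshold local to this function,
    # so this comparison reads a local variable that has no value yet
    if value < threshold:
        threshold = value
    return threshold

print(read_only(3))

caught = None
try:
    read_then_assign(3)
except UnboundLocalError as exc:
    caught = exc
print("UnboundLocalError:", caught)
```

The exact wording of the error message differs between Python versions, but the exception type is the same. The global threshold is left untouched in both cases.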

I’d suggest instead either moving it inside the function where it’ll be re-initialised on every function call or adding it as a new function parameter (perhaps a default would suit?) which might also be a nice way to handle further attempts to reduce smallest_error for the same dataset.
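Something along these lines, perhaps. This is only a sketch: calculate_all_error is a stand-in I wrote so it runs by itself, the grids are made up, and returning the results as a tuple is my own choice rather than anything the project asks for:

```python
def calculate_all_error(m, b, points):
    # Hypothetical stand-in for the project's helper:
    # sum of absolute vertical distances from each point to the line y = m*x + b
    return sum(abs(y - (m * x + b)) for x, y in points)

def try_slopes_and_intercepts(possible_ms, possible_bs, points,
                              smallest_error=float("inf")):
    # smallest_error is a parameter, so it is local and starts from the
    # default on every call unless the caller passes a value in
    best_m = None
    best_b = None
    for m in possible_ms:
        for b in possible_bs:
            error = calculate_all_error(m, b, points)
            if error < smallest_error:
                best_m, best_b, smallest_error = m, b, error
    return best_m, best_b, smallest_error

datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]

# First pass over a coarse, illustrative grid
first = try_slopes_and_intercepts([0, 0.5, 1], [0, 1, 2], datapoints)
print(first)

# Second pass: hand in the error to beat; best_m and best_b stay None
# if nothing in this grid improves on it
second = try_slopes_and_intercepts([0.25, 0.75], [0.5, 1.5], datapoints,
                                   smallest_error=first[2])
print(second)
```

Because nothing is stored at module level, calling the function again with the same arguments gives the same answer, and the caller decides whether to carry a previous best forward.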


Yes, definitely, the way I suggested is not the best if subsequent calls to the function should always expect the original value of the global variable.

While we take this behavior for granted, now that I think about it, it seems strange: if Python by default looks a variable up in the outer scope when it’s not in the local scope, why doesn’t it continue to consider the variable non-local if there’s an assignment to it afterwards?

It seems obvious that if someone wanted a local variable, they’d assign to it before referencing it, whereas if they wanted a non-local variable, they’d reference it before assigning to it. Is this exception just meant to handle the cases where someone forgot to assign a local variable before referencing it?


Without anything to say otherwise, assignment is treated locally. Using explicit statements to state otherwise (global / nonlocal) is an effort to try and maintain readability. At a quick glance in a longer script, is that variable local or not? At least with a global keyword at the top of the function definition you could find out quickly that it isn’t just modifying something locally. Use of global is fairly uncommon but then so is the use of module level constants.

Note that mutable types can (regardless of whether or not they normally should) be altered, but not reassigned, outside the local namespace:
def can_i_alter_a():
    a.append(a[-1] + 1)


a = [1, 2, 3]
print(a)
can_i_alter_a()
print(a)
can_i_alter_a()
print(a)

That particular exception is just a straightforward case of using a variable before assigning it, e.g. print(x) before anything was ever assigned to x. Since the function body contains an assignment to the name smallest_error, that name is treated as a local variable from the moment the function is defined (the definition is a statement itself), and referencing it before it is assigned throws an error when that function is run. (I’m not familiar enough with the bytecode to explain it at a lower level than that; LOAD_FAST vs. LOAD_GLOBAL would probably be the place to start if you wanted to look into it.)
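If you want to poke at it yourself, one way (a sketch only, with made-up function names) is to look at the code object Python builds when the function is defined. Its co_varnames attribute lists the names treated as local and co_names lists the global and attribute names it looks up. The dis module will print the bytecode too, though the exact opcode names vary between Python versions, so I haven't shown any output for that:

```python
import dis

smallest_error = float("inf")

def reads_global():
    # Only reads the name, so it is looked up globally
    return smallest_error

def assigns_local():
    # Assigns to the name, so it is local inside this function
    smallest_error = 0
    return smallest_error

# Names Python decided were local when each function was defined
print(reads_global.__code__.co_varnames)
print(assigns_local.__code__.co_varnames)

# Global (and attribute) names each function refers to
print(reads_global.__code__.co_names)
print(assigns_local.__code__.co_names)

# Print the bytecode; look for the load/store instructions on smallest_error
dis.dis(reads_global)
dis.dis(assigns_local)
```

The point is that the local-or-global decision is already recorded before the function is ever called, which is why the order of the lines inside the function doesn't change it.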
