Why is my median incorrect?

I can’t figure out where I’m going wrong. Any ideas?

TypeError: list indices must be integers or slices, not float.

I’ve typed out your code and ran it on three sample lines…

def median(numbers):
    numbers = sorted(numbers)
    if len(numbers) == 1:
        median = numbers[0]
    elif len(numbers) > 1 and len(numbers) % 2 != 0:
        median = numbers[len(numbers) / 2 - 1]
    elif len(numbers) > 1 and len(numbers) % 2 == 0:
        median = (numbers[len(numbers) / 2] + numbers[len(numbers) / 2 - 1]) / 2.0
    return median
print (median([2]))
print (median([2, 3, 4, 5, 6]))
print (median([2, 3, 4, 5, 6, 7]))

The first one gave a result of,

2

The second one raised the exception. Commented out so the third one could run and we get the same exception.

What doe the error point to?

numbers[len(numbers) / 2 - 1]

This might not raise an exception in Python 2 since an integer divided by an integer gives an integer quotient. Not so in Python 3 which returns a float from all division. Let’s cast those expressions to int() for the sake of testing in Python 3 and see if that cures this one issue.

def median(numbers):
    numbers = sorted(numbers)
    if len(numbers) == 1:
        median = numbers[0]
    elif len(numbers) > 1 and len(numbers) % 2 != 0:
        median = numbers[int(len(numbers) / 2) - 1]
    elif len(numbers) > 1 and len(numbers) % 2 == 0:
        median = (numbers[int(len(numbers) / 2)] + numbers[int(len(numbers) / 2)]) / 2.0
    return median
print (median([2]))
print (median([2, 3, 4, 5, 6]))
print (median([2, 3, 4, 5, 6, 7]))

Output

2
3     # ?
5.0   # ?

Okay, so we know the function works as expected (almost).

That brings us to the question, which version are you working in? What errors are being raised if you are running Python 2?

A critical analysis of this code would begin with a D. R. Y. inspection. Is there any repetition?

what error do you get? Could you copy paste your code? Working from a screenshot is very difficult

I’m working in Python 2, which, as you mentioned, would mean this division method should work. My error message is “median([6, 8, 12, 2, 23]) returned 6 instead of 8”, but I’m not sure why that would be the case. What is a D.R.Y. inspection, and how can I do it here? It doesn’t seem to me like I’m repeating anything.

Here’s the copied and pasted version:
def median(numbers):
numbers = sorted(numbers)
if len(numbers) == 1:
median = numbers[0]
elif len(numbers) > 1 and len(numbers) % 2 != 0:
median = numbers[len(numbers) / 2 - 1]
elif len(numbers) > 1 and len(numbers) % 2 == 0:
median = (numbers[len(numbers) / 2] + numbers[len(numbers) / 2 - 1]) / 2.0
return median

If you’ll note in the example above I placed a ? beside two of the answers. They are neither one the median of their respecitve input list.

Don't Repeat Yourself

On the contrary, there is a lot of repetition, as in repeated code patterns.

numbers[len(numbers) / 2]

for example. What if we abstract that away and assign its value to a variable?

n = len(numbers)

and for good measure, abstract away the division, as well…

m = n / 2

for Python 2, or,

m = n // 2

for Python 3 (which also works in Python 2). That is known as floor division which will result in an integer if both dividend and divisor are integer.

Now the above repeated expression becomes,

def median(numbers):
    numbers = sorted(numbers)
    n = len(numbers)
    m = n // 2
    if n == 1:
        median = numbers[0]
    elif n > 1 and n % 2 != 0:
        median = numbers[m - 1]
    elif n > 1 and n % 2 == 0:
        median = (numbers[m] + numbers[m - 1]) / 2.0
    return median

Notice the difference?

Actually, I just noticed my own omission in the earlier code. The missing -1 on the last case. My bad. So only the odd length median is incorrect. How would we correct that?

median = numbers[m - 1]

becomes,

median = numbers[m]

That solves the median error so that 4 is the result, not 3.

One final concern is the first line.

numbers = sorted(numbers)

That is assigning a sorted copy of numbers back onto the same variable. It’s the same as,

numbers.sort()

But this, and the above, are both modifying the input list, which means the original is now sorted. We may not want that. So,

s = sorted(numbers)

Now the original is left as is and we have a sorted copy assigned to s. This will now make the code look like,

def median(numbers):
    s = sorted(numbers)
    n = len(s)
    m = n // 2
    if n == 1:
        median = s[0]
    elif n > 1 and n % 2 != 0:
        median = s[m]
    elif n > 1 and n % 2 == 0:
        median = (s[m] + s[m - 1]) / 2.0
    return median

That makes the code easier to read and understand; would you agree?

Still, we have one more area of redundancy; can you point it out?

Give up? That’s okay. Just absorbing what we’ve done so far can be quite a chore. But, it is a necessary part of seeing what effect our code has. And, it is practically a protocol when it comes to expressing what we want with the code.

Many of the things we write as statements have their basis in an expression.

 s = sorted(numbers)

The above is a statement since it carries out a command in the program. If we wrote the same thing in the command line, assuming numbers exists, we would have an s object at our disposal, just sitting in memory.

An expression is not a command, and therefore not a statement. It is a value. sorted(numbers) is an expression. len(s) is an expression. n % 2 == 0 is an expression. They all boil down to a value. We can only assign values to variables.

Computers don’t store expressions, they evaluate them and store the values that result. When we assign that value to a variable, it becomes a reference much like a rolodex would be where we of old would store business cards. The address on the card is where we could find the individual. A variable is that card.

In debugging and refactoring we are looking for all the ways our code is expressing itself. That includes the logic we are using since it is evaluating expressions. Is something longer than this; or is this odd? We look for where those predicates are no longer needed or can be bypassed.

Consider,

    if n == 1:
        median = s[0]

Is that test even needed? When we weigh it against the test for evenness it comes out odd. The math that finds the middle element of the list comes up with, 0. That would suggest a test for unit length is unnecessary. We can erase those two lines with no effect on the outcome.

That brings around the other component of this redundancy, testing for length greater than 1. The above pretty much rules out that need so we can further reduce our code by removing,

n > 1 and

When that is said and done we have,

def median(numbers):
    s = sorted(numbers)
    n = len(s)
    m = n // 2
    elif n % 2 != 0:
        median = s[m]
    elif  n % 2 == 0:
        median = (s[m] + s[m - 1]) / 2.0
    return median

Is that where we stop? Not on your life! There are still redundancies. Do we need the final elif? It is, after all just an else case when we boil it down. If it’s not odd, it’s even.

Consider also that we don’t need to compare to zero. Just determine if a value is non-zero, which is actually easier…

if n % 2:

Done deal, assuming n is a number that can be divided by 2.

Now we get,

def median(numbers):
    s = sorted(numbers)
    n = len(s)
    m = n // 2
    if n % 2:
        median = s[m]
    else:
        median = (s[m] + s[m - 1]) / 2.0
    return median

Looking pretty slim, now, eh? We’ve torn our original code to shreds and made it as sparse as it can be, just so we know it expresses itself in as succinct a fashion as possible.

Personally, I’m not keen on having a variable with the same name as the function so I would change that, or just write in return. Either approach is fine.

def median(numbers):
    s = sorted(numbers)
    n = len(s)
    m = n // 2
    if n % 2:
        return s[m]
    else:
        return (s[m] + s[m - 1]) / 2.0

Do we stop here? Of course not; especially if there is still some fat to trim! We forge on…

Since we are returning, do we even need the else? No. That can go.

def median(numbers):
    s = sorted(numbers)
    n = len(s)
    m = n // 2
    if n % 2:
        return s[m]
    return (s[m] + s[m - 1]) / 2.0

For the advanced learner (or the apt pupil, either or) we can resort to what Python equates to a ternary expression under the guise of a conditional, and we have,

def median(numbers):
    s = sorted(numbers)
    n = len(s)
    m = n // 2
    return s[m] if n % 2 else (s[m] + s[m - 1]) / 2.0

Because this function did not interact with anything it is a pure function. Something we want to aim for in all our functions, going forward.

Ah, thank you very much for your help! This is great. I believe the last area of redundancy can be eliminated by substituting a variable for n%2. So inserting something like p = n%2 before the if statement

1 Like

That expression only appears once in the finalized solution so is not redundant. There would be need to create another variable.

I can’t think of a way to substitute for another variable…the only thing I can think of is perhaps replacing

elif n > 1 and n % 2 != 0:
median = s[m]
elif n > 1 and n % 2 == 0:
median = (s[m] + s[m - 1]) / 2.0

with

elif n > 1:
if n % 2 != 0:
median = s[m]
else:
median = (s[m] + s[m - 1]) / 2.0

I don’t know if you can do that, though, and it doesn’t seem to be the answer you described.

We don’t need to check the length as long as we are sure the list is not empty. This tells us the only thing to check is for an empty list.

n = len(s)
if n == 0: raise IndexError

is the same thing, essentially, as,

 if n % 2:

Oh, I see. Thank you!

1 Like

Pushing out of the envelope, now, we can begin to qualify the inputs as something our program can understand and use, or brush them aside. Here is some code with warts and all, but it shows how we can handle some of the exceptions that may be raised when inputs don’t match what the function expects to work with.

>>> def median(numbers, *n):
    try:
        y = numbers[0]
    except IndexError:
        return "Empty list Error"
    except TypeError:
        return "Not a list object"
    except KeyError:
        return "dict is not a list"
    s = sorted(numbers)
    n = len(s)
    m = n // 2
    try:
        return s[m] + 0 if n % 2 else (s[m] + s[m - 1]) / 2.0
    except TypeError:
        return "Not a Number Error"

>>> median(5,6)
'Not a list object'
>>> median(5)
'Not a list object'
>>> median([5])
5
>>> median([5,6])
5.5
>>> median({})
'dict is not a list'
>>> median([])
'Empty list Error'
>>> median(())
'Empty list Error'
>>> median('a')
'Not a Number Error'
>>> a = lambda x: x
>>> median(a)
'Not a list object'
>>> 

To explain the first line,

>>> def median(numbers, *n):

I added in the spread on the second parameter so that it wouldn’t raise a TypeError when multiple arguments are passed in. The function is meant to accept only one positional argument but there is no way to trap multiples. In this case, I just let them in. Now I can use TypeError to mean a different thing.

Another odd thing is that 'a' is a sequence with length 1. It makes it past the y = numbers[0] because it is enumerable; being as it is a sequence it has an index and since it has a length greater than zero it skips past this test. It later gets caught when we try to do some math with it.

I only post this now because it happens to be on my radar as something to work on, so here it is. Bookmark it and come back when you get to the unit on handling exceptions. It will make a lot more sense then.