Median problem


#1
def median(nums):
    numbers = sorted(nums)
    if len(numbers) % 2 == 0:
        n = int(len(numbers) / 2)
        o = n-1
        total = numbers[n] + numbers[o]
        avg = int(total / 2)
        return avg
    else:
        p = int(len(numbers) / 2 + 0.5)
        return numbers[p]

Why doesn't this work? median([4, 5, 5, 4]) returned 4 instead of 4.5.

Shouldn't numbers[n] and numbers[o] return 4 and 5?


#2
avg = float(total) / 2

For starters.

n = int(len(numbers) / 2)

There's no need to convert to an integer, since in Python 2 the result of int / int already is one.


#3

doesn't an integer become a float once it's divided?


#4

When both dividend and divisor are integers, the quotient is also an integer (in Python 2). len() returns an integer, and 2 is a counting number (also an integer).

int / int == int

float / int == float
int / float == float
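
These rules describe Python 2. As an aside, and assuming a Python 3 interpreter, `/` returns a float even for two integers, while `//` gives the floored quotient in both versions. A minimal sketch (the variable names are just for illustration):

```python
# Python 3 behavior; under Python 2, 9 / 2 would give 4 instead.
true_quotient = 9 / 2      # float
floor_quotient = 9 // 2    # int, same result in Python 2 and 3
mixed_quotient = 9.0 / 2   # float in both versions

print(type(true_quotient).__name__, true_quotient)
print(type(floor_quotient).__name__, floor_quotient)
print(type(mixed_quotient).__name__, mixed_quotient)
```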

#5

That makes life easier. Why was I returning the wrong value, though?


#6

Because you were returning an INT.

Consider,

4 + 5 == 9
9 / 2 == 4
9.0 / 2 == 4.5

so,

float(numbers[n] + numbers[o]) / 2

will be a float.
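
Putting that together, here is a minimal sketch of the original function with only the averaging step changed. The name `median_fixed` is made up for this example, and `//` is used for the index so the sketch behaves the same under Python 2 and 3.

```python
def median_fixed(nums):
    numbers = sorted(nums)
    middle = len(numbers) // 2          # integer index in Python 2 and 3
    if len(numbers) % 2 == 0:
        total = numbers[middle - 1] + numbers[middle]
        return float(total) / 2         # float division keeps the .5
    return numbers[middle]

print(median_fixed([4, 5, 5, 4]))  # 4.5
print(median_fixed([3, 1, 2]))     # 2
```

Like the original, this sketch does not handle an empty list.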


#7

def median(x):
    if not isinstance(x, list): return False
    k = len(x)
    if k == 0: return False
    elif k == 1: return x[0]
    x.sort()
    n = k / 2                        # n is an integer
    if k % 2 > 0: return x[n]
    return float(x[n-1] + x[n]) / 2  # return is a float

This is not meant as lesson code, just a distillation of yours with a couple of validation checks. The one check still missing is that every item in the list is a number.

We can do the check any number of ways. This is one way I came up with just now...

import re

def median(x):
    if not is_nums(x): return False
    k = len(x)
    if k == 0: return False
    elif k == 1: return x[0]
    x.sort()
    n = k / 2
    if k % 2 > 0: return x[n]
    return float(x[n-1] + x[n]) / 2    

def is_nums(x):
    if not isinstance(x, list): return False
    h = [ n for n in x if re.match(r'\d',str(n))]
    return len(h) == len(x)
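
One caveat with the regex approach, shown as a rough sketch: it only tests whether the string form of each item starts with a digit. The name `is_nums_regex` and the sample inputs are made up here; the logic is copied from the function above.

```python
import re

def is_nums_regex(x):
    # Same test as above: does str(item) begin with a digit?
    if not isinstance(x, list):
        return False
    h = [n for n in x if re.match(r'\d', str(n))]
    return len(h) == len(x)

print(is_nums_regex([1, 2.5, 3]))  # True
print(is_nums_regex([-1, 2]))      # False: '-1' starts with '-'
print(is_nums_regex(['7up']))      # True: a string that starts with a digit
```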

Since x.sort() sorts in place, it reorders the caller's list, so this is not a pure function. Things get mutated that are not local to the function. We can make it a little more pure by taking a shallow copy of the input list and working on that, thereby leaving the caller's object unaffected.

This line,

x.sort()

gets replaced by,

y = sorted(x)

or

y = x[:]
y.sort()

then these two lines,

    if k % 2 > 0: return x[n]
    return float(x[n-1] + x[n]) / 2

become,

    if k % 2 > 0: return y[n]
    return float(y[n-1] + y[n]) / 2

The hand-off comes with a slight semantics revision to stave off any questions...

def median(x):
    if not is_nums(x): return False
    k = len(x)
    if k == 0: return False
    elif k == 1: return x[0]
    else: pass
    y = sorted(x)
    n = k / 2
    if k % 2 > 0: return y[n]
    return float(y[n-1] + y[n]) / 2
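
To see the difference between sorting a copy and sorting in place, here is a small sketch. The list `caller_list` stands in for whatever the caller passes to median().

```python
caller_list = [5, 1, 4]

y = sorted(caller_list)          # builds a new sorted list
unchanged = list(caller_list)    # snapshot: the caller's order is intact

caller_list.sort()               # sorts in place and returns None
changed = list(caller_list)      # snapshot: the caller's list is now reordered

print(y)          # [1, 4, 5]
print(unchanged)  # [5, 1, 4]
print(changed)    # [1, 4, 5]
```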

#8

is_nums redux

from numbers import Number
def is_nums(x):
    if not isinstance(x, list): return False
    h = [ n for n in x if isinstance(n, Number)]
    return len(h) == len(x)
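
As a sketch, the same check can also be written with all(), which stops at the first item that is not a number. The results should match the version above; the sample inputs are made up for illustration.

```python
from numbers import Number

def is_nums(x):
    if not isinstance(x, list):
        return False
    # True only if every item is an instance of Number
    return all(isinstance(n, Number) for n in x)

print(is_nums([1, 2.5, -3]))  # True
print(is_nums([1, '2']))      # False
print(is_nums((1, 2)))        # False: a tuple, not a list
print(is_nums([True, 1]))     # True: bool is a subclass of int
```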

#9

@mtf

Question, shouldn't you just check if the list is empty first or if you are even getting a valid object before attempting to run a function on it? Also if things were moved around a little couldn't you keep it from wasting time with unnecessary calculations?

I provided an example of what I mean below.

Example:

from numbers import Number


def is_nums(x):
    if not isinstance(x, list): return False
    h = [ n for n in x if isinstance(n, Number)]
    return len(h) == len(x)

def median(x):
    if not x: return False  # check if empty here so we don't have to call a function if empty
    if not is_nums(x): return False  # Then check if the bugger is full of numbers
    k = len(x)
    if k == 1: return x[0]
    y, n = sorted(x), int(len(x) / 2)
    if k % 2 > 0: return y[n]
    return float(y[n-1] + y[n]) / 2

#10

bool inherits from int, so False has a value of 0, which could be mistaken for a valid result even when no sensible result exists.
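
A short sketch of the ambiguity. `median_or_false` is a simplified stand-in for the versions above, not anyone's actual code:

```python
def median_or_false(x):
    # Returns False to signal failure, as the versions above do.
    if not x:
        return False
    y = sorted(x)
    n = len(y) // 2
    if len(y) % 2 > 0:
        return y[n]
    return float(y[n - 1] + y[n]) / 2

print(isinstance(False, int))                           # True
print(median_or_false([]) == median_or_false([0]))      # True: False == 0
print(median_or_false([]) == median_or_false([-1, 1]))  # True: False == 0.0
```

A caller comparing results with == cannot tell "no median" apart from a median of zero.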

There is a median function in Python 3's standard library:

https://hg.python.org/cpython/file/v3.5.1/Lib/statistics.py#l297

Notably:
It doesn't return a value on fail, it raises exceptions.
It leaves most of the input checking to the operations it performs.
It doesn't look all that closely at the data.

>>> from statistics import median
>>> median('hello')
'l'

I think that's perfectly fine, the given value supports the operations required to compute a median for it, so why not.
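
The points above can be tried out directly. This sketch assumes Python 3.4 or later, where the statistics module is available:

```python
from statistics import median, StatisticsError

print(median([4, 5, 5, 4]))  # 4.5
print(median('hello'))       # l

try:
    median([])               # empty data: raises instead of returning a value
except StatisticsError as err:
    print('empty data:', err)

try:
    median('hell')           # even length: tries ('h' + 'l') / 2
except TypeError as err:
    print('unsupported operation:', err)
```

The even-length string fails only because a string cannot be divided by 2. The function never inspects the element types itself.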


#11

Good point. I see now how is_nums() could mess up on an empty list, and return a false positive.
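
A quick sketch of that false positive, using is_nums() as posted earlier in the thread:

```python
from numbers import Number

def is_nums(x):
    if not isinstance(x, list): return False
    h = [n for n in x if isinstance(n, Number)]
    return len(h) == len(x)

print(is_nums([]))  # True: 0 numbers found out of 0 items
print(not [])       # True: an emptiness check up front catches this case
```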

Interesting find. It never occurred to me that we could find a median of a sorted string. Does this essentially mean that any iterable will have a median, despite not being statistically relevant?

The statistics module is not available in 2.x.


#12

Or at least that the function shouldn't be the one to decide - just try and see what happens. If the required operations are supported then perhaps there is meaning to the result.
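
A minimal sketch of that "just try it" style, assuming we raise ValueError for empty input. The name `median_eafp` is made up, and this is not the standard library's implementation:

```python
def median_eafp(data):
    y = sorted(data)       # raises TypeError if the items can't be ordered
    k = len(y)
    if k == 0:
        raise ValueError('no median for empty data')
    n = k // 2
    if k % 2 > 0:
        return y[n]
    # raises TypeError if the middle items can't be added and divided
    return (y[n - 1] + y[n]) / 2.0

print(median_eafp([4, 5, 5, 4]))  # 4.5
print(median_eafp((3, 1, 2)))     # 2
print(median_eafp('hello'))       # l
```

There is no type check anywhere; unsupported input surfaces as an exception from the operation that could not handle it.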


#13

Good point. The function is essentially left pure. It could feasibly be sorting ordinals instead of values, and returning the middle.