Help finding flaws in median code using min() and max()


#1



My code got through the test, but I'm unsure if I missed any cases that might break it. I used the min() and max() functions to get the median. Please critique my code! Thank you.


def median(lst):
    result = []
    # Copy the list so the original is not modified
    for i in lst:
        result.append(i)
    # For lists with odd length
    if len(result) % 2 != 0:
        while max(result) != min(result):
            result.remove(max(result))
            if max(result) != min(result):
                result.remove(min(result))
        return result[0]
    # For lists with even length
    else:
        while len(result) > 2:
            result.remove(max(result))
            if max(result) != min(result):
                result.remove(min(result))
        return (result[0] + result[1]) / 2.0


print(median([4, 5, 6, 7]))


#2

For correctness you'd need to consider all possible lengths of that list. If you convince yourself that it's correct for lengths 1, 2, and 3, and that adding any length to that only adds iterations, then you've done a fairly good assessment of its correctness.
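If you want a more mechanical check, a brute-force comparison against the statistics module exercises all the small lengths. Something like this (just a sketch, assuming your median function from above is in scope):

# Sketch: compare median() against statistics.median on many small random lists.
import random
import statistics

def check_median(trials=1000):
    for _ in range(trials):
        length = random.randint(1, 8)                        # small lengths cover the interesting cases
        lst = [random.randint(0, 5) for _ in range(length)]  # small value range so duplicates show up
        expected = statistics.median(lst)
        actual = median(lst)
        assert actual == expected, (lst, actual, expected)
    print("all", trials, "trials passed")

check_median()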

I would point out that it's slow for large lists. The best known amount of time required to find the median is linear, that is, for each extra element the amount of time grows by the same amount. There's an algorithm called quick-select that can find the nth smallest value in linear time, and it can be used to select the middle value (or the middle two). Yours is instead quadratic, because for each added element you iterate through the whole list (actually half of it on average, but that's a negligible difference, and each remove() also takes linear time). For example, for a list of size 1000, an optimal algorithm would take on the order of 1,000 operations, while yours would take on the order of 1,000,000.
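For reference, a rough sketch of quick-select could look like this (my illustration, not something the course expects; it runs in expected linear time because the pivot is random):

# Quick-select sketch: returns the k-th smallest value (0-based) by
# partitioning around a random pivot and recursing into one side only.
import random

def quickselect(lst, k):
    pivot = random.choice(lst)
    smaller = [x for x in lst if x < pivot]
    equal = [x for x in lst if x == pivot]
    larger = [x for x in lst if x > pivot]
    if k < len(smaller):
        return quickselect(smaller, k)
    if k < len(smaller) + len(equal):
        return pivot
    return quickselect(larger, k - len(smaller) - len(equal))

def median_quickselect(lst):
    n = len(lst)
    if n % 2 != 0:
        return quickselect(lst, n // 2)
    return (quickselect(lst, n // 2 - 1) + quickselect(lst, n // 2)) / 2.0

print(median_quickselect([4, 5, 6, 7]))   # 5.5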

If you're only concerned with correctness then that's fine; sometimes that's OK, for example when you know that only small inputs will be used.

Doing it in linear time is far beyond the level of this course. There may be libraries that can find the nth smallest value (numpy, for instance) that could be used. Something in between, though, is sorting: a comparison sort takes N log N time (for length 1000 that would be on the order of 10,000 operations).
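The sort-based version is short (a sketch of that in-between approach, not the official solution):

# Sort-based sketch: sorted() takes N log N time, then the middle element(s)
# are picked out directly by index.
def median_sorted(lst):
    s = sorted(lst)            # sorted copy, the original list is untouched
    mid = len(s) // 2
    if len(s) % 2 != 0:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2.0

print(median_sorted([4, 5, 6, 7]))   # 5.5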

As for the code itself (disregarding what it's computing), the loop is duplicated: the same loop can be used for both cases, and then you look at what's remaining once there are one or two values left.
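To illustrate, that restructuring could look something like this (still quadratic; the point is only that the odd/even decision can wait until the end):

# Sketch of the single-loop restructuring: strip the max and min until one
# or two values remain, then handle whichever case is left.
def median_single_loop(lst):
    result = list(lst)         # copy so the caller's list is untouched
    while len(result) > 2:
        result.remove(max(result))
        result.remove(min(result))
    if len(result) == 1:
        return result[0]
    return (result[0] + result[1]) / 2.0

print(median_single_loop([4, 5, 6, 7]))   # 5.5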


#3

That's a great reply, thank you ionatan.

I'll try to seek out what the "most optimal" answer is, without libraries.


#4

Oh don't go for it. It's far beyond what's expected at this level. Sorting it and picking out the middle element(s) is about as good as can be expected.

You can have a look at quick-select on wikipedia if you want.


#5

Thanks for the tip!

The other answers (the ones that I guess Codecademy was giving hints for) all seem to do everything in one if/else statement. That's much better than what I have. How would we be able to approximate the speed? Any resources I should look at to understand computing speed and good code?


#6

Here's a start: https://en.wikipedia.org/wiki/Computational_complexity_theory
But it's all just thinking about how long things take relative to the input size and how many times they are repeated.
For example, min and max for a list require looping through the whole list (N iterations). If you do this once for each value in the list, then N iterations are repeated N times, for a total of N * N (quadratic).
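You can see that growth directly by timing the function from the first post at a few sizes (a rough sketch; the exact numbers depend on your machine, but doubling the size should roughly quadruple the time):

# Rough timing sketch: a quadratic algorithm roughly quadruples its running
# time every time the input size doubles.
import random
import time

for n in (1000, 2000, 4000):
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    median(data)               # the median() function from post #1
    elapsed = time.perf_counter() - start
    print(n, round(elapsed, 4), "seconds")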
Removing at the front of a list means moving all following elements. Removing at the end is constant time. Removing a fixed number of steps from the end is also constant time. Removing N steps from the end takes N time.
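For example (another rough sketch; timings vary by machine):

# Sketch comparing removal at the front vs. the end of a list:
# pop(0) shifts every remaining element, pop() does not.
import time

def time_pops(pop_front, n=50000):
    lst = list(range(n))
    start = time.perf_counter()
    while lst:
        lst.pop(0) if pop_front else lst.pop()
    return time.perf_counter() - start

print("pop(0):", round(time_pops(True), 3), "s")    # quadratic overall
print("pop(): ", round(time_pops(False), 3), "s")   # linear overall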

The existence of different data structures is often to provide different trade-offs of functionality and speed of operations.

Finding out if a key is present in a dict or set takes constant time because they are hash maps. Sometimes trees are used instead; those have log N lookup time, but they have the property that the values are sorted at all times, while a hash map is unordered. min and max for a balanced binary search tree take log N time, which is better than for both a list and a dict/set.
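A quick illustration of the lookup difference (a sketch; numbers are machine-dependent):

# Sketch: membership testing in a set (hash lookup) vs. a list (linear scan).
import time

n = 100000
as_list = list(range(n))
as_set = set(as_list)

def time_lookups(container, repeats=100):
    start = time.perf_counter()
    for _ in range(repeats):
        _ = -1 in container    # worst case: the value is absent
    return time.perf_counter() - start

print("set: ", round(time_lookups(as_set), 5), "s")    # roughly constant per lookup
print("list:", round(time_lookups(as_list), 5), "s")   # scans the whole list each time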

As long as you're not doing something in real time (low latency), you'll probably only be concerned with how much additional time is required as the input grows. If it's all 100 times slower, that's still negligible for many purposes; it's a constant factor. If there's a large initialisation, you'd still be okay for throughput, but latency might suffer greatly; perhaps the initialisation could be done as-needed instead, caching the result until the next time it's used.
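One common way to do that as-needed initialisation in Python is to cache the expensive step behind a function (just a sketch; the names here are made up for illustration):

# Sketch of lazy initialisation with caching: the expensive setup runs on the
# first call only, later calls reuse the stored result.
from functools import lru_cache

@lru_cache(maxsize=None)
def load_lookup_table():
    # stand-in for an expensive initialisation step
    return {i: i * i for i in range(1000000)}

def lookup(x):
    return load_lookup_table().get(x)   # first call pays the cost, later calls are fast

print(lookup(3))    # slow first call
print(lookup(12))   # fast: cached table is reused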

