What are the optimization issues here?


#1

As such the bulk of my work ends up as procedures. Witness the following...

def is_prime(x):
    if x < 2: return False
    if x > 2:
        for i in range(2, int(x ** 0.5) + 1, 2):
            if x % i == 0: return False
    return True
def primes_to_n(n):
    if n < 2: return
    primes = [2]
    primes += [x for x in range(3, n+1, 2) if is_prime(x)]
    return primes
def prime_factors(n):
    factors = primes_to_n(int(n / 2) + 1)
    freq = {}
    for i in range(len(factors)):
        f = factors[i]
        freq[f] = 0
        while True:
            if n % f == 0:
                n /= f
                freq[f] += 1
            else: break    
    return { k: freq[k] for k in freq if freq[k] > 0}
def prime_factor_sum(n):
    freq = prime_factors(n)
    return sum([k * freq[k] for k in freq])

print (1089, prime_factors(1089))
print (prime_factor_sum(1089))
print (28, prime_factors(28))
print (prime_factor_sum(28))

I suspect very weak on optimization, my greatest weakness. Would it were this (optimization) would sink in. Any takers?


#2

Hi,

You should use this algorithm to find primes up to n:


#3

Thank you for pointing this out. The thought had crossed my mind, but I wanted to see what I could come up, first. Still getting my head to wrap around this (for the n-th time). I read this page over and dug around some to learn more about the S. of E. after which I searched for a sample program and found this one,

# https://iamrafiul.wordpress.com/2013/04/28/sieve-of-eratosthenes-in-python/
def sieve(n):
    not_prime = []
    prime = []
    for i in range(2, n+1):
        if i not in not_prime:
            prime.append(i)
            for j in range(i*i, n+1, i):
                not_prime.append(j)
    return prime

I tweaked this function to print out the list,

def prime_factor_sum(n):
    freq = prime_factors(n)
    print (n, freq)
    return sum([k * freq[k] for k in freq])

Substituting this line, in prime_factors(),

factors = primes_to_n(int(n / 2) + 1)

with this,

factors = sieve(int(x / 2) + 1)

and then timing it on a relatively small number, 10101...

from time import process_time    # Python 3.3.x and higher

t = process_time()
print (prime_factor_sum(10101))
elapsed_time = process_time() - t

10101 {37: 1, 3: 1, 13: 1, 7: 1}
60
>>> elapsed_time
0.37440239999999997
>>>

Now going back to this line,

factors = primes_to_n(int(n / 2) + 1)

t = process_time()
print (prime_factor_sum(10101))
elapsed_time = process_time() - t

10101 {37: 1, 3: 1, 13: 1, 7: 1}
60
>>> elapsed_time
0.015600100000000006
>>>

Then I went with an arbitrarily larger number, 999919 using the same original code,

t = process_time()
print (prime_factor_sum(999919))
elapsed_time = process_time() - t

999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
8.283653099999999
>>>

Swapping back in the sieve() line, it took so long I aborted it. By calculation, we can see it would have taken over three minutes to complete. I've obviously stumbled upon a very inefficient bit of code and will have to keep looking for something that betters my own. I'm sure there must a more efficient sieve than this sampled one.


#4

I have made my own sieve, looks a bit messy but it's faster:

def sieve(n):
    # returns a list of primes up to n excluding n
    nums = [True] * n
    i = 2 # we start with crossing of all multiples of two

    nums[0] = False # 0 is not prime
    nums[1] = False # 1 is not prime
    while(i**2 < n):
        if nums[i] == False:
            i += 1
            continue # if i isn't prime we already crossed of all multiples
        
        j = i**2
        while(j < n):
            nums[j] = False
            j = j + i
        i += 1

    return [i for i in range(n) if nums[i] == True]

I think what makes it faster is that it doesn't look up if i in the not_prime list, it only hast to look at nums[i]. I think that's quicker than having to look through the whole list.


#5

Wow! Now we're cooking with gas. Your sieve blows my procedure right out of the water.

Mine:

999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
8.8452567
>>>

Yours:

999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
0.2964019
>>>

27 times faster. Now if I could just get my brain to think this way. The boolean array is obviously the way to go. The sheer simplicity is mind-boggling. Thank you, very much!


#6

That's the same thing I thougt when I first saw it. Guess that's why I remembered it when I saw your question.

Just realized that

return [i for i in range(n) if nums[i] == True]

does the same as:

return [i for i in range(n) if nums[i]]

I believe the second one is slightly faster.


#7

I'll run another test...

999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
0.28080179999999993

This is my usual habit... To not compare with a boolean but just test truthiness implicitly. Not sure which is the better practice.