What are the optimization issues here?

As such the bulk of my work ends up as procedures. Witness the following…

def is_prime(x):
    if x < 2: return False
    if x > 2:
        for i in range(2, int(x ** 0.5) + 1, 2):
            if x % i == 0: return False
    return True
def primes_to_n(n):
    if n < 2: return
    primes = [2]
    primes += [x for x in range(3, n+1, 2) if is_prime(x)]
    return primes
def prime_factors(n):
    factors = primes_to_n(int(n / 2) + 1)
    freq = {}
    for i in range(len(factors)):
        f = factors[i]
        freq[f] = 0
        while True:
            if n % f == 0:
                n /= f
                freq[f] += 1
            else: break    
    return { k: freq[k] for k in freq if freq[k] > 0}
def prime_factor_sum(n):
    freq = prime_factors(n)
    return sum([k * freq[k] for k in freq])
print (1089, prime_factors(1089))
print (prime_factor_sum(1089))
print (28, prime_factors(28))
print (prime_factor_sum(28))

I suspect very weak on optimization, my greatest weakness. Would it were this (optimization) would sink in. Any takers?

Hi,

You should use this algorithm to find primes up to n:

1 Like

Thank you for pointing this out. The thought had crossed my mind, but I wanted to see what I could come up, first. Still getting my head to wrap around this (for the n-th time). I read this page over and dug around some to learn more about the S. of E. after which I searched for a sample program and found this one,

# https://iamrafiul.wordpress.com/2013/04/28/sieve-of-eratosthenes-in-python/
def sieve(n):
    not_prime = []
    prime = []
    for i in range(2, n+1):
        if i not in not_prime:
            prime.append(i)
            for j in range(i*i, n+1, i):
                not_prime.append(j)
    return prime

I tweaked this function to print out the list,

def prime_factor_sum(n):
    freq = prime_factors(n)
    print (n, freq)
    return sum([k * freq[k] for k in freq])

Substituting this line, in prime_factors(),

factors = primes_to_n(int(n / 2) + 1)

with this,

factors = sieve(int(x / 2) + 1)

and then timing it on a relatively small number, 10101…

from time import process_time    # Python 3.3.x and higher

t = process_time()
print (prime_factor_sum(10101))
elapsed_time = process_time() - t
10101 {37: 1, 3: 1, 13: 1, 7: 1}
60
>>> elapsed_time
0.37440239999999997
>>> 

Now going back to this line,

factors = primes_to_n(int(n / 2) + 1)
t = process_time()
print (prime_factor_sum(10101))
elapsed_time = process_time() - t
10101 {37: 1, 3: 1, 13: 1, 7: 1}
60
>>> elapsed_time
0.015600100000000006
>>> 

Then I went with an arbitrarily larger number, 999919 using the same original code,

t = process_time()
print (prime_factor_sum(999919))
elapsed_time = process_time() - t
999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
8.283653099999999
>>> 

Swapping back in the sieve() line, it took so long I aborted it. By calculation, we can see it would have taken over three minutes to complete. I’ve obviously stumbled upon a very inefficient bit of code and will have to keep looking for something that betters my own. I’m sure there must a more efficient sieve than this sampled one.

I have made my own sieve, looks a bit messy but it’s faster:

def sieve(n):
    # returns a list of primes up to n excluding n
    nums = [True] * n
    i = 2 # we start with crossing of all multiples of two

    nums[0] = False # 0 is not prime
    nums[1] = False # 1 is not prime
    while(i**2 < n):
        if nums[i] == False:
            i += 1
            continue # if i isn't prime we already crossed of all multiples
        
        j = i**2
        while(j < n):
            nums[j] = False
            j = j + i
        i += 1

    return [i for i in range(n) if nums[i] == True]

I think what makes it faster is that it doesn’t look up if i in the not_prime list, it only hast to look at nums[i]. I think that’s quicker than having to look through the whole list.

1 Like

Wow! Now we’re cooking with gas. Your sieve blows my procedure right out of the water.

Mine:

999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
8.8452567
>>> 

Yours:

999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
0.2964019
>>> 

27 times faster. Now if I could just get my brain to think this way. The boolean array is obviously the way to go. The sheer simplicity is mind-boggling. Thank you, very much!

1 Like

That’s the same thing I thougt when I first saw it. Guess that’s why I remembered it when I saw your question.

Just realized that

return [i for i in range(n) if nums[i] == True]

does the same as:

return [i for i in range(n) if nums[i]]

I believe the second one is slightly faster.

I’ll run another test…

999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
0.28080179999999993

This is my usual habit… To not compare with a boolean but just test truthiness implicitly. Not sure which is the better practice.