# What are the optimization issues here?

As such the bulk of my work ends up as procedures. Witness the following…

``````def is_prime(x):
if x < 2: return False
if x > 2:
for i in range(2, int(x ** 0.5) + 1, 2):
if x % i == 0: return False
return True
def primes_to_n(n):
if n < 2: return
primes = [2]
primes += [x for x in range(3, n+1, 2) if is_prime(x)]
return primes
def prime_factors(n):
factors = primes_to_n(int(n / 2) + 1)
freq = {}
for i in range(len(factors)):
f = factors[i]
freq[f] = 0
while True:
if n % f == 0:
n /= f
freq[f] += 1
else: break
return { k: freq[k] for k in freq if freq[k] > 0}
def prime_factor_sum(n):
freq = prime_factors(n)
return sum([k * freq[k] for k in freq])
``````
``````print (1089, prime_factors(1089))
print (prime_factor_sum(1089))
print (28, prime_factors(28))
print (prime_factor_sum(28))
``````

I suspect very weak on optimization, my greatest weakness. Would it were this (optimization) would sink in. Any takers?

Hi,

You should use this algorithm to find primes up to n:

1 Like

Thank you for pointing this out. The thought had crossed my mind, but I wanted to see what I could come up, first. Still getting my head to wrap around this (for the n-th time). I read this page over and dug around some to learn more about the S. of E. after which I searched for a sample program and found this one,

``````# https://iamrafiul.wordpress.com/2013/04/28/sieve-of-eratosthenes-in-python/
def sieve(n):
not_prime = []
prime = []
for i in range(2, n+1):
if i not in not_prime:
prime.append(i)
for j in range(i*i, n+1, i):
not_prime.append(j)
return prime
``````

I tweaked this function to print out the list,

``````def prime_factor_sum(n):
freq = prime_factors(n)
print (n, freq)
return sum([k * freq[k] for k in freq])
``````

Substituting this line, in `prime_factors()`,

``````factors = primes_to_n(int(n / 2) + 1)
``````

with this,

``````factors = sieve(int(x / 2) + 1)
``````

and then timing it on a relatively small number, 10101…

``````from time import process_time    # Python 3.3.x and higher

t = process_time()
print (prime_factor_sum(10101))
elapsed_time = process_time() - t
``````
``````10101 {37: 1, 3: 1, 13: 1, 7: 1}
60
>>> elapsed_time
0.37440239999999997
>>>
``````

Now going back to this line,

``````factors = primes_to_n(int(n / 2) + 1)
``````
``````t = process_time()
print (prime_factor_sum(10101))
elapsed_time = process_time() - t
``````
``````10101 {37: 1, 3: 1, 13: 1, 7: 1}
60
>>> elapsed_time
0.015600100000000006
>>>
``````

Then I went with an arbitrarily larger number, 999919 using the same original code,

``````t = process_time()
print (prime_factor_sum(999919))
elapsed_time = process_time() - t
``````
``````999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
8.283653099999999
>>>
``````

Swapping back in the `sieve()` line, it took so long I aborted it. By calculation, we can see it would have taken over three minutes to complete. I’ve obviously stumbled upon a very inefficient bit of code and will have to keep looking for something that betters my own. I’m sure there must a more efficient sieve than this sampled one.

I have made my own sieve, looks a bit messy but it’s faster:

``````def sieve(n):
# returns a list of primes up to n excluding n
nums = [True] * n
i = 2 # we start with crossing of all multiples of two

nums[0] = False # 0 is not prime
nums[1] = False # 1 is not prime
while(i**2 < n):
if nums[i] == False:
i += 1
continue # if i isn't prime we already crossed of all multiples

j = i**2
while(j < n):
nums[j] = False
j = j + i
i += 1

return [i for i in range(n) if nums[i] == True]
``````

I think what makes it faster is that it doesn’t look up if i in the not_prime list, it only hast to look at nums[i]. I think that’s quicker than having to look through the whole list.

1 Like

Wow! Now we’re cooking with gas. Your sieve blows my procedure right out of the water.

Mine:

``````999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
8.8452567
>>>
``````

Yours:

``````999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
0.2964019
>>>
``````

27 times faster. Now if I could just get my brain to think this way. The boolean array is obviously the way to go. The sheer simplicity is mind-boggling. Thank you, very much!

1 Like

That’s the same thing I thougt when I first saw it. Guess that’s why I remembered it when I saw your question.

Just realized that

`return [i for i in range(n) if nums[i] == True]`

does the same as:

`return [i for i in range(n) if nums[i]]`

I believe the second one is slightly faster.

I’ll run another test…

``````999919 {1009: 1, 991: 1}
2000
>>> elapsed_time
0.28080179999999993
``````

This is my usual habit… To not compare with a boolean but just test truthiness implicitly. Not sure which is the better practice.