I'm trying to learn to be more efficient in my code and practising on simple stuff for now.

I've written this tiny program to calculate the sum of all primes up to 2000000. Initially it took over 5 minutes to get the answer, but I managed to improve it to run in 2.6. Still I'm not satisfied and would love to improve further. Maybe you could give me some pointers

If a cluttered desk is a sign of a cluttered mind, then what are we to think of an empty desk?

[Albert Einstein] - or just another way to justify my mess

Original:

"Improved"

I moved the check for 1,3,5,7 outside of the loop as this would be executed anyway 2,000,000 times.

Given that every even number besides 2 is not a prime, this saves at least 1,000,000 method execution

calls, stack manipulation etc.

I also converted boolean isPrime() to int isPrime() returning 0 if not and the input if is to add to the sum.

WP

- 1

You can also not check every number, but only every other number - instead of "number++" do a "number +=2"

Even better, ever prime number is equal to (6n + 1) or (6n - 1), once you get past 3, so that requires even less checking.

There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors

- 1

There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors

A few questions:

This sqrt business, I read somewhere that I could use this for primes, but I don't quite understand the logic behind it. Why only numbers that are up to the square root of the potential prime should be considered? Could someone elaborate on that?

How does returning int, rather than boolean save me time?

Btw, William P O'Sullivan your original resulting time is slightly greater than mine 172794. I take it, it also depends on the CPU?

If a cluttered desk is a sign of a cluttered mind, then what are we to think of an empty desk?

[Albert Einstein] - or just another way to justify my mess

As for the returning boolean, vs. int then adding to the sum, I figured that if already know

the number should be added (via true result) then execute a redundant "if", why bother?

simply add the result or 0 (non prime). Removing or reordering

conditional clauses is a good optimization technique especially in SQL!

Yes, I presume since my company balks at upgrading my laptop, I took yours first as

a yardstick then went from there.

WP

Towards the end of this thread you'll find my implementation of the Sieve of Eratosthenes, which employs the optimization described in the above-mentioned document. It can generate 100 million primes in about 40 seconds on my computer. :-)

Martin Vajsar wrote:Is this Project Euler's Problem 10? After you submit the correct result, you can download an accompanying PDF document. It describes a very efficient algorithm for generating primes, called Sieve of Eratosthenes. You'll be able to cut the time down substantially, probably to a few seconds.

Towards the end of this thread you'll find my implementation of the Sieve of Eratosthenes, which employs the optimization described in the above-mentioned document. It can generate 100 million primes in about 40 seconds on my computer. :-)

Yes it is in fact a problem from Euler. I know there is a document explaining the best solution but I just wanted to see how my existing code could be improved because even though I have a couple of years experience in coding in Java I am honestly clueless about performance and what makes the programs slow down and what speeds them up

The sqrt already made it run in a blink of an eye and I'm quite happy with that, but I will definitely take a look at the current algorithms used for primes

Thanks

If a cluttered desk is a sign of a cluttered mind, then what are we to think of an empty desk?

[Albert Einstein] - or just another way to justify my mess

Rose Ellis wrote:This sqrt business, I read somewhere that I could use this for primes, but I don't quite understand the logic behind it. Why only numbers that are up to the square root of the potential prime should be considered? Could someone elaborate on that?

divisors come in pairs....so if you find that x % 2 is 0, you know that x % (x/2) will also be zero. If you find that x % 3 = 0, then x % (x/3) is zero, etc.

There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors

The execution time dropped from 2.6mins to 2.4seconds. I'd say this is a considerable improvement. Then I went on and changed the return type from

**boolean**to

**int**as sugessted by William. However, the execution time went back up to 55seconds. Why is that? I thought this change was supposed to speed it up further. My latest code is this:

[Albert Einstein] - or just another way to justify my mess

Martin Vajsar wrote:The "boolean" version printed out the sum on the screen only if the number was prime. The "int" version prints out the sum for each number, regardless of whether it is prime or not. Printing out text/numbers is very expensive operation, relative to the rest of your code at least. I'd suggest to remove it from the inner loop altogether and only output the final result, if you want to meaningfully compare the performance of different approaches.

I see. Didn't think about that. Never would have thought that printing takes so much time. I thought that adding 0 was slowing it down. As soon as I removed the line, the result was computed in 0.89seconds. Amazing

So quick question. In my project we're are using org.slf4j.Logger to output debugging statements. If printing is so expensive, does that mean that our app performance is reduced? Or is Logger somewhat cheaper then System.out?

[Albert Einstein] - or just another way to justify my mess

But remember, you've printed out two million lines in your code. I've got a system which logs quite heavily, and that one would take a day or more to generate two million log calls. An overhead of a minute or two is probably not noticeable at all.

1) speed is (almost) never the correct thing to go for. It especially is not what to go for first. Write clean, clear code that you can understand if you put it aside for a week or a month. Then run it. Then if it is too slow, look for ways to improve it.

2) EVERYTHING impacts performance. This includes other processes, so if you have a browser running flash during one run and not the other, you may notice a difference.

3) re-read rule #1

This was a very interesting exercise for me. Thanks everyone for the tips, I really learnt something

[Albert Einstein] - or just another way to justify my mess

- 1

Rose Ellis wrote:This was a very interesting exercise for me. Thanks everyone for the tips, I really learnt something

In the hope that you haven't abandoned this, I'll say one other thing:

You've probably already worked out that using previous results is definitely the way to go when using a divisor algorithm. The main problem with that is that it needs a list of "previous primes" to work with. A List has the advantage of being flexible, but if you're

*really into speed*, an array is (a) quicker and (b) smaller.

One thing that might help is the fact that, given some limit L, a reasonable approximation of the number of primes < L is L/ln(L) (ln = the natural log of L). So, if you want to find all the primes less than x, you need an array that is

*approximately*sqrt(x) / ln(sqrt(x)).

The trouble is, the approximation is almost always an

*under*estimate - in fact it's

*always*an underestimate once you get past L=60 or so - which means that it's

__less than__the size of array you need. Fortunately, once you get to L = 100 it's within 20%; by L=1000000, it's within 10% (but don't forget, it's always an

*under*estimate). There are even better estimation formulas around, including a great one for values of L > 56, but the

*proper*one may be more trouble than it's worth (it's quite involved).

All can be found at Wikipedia's Prime Number Theorem page if you're interested.

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here

If a number is not prime, then it can be composed as a product of two lesser numbers.

n = a*b;

The smaller of "a" and "b" must be at most the square root of "n", otherwise the product would exceed "n".

So the smallest (non-trivial) divisor of the number can be at most sqrt(n).

If n is the square of a prime, its smallest (non-trivial) divisor equals to sqrt(n).