# Challenge: circular counter without conditional branches

Dalton Filho
Greenhorn
Posts: 25
Can someone create a circular counter that would go back and forth from 0 to N-1 ad infinitum, without any conditional branches?

For N = 4, this counter would output: 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 3 2 1 0 1 2 ...

I sent this challenge to another forum and it took 35 days for someone to find the answer. Can Javaranchers beat them?

What is this challenge all about? Efficiency, of course! When I'm doing a fade-in/fade-out animation, I'm reading a color array back and forth, and because I want this animation to have the best performance possible, removing the conditional branches is a great help. Of course, the alternative computation should be cheaper than a failed conditional branch.

With the solution found on the other forum, the perfomance is 40.5% higher without conditional branches, which is a very significant number!

PS: I know we have JavaFX today, but at the time I needed it first, it didn't exist; nevertheless, this is a good programming challenge that remains until this day.

Ilja Preuss
author
Sheriff
Posts: 14112
�x mod 6 - 3�

translated to Java:

Math.abs(x%6-3)

Let x start at 3 and increment by 1.

Not sure how fast that actually is...

Dalton Filho
Greenhorn
Posts: 25
Math.abs() uses conditional branches.

Jim Yingst
Wanderer
Sheriff
Posts: 18671
Avoiding all conditional branches sounds like a remarkably arbitrary (and silly) requirement. Isn't your true requirement performance-based? Why not express it that way? Of course, performance tends to be tied to a particular platform and hardware setup, which makes it difficult to objectively evaluate between people who do not share access to the platform and hardware.

Anyway, here's my idea:

Obviously, the first two for loops involve the dreaded conditional branch (shudder). But so what? Each loop is very, very finite. I.e. short. The third loop is the important one, since it's the one that is repeated ad infinitum. Is it faster than one of the other obvious solutions? E.g. one that uses the dreaded conditional branch? I don't know, and don't really care, since such information varies with platform and hardware anyway. This solution avoids conditional branching in the only loop that matters, the one that's repeated indefinitely. I have severe doubts that it's possible to avoid conditional branching entirely. And I don't see why anyone should care anyway. The relevant question here is whether an array lookup is faster than a conditional check. And that varies with platform and hardware anyway. Certainly there may be an answer that is optimal for a particular setup, but I strongly doubt that there's any answer that's valid across all setups. If my answer is not "satisfactory", I suggest you reformulate the problem in more general terms.

Henry Wong
author
Marshal
Posts: 21718
85
Originally posted by Dalton Filho:
Math.abs() uses conditional branches.

True, but it is possible to create an abs() method that doesn't use conditional branches. Try this...

BTW, I agree with Jim. Having a requirement of no conditional branches doesn't help with performance. In fact, the abs() method that I provided is probably slower than the Math.abs() method.

Henry

Henry Wong
author
Marshal
Posts: 21718
85
Also... you can actually build a condition branch, without using any condition branches. For example... here is the code to detect whether a value is zero:

With this, you can detect whether two values are equal:

With this, you can build the equivalent of the ternary operator:

And of course, with this, you can build the solution:

Henry

Dalton Filho
Greenhorn
Posts: 25
Hello everyone again,

Using Henry's solution, I've made up a test to compare the performance of the solution without conditional branches against the traditional approach, which uses one iteration to count up and another to count down. The complete code for the test is below:

Yes, the tests use conditional branches, but because the same branches were put in both tests, the overhead should be systematically distributed. In my machine, the optimised counter is 77% faster, which is quite a reasonable number to me. To dismiss the possibility that my machine might be "poisoning" the results, other people could test the code to see if the results are consistent in other machines.

To make things even more interesting, Henry's solution was faster than the other forum's solution, which is also based on bitshift operations to avoid the conditional branches.

I admit that my assumption could be wrong: this solution could be slower, but what is so wrong about trying to find a better solution, one that doesn't seem obvious at first? This is a good programming challenge in the worst case, and for this problem in particular it really paid off.

Henry Wong
author
Marshal
Posts: 21718
85

In my machine, the optimised counter is 77% faster, which is quite a reasonable number to me. To dismiss the possibility that my machine might be "poisoning" the results, other people could test the code to see if the results are consistent in other machines.

To make things even more interesting, Henry's solution was faster than the other forum's solution, which is also based on bitshift operations to avoid the conditional branches.

Unfortunately, I don't think your "results" are comparable. Take a look at your code again.

In the traditional counter, each iteration is a full cycle up and back down. In the "optimized" version, you are treating each iteration as a single increment or decrement.

A more correct traditional counter (but still not exactly correct) would be...

Henry
[ January 01, 2008: Message edited by: Henry Wong ]

Henry Wong
author
Marshal
Posts: 21718
85
I admit that my assumption could be wrong: this solution could be slower, but what is so wrong about trying to find a better solution, one that doesn't seem obvious at first? This is a good programming challenge in the worst case, and for this problem in particular it really paid off.

I have no problems with this challenge. I actually liked it.

However, I believe the issue of conditional branches as a bad thing has long been solved. In the olden days, a conditional branch took many cycles to test. And if the branch is taken, it takes even more cycles, as the fetch pipeline has to be dumped.

With modern processors, this is no longer true. Conditional branches are just like other instructions, in that it takes one cycle (for most processors). And the fetching logic is so improved that there is not much difference if a branch is taken anymore either. (heck, certain processors won't even dump the pipeline, relying on the compiler to fill it with instructions that will be ran regardless of whether the branch is taken or not).

Henry

Dalton Filho
Greenhorn
Posts: 25
Henry, thank you very much for the insight. I didn't finish college too long ago and yet it seems like my computer architecture teacher didn't provide me with the most up to date information. I was assuming that failed conditional branch was a catastrophe even in modern processors.

... and to update you with an actual result, I've multiplied the number of iterations on the "optimised" test by 2n. The traditional counter is almost 80% faster.

Dalton

Henry Wong
author
Marshal
Posts: 21718
85
I didn't finish college too long ago and yet it seems like my computer architecture teacher didn't provide me with the most up to date information. I was assuming that failed conditional branch was a catastrophe even in modern processors.

I believe Intel released (with either the 486 or Pentium) a processor with a conditional jump of one cycle (if not taken) and 3 or 4 cycles (if taken). Prior to that, it was almost a dozen cycles if taken... Plus they also had fetching logic that "predicted" the jump, allowing the jump to take one cycle (if taken and predicted correctly).

This was way back in the early to mid 90s. Before the Intel and AMD processor wars. And before all the improvements from IBM (P5/P6, cell, etc).

IMHO, with modern processors, the "catastrophe" is probably a cache miss. Modern processors, with their multilevel caches, that they have to keep in sync with other cores (within the same and with other processors) -- a cache miss can stall the processor core until the caches can bring in the data from main memory (worst case).

BTW, being a software guy, who likes hardware, I'm a bit out of my element -- so please correct me if I am wrong.

Henry