Seetharaman Venkatasamy
Ranch Hand
Posts: 5575
Hi experts,

int result = low+high / 2;

here low+high might be overflow; if low or high is almost integer range. fine so there is some technique to avoid the overflow

1. int result = low+((high-low)/2)

2. int result = (low+high) >>> 1

and I am confused with (2) . if you right shift by unsigned 1 then yes you get value of divided by 2. but before that the overflow (a+b) happen right again?

also i understood i am lacking something here. because the (2) is suggested by Joshua Bloch in Arrays.binarySearch in JDK.

Matthew Brown
Bartender
Posts: 4568
9
I think it comes down to the difference between >> and >>>. With the latter you get an overflow to a negative number, but you still end up with the right answer when you shift it because it adds a zero at the front.

Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15480
43
Note: You probably meant

int result = (low+high) / 2;

which is not the same as

int result = low+high / 2;

because that means

int result = low + (high / 2);

(the / operator has higher precedence than the + operator).

Campbell Ritchie
Sheriff
Posts: 50171
79
Seetharaman Venkatasamy wrote: . . . some technique to avoid the overflow . . .
It might be better to cast all those values to longs, or use BigInteger, to avoid the risk of overflow.

Matthew Brown
Bartender
Posts: 4568
9
Campbell Ritchie wrote:It might be better to cast all those values to longs, or use BigInteger, to avoid the risk of overflow.

While that's true in general, I think the context is implementing an efficient binary search algorithm, in which case the conversion overhead would be something you'd want to avoid. It's probably once of those few cases where performance is really important. The Arrays.binarySearch() method does it that way.

Seetharaman Venkatasamy
Ranch Hand
Posts: 5575
Jesper de Jong wrote:Note: You probably meant

int result = (low+high) / 2;

Yes

Seetharaman Venkatasamy
Ranch Hand
Posts: 5575
Matthew Brown wrote:
Campbell Ritchie wrote:It might be better to cast all those values to longs, or use BigInteger, to avoid the risk of overflow.

in which case the conversion overhead would be something you'd want to avoid. It's probably once of those few cases where performance is really important. The Arrays.binarySearch() method does it that way.

make the parameter as long, long .. no?

Stephan van Hulst
Bartender
Posts: 6311
77
No, because you'd be stuck with the same problem. What if someone passes two long values that after addition can overflow? Besides, you would have to cast back to int at some point again. And on the topic of micro-optimizations, on a 32bit system, longs would need two clock cycles to be pushed and pulled through the bus.

Campbell Ritchie
Sheriff
Posts: 50171
79
Yes, I see what you mean about binary search. But are you going to have arrays bigger than 2^30 to search in the first place? If binary search is logarithmic complexity, surely the equality testing will be slower than the (never > 31) arithmetical operations on the indices?

Seetharaman Venkatasamy
Ranch Hand
Posts: 5575
Campbell Ritchie wrote: But are you going to have arrays bigger than 2^30 to search in the first place?

I dont have an experience. but lets pretend

Stephan van Hulst
Bartender
Posts: 6311
77
Pretend you have a 4GB array in memory? :P

I think the performance problems you get from memory swapping might be bigger than the gain you get from doing a logarithmic amount of shifts.

Winston Gutkowski
Bartender
Posts: 10527
64
Seetharaman Venkatasamy wrote:I dont have an experience. but lets pretend

Then the alternative is your original #1; however I'm quite sure that Matthew's right, and it was done for efficiency.
Both addition and bit shifts are extremely fast compared to division - although it's highly likely that a modern compiler will be able to recognise '/ 2' and change it to '>> 1'; in which case the difference between #1 and #2 won't be as great.

Basically, it's an old bitwise trick - kind of like the XOR swap - which works specifically for this circumstance: two 31-bit signed integers that need to be averaged. If you don't follow it, I'd advise you to use the other alternative, which can similarly be written:
int result = low + ((high-low) >> 1);
which involves one extra subtraction (also fast).

Winston

Seetharaman Venkatasamy
Ranch Hand
Posts: 5575
Matthew Brown wrote:I think it comes down to the difference between >> and >>>. With the latter you get an overflow to a negative number, but you still end up with the right answer when you shift it because it adds a zero at the front.

thank you Matthew, got it .

Campbell Ritchie
Sheriff
Posts: 50171
79
Stephan van Hulst wrote:Pretend you have a 4GB array in memory? :P . . .
The only thing you can get > 2147483647 elements into is a linked list. And you ain’t going to do a binary search on a linked list.
Or a tree, which is its own binary search structure without having to use a binary search.

BTW: the smilie is :‑P not :P See

Stephan van Hulst
Bartender
Posts: 6311
77
I recall pointers are 4 bytes wide ;)

And I grew up with my friendly textual smileys, I'll have none of this newfangled graphical stuff :P

Campbell Ritchie
Sheriff
Posts: 50171
79
Are pointers 4 bytes wide? Those used for array indices in Java are 31 bits wide because they exclude negative numbers. And you cannot declare an array as having 2147483648 members, because that is outwith the limits for an int.
Are the widths of pointers defined at all in C?

Winston Gutkowski
Bartender
Posts: 10527
64
Campbell Ritchie wrote:Are the widths of pointers defined at all in C?

Actually, I'm pretty certain that they are 32-bit (or possibly processor width) by default; although I'm pretty sure that there are header definitions for WIDE_PTR (or something siimilar). I'm not exactly sure how 64-bit pointers are implemented though (never had need any need to use 'em). Hopefully, Stephan'll tell us.

Winston

Stephan van Hulst
Bartender
Posts: 6311
77
Yeah, simple pointers are 4 bytes wide (possibly 8 bytes in x64 systems, but I'm not sure), regardless of which addresses are valid. So to get an 4GB array you would need a billion elements, which is within the int range.

As far as I know, extended pointers work by the processor sending two words over the bus, and telling memory to interpret it as a single pointer using the control line of the bus.

I'll look it up though.

Campbell Ritchie
Sheriff
Posts: 50171
79
But your 4GB array spread over 32‑bit words, would contain 1G elements.

Stephan van Hulst
Bartender
Posts: 6311
77
Yep, or roughly the 2^30 array you referred to earlier this thread ;)

Campbell Ritchie
Sheriff
Posts: 50171
79
We’ve finally stopped going round in circles and got agreement

Stevens Miller
Bartender
Posts: 1302
24
Doesn't have to be an array. A function scaled over integer space could be subjected to a Newton's-method-style search for its zero-crossings (or other values).