# Fast division check

manolis tsamis
Greenhorn
Posts: 4
Hello,
I want to know if someone can provide a faster algorithm for my problem.
I want to know whether a number n1 is divisible by n2, i.e. with no remainder.
I know I can simply write if (n1 % n2 == 0) ...
But it's not fast enough.
Currently I use:
double div = (double) n1 / (double) n2;
if (div == (int) div) {
    ...
}

This is faster than the mod operator. I just want to know if someone has a faster way than this.

Don't answer "let the compiler do the optimizations"... blah blah blah.
I'm just looking for micro-optimizations or the like.

fred rosenberger
lowercase baba
Bartender
Posts: 12228
36
How do you define "fast enough"?

without knowing the answer to the above, what is the point of this exercise? If someone gives you something faster, you might just say "thanks, but is there anything faster?"

and sometimes the solution isn't a faster algorithm, but faster hardware.

further, micro-optimizations are seldom if ever worth it. so you save yourself a microsecond doing a division, then six months later spend an extra two days debugging the code.

fred rosenberger
lowercase baba
Bartender
Posts: 12228
36
also...i'm not sure what you are doing IS faster...I have no concrete evidence, but it looks AWFULLY complicated...

Henry Wong
author
Marshal
Posts: 21719
85
manolis tsamis wrote:
I know I can simply write if (n1 % n2 == 0) ...
But it's not fast enough.
Currently I use:
double div = (double) n1 / (double) n2;
if (div == (int) div) {
    ...
}

This is faster than the mod operator. I just want to know if someone has a faster way than this.

Seriously?!?!?

Converting two integers to floating point, then dividing them to get a floating-point result, then converting the result back to an integer, and then finally doing a floating-point-to-integer comparison ... is faster than doing an integer modulus and comparing to zero? How is that possible?

On most processors, the integer modulus instruction is the same logic as the integer divide instruction, so it should be the same speed as an integer divide. I'm confused ?!?!?

Henry

Ulf Dittmer
Rancher
Posts: 42968
73
How do you define "fast enough"?

+1 on that. Without knowing that, it makes no sense to optimize.

Don't answer let the compiler do optimizations...I just look forward to any micro optimizations or such.

The problem with micro-optimizations is that they may well differ between compilers and/or JVMs, or even between different versions of the same compiler or JVM. So what works today may not work after the next update, which makes their value questionable.

Henry Wong
author
Marshal
Posts: 21719
85

Just ran a series of tests with 10 million randomly generated positive integers. Of course, the tests were configured to minimize any effect of garbage collection and the JIT compiler. And ...

First of all, the times were really small. Do you really want to optimize such a ridiculously fast operation? Aren't there lower-hanging fruit to pick?

Anyway, the modulus test did it in a third of the time, which is kind of expected, because the floating-point test does around three times the work!!

Henry
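For reference, a quick-and-dirty harness along the lines Henry describes might look like the sketch below. This is my own illustration, not Henry's actual test code, and a real measurement should use a proper benchmarking tool such as JMH, since naive nanoTime loops are easily fooled by the JIT:

```java
import java.util.Random;

public class DivCheckBench {
    // The two checks under discussion: integer modulus vs. the float round-trip.
    static boolean modCheck(int n1, int n2) {
        return n1 % n2 == 0;
    }

    static boolean floatCheck(int n1, int n2) {
        double div = (double) n1 / (double) n2;
        return div == (int) div;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] a = new int[10_000_000];
        int[] b = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            a[i] = rnd.nextInt(Integer.MAX_VALUE - 1) + 1; // positive dividends
            b[i] = rnd.nextInt(1000) + 1;                  // positive divisors
        }

        // Sanity check: both predicates must agree before timing anything.
        for (int i = 0; i < a.length; i++) {
            if (modCheck(a[i], b[i]) != floatCheck(a[i], b[i]))
                throw new AssertionError("disagreement at index " + i);
        }

        int hits = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < a.length; i++) if (modCheck(a[i], b[i])) hits++;
        long t1 = System.nanoTime();
        for (int i = 0; i < a.length; i++) if (floatCheck(a[i], b[i])) hits++;
        long t2 = System.nanoTime();

        System.out.printf("mod: %d ms, float: %d ms (hits=%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, hits);
    }
}
```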

Winston Gutkowski
Bartender
Posts: 10527
64
• 1
manolis tsamis wrote:Don't answer let the compiler do optimizations... Blah blah blah.

Why shouldn't we? Or are you only interested in the answers you want to hear?

We're all volunteers here, and many of the contributors have LOTS of experience; so if they tell you that you might be thinking about this the wrong way, my advice would be to listen.

Winston

Henry Wong
author
Marshal
Posts: 21719
85
• 1
Winston Gutkowski wrote:
manolis tsamis wrote:Don't answer let the compiler do optimizations... Blah blah blah.

Why shouldn't we? Or are you only interested in the answers you want to hear?

I am actually interested in this request -- in a different way. The OP obviously thinks that this can be affected by the compiler -- perhaps even to the point of making the question moot. Does this imply that some testing was done?

If so, I would like to know what kind of testing was done to draw that conclusion in the first place... because we think the opposite, and my quick and dirty tests show the opposite.

Henry

Henry Wong
author
Marshal
Posts: 21719
85
• 1
Ulf Dittmer wrote:
How do you define "fast enough"?

+1 on that. Without knowing that it makes no sense to optimize.

I'll counter this a bit, with a story... Once, I had the pleasure of having a discussion with one of the top people in HFT. I actually have had lots of discussions, but this discussion gave me a small epiphany...

The conversation went like "we need to get latency down", "ok, what are you seeing", "we need to have it lower", "I mean, what is your latency budget?", "the budget? oh, that's zero"...

Partially, it was a joke, but it was also very serious. In the world of HFT, there is no reward for second place. In fact, second place means that, at best, you don't make money, and at worst, you lose money. If/when you realize that you are in second place, by the time you fix it, your company can be out of business. So, as a counter argument, sometimes it does make sense to optimize (and to optimize everything).

Henry

PS... Only because someone will mention this. No, the discussion was *not* in regards to Java. This conversation was in regards to C/C++.

Ulf Dittmer
Rancher
Posts: 42968
73
• 1
Interesting point. I once heard a speaker talk about optimizing for trading - he had quite a few fascinating tidbits. His budget was very, very far away from zero, though - which allowed for some approaches that you would not normally consider.

But even in that case the question "How do you define fast enough?" has an answer (faster than anybody else), and without knowing it, it doesn't make much sense to start, IMO, because the approaches could be rather different depending on where the goal line is.

Tim Cooke
Sheriff
Posts: 3166
138
• 1
A similar story from the other side of the HFT fence. The company I work for provides the platform on which those HFTs trade, and the "how fast is fast enough?" answer is a moving target. In its simplest terms, "faster than the opposition" pretty much sums it up. But everyone in this space is constantly pushing for lower latency, so in order to stay ahead in business we need to stay ahead technically. There is no final "fast enough" for us, just "fast enough for today"; as long as we can keep going faster, we'll continue to succeed as a company.

We do put a lot of effort into software optimizations to reduce latency and increase throughput, and we also look to the top technologists in this space for help, but that's not everything. We invest in hardware and infrastructure improvements too, such as custom NICs running custom software (mostly to help maintain message ordering, but the speed-up is a welcome addition), and we also provide server hosting so our HFT clients can place their servers physically next to ours with a direct LAN connection. My point here is that software optimisations are often not the whole picture.

I too have had some great discussions with one of our external consultants about creating lock-free queue implementations; he applied knowledge of the physical hardware architecture to write software that minimized contention on the hardware. It was fascinating stuff, and as an exercise we were able to create a Java Queue implementation that was many orders of magnitude faster than the fastest provided by the JDK.

One of the most fascinating 'tricks' he employed related to the mod operation, which was required to generate the next index for accessing an array. As has been mentioned previously in this thread, the mod operation is computationally equivalent to division, which on modern Intel hardware is something like 96 CPU cycles (I think, or something close). With careful array size selection (a power of two) and a bit-mask operation he was able to reduce that to a single-cycle operation. Inspired stuff.
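For what it's worth, the identity behind that trick is `i % size == (i & (size - 1))` whenever `size` is a power of two. A minimal sketch of a ring-buffer index advance along those lines (my reconstruction; the consultant's actual code was surely more involved):

```java
public class RingIndex {
    static final int CAPACITY = 1024;      // must be a power of two
    static final int MASK = CAPACITY - 1;  // low bits all set: 0b1111111111

    // Replaces the expensive (index + 1) % CAPACITY with a single AND.
    static int next(int index) {
        return (index + 1) & MASK;
    }

    public static void main(String[] args) {
        // Verify the identity against the plain modulus for a range of indices.
        for (int i = 0; i < 10 * CAPACITY; i++) {
            if ((i & MASK) != i % CAPACITY)
                throw new AssertionError("mismatch at " + i);
        }
        System.out.println("next(1023) = " + next(1023)); // wraps around to 0
    }
}
```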

Getting back to the OP's original question, though: the other guys here are right; without knowing what the target is, it is hard to find the motivation to do this. Why faster? How fast is fast enough?

Another point to make is that you should not be too quick to dismiss compiler optimisation. Writing code that you know will be optimised by the JVM at runtime is as legitimate a technique as any. To actively ignore it, and reject any suggestion relating to it, reduces the credibility of your question quite significantly. You should explain why you are not interested in JVM optimisations.

Winston Gutkowski
Bartender
Posts: 10527
64
Ulf Dittmer wrote:But even in that case the question "How do you define fast enough?" has an answer (faster than anybody else), and without knowing it it doesn't make much sense to start IMO, because the approaches could be rather different depending on where the goal line is.

Which surely suggests that it doesn't pay to publish?

Sounds to me like a spec for military hardware, which is simply to be "better" than the opposition; which is maybe why their projects have a tendency to overrun by miles (in many cases disastrously) and result in things like the "MiG-15 scare".

@Henry: I'm sure you know the old adage: "Good, Fast, Cheap - pick any two". I'd suggest that your story pretty much precludes the idea of a "good" HFT solution.

Winston

Ulf Dittmer
Rancher
Posts: 42968
73
Tim Cooke wrote:My point here is that software optimisations are often not the whole picture.

Very true. One of the techniques mentioned in the talk I heard was to avoid Java GC by running the trading app on load-balanced machines whose JVMs had just about all available RAM allocated (16 GB, if memory serves). They would then monitor the JVM's memory, and shut down and restart the JVM before the "expensive" GC ever had a chance to kick in.
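A sketch of what the monitoring half of that setup might look like, using the standard `java.lang.management` API. The restart and load-balancer coordination are elided, and the 80% threshold is an assumption of mine, not something from the talk:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatcher {
    // Hypothetical threshold: take the node out of rotation at 80% heap usage.
    static final double DRAIN_THRESHOLD = 0.80;

    // Returns true when heap usage is high enough that an old-gen GC is likely soon.
    static boolean shouldDrain() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        // Note: getMax() can be -1 if undefined; real code should handle that case.
        return heap.getMax() > 0
                && (double) heap.getUsed() / heap.getMax() > DRAIN_THRESHOLD;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("used=" + heap.getUsed() + " max=" + heap.getMax());
        System.out.println("drain? " + shouldDrain());
    }
}
```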

Winston Gutkowski wrote:Which surely suggests that it doesn't pay to publish?

I don't understand what you're trying to say (or imply) here.

Jayesh A Lalwani
Rancher
Posts: 2762
32
Whoa! Wait! What? The JVM takes a hell of a lot more time to start than a GC does. Restarting the JVM doesn't make sense. That sounds like a justification for sweeping memory leaks under the rug.

The whole principle behind the JVM holding onto free memory is that allocating memory from the OS is a costly operation. Restarting the JVM is tantamount to doing a huge dealloc, which then forces you to do more mallocs -- and that's before you count JVM startup and class-loading time. Theoretically, restarting the JVM should cause a bigger "pause" than a GC.

Winston Gutkowski
Bartender
Posts: 10527
64
Ulf Dittmer wrote:I don't understand what you're trying to say (or imply) here.

I'm saying that if the benchmark is to be "better" or "faster" than the competition, then why tell them how "fast" or "good" you are? I don't know how the US military works, but it was a pretty open secret that, before the Wall came down, published performance figures for many RAF aircraft were well below their known capability.

Winston

Ulf Dittmer
Rancher
Posts: 42968
73
Jayesh A Lalwani wrote:Restarting JVM doesn't make sense. That sounds like a justification to sweep memory leaks under the rug.... Theoretically, restarting the JVM should cause a bigger "pause" than GC

It does make sense, and no, it's not about memory leaks. Of course restarting the JVM takes longer than a GC, but during that time the load balancer takes care not to send requests to that machine, so there is no downtime. Whereas during an expensive GC, a request might get slowed down, or not get serviced at all - that's what this aims to avoid.

Winston Gutkowski wrote:then why tell them how "fast" or "good" you are?

In the context of trading, the latencies are communicated (possibly via an SLA) because that's an important consideration. So everybody knows what everybody else can provide (at a minimum).

Henry Wong
author
Marshal
Posts: 21719
85
Ulf Dittmer wrote:
Jayesh A Lalwani wrote:Restarting JVM doesn't make sense. That sounds like a justification to sweep memory leaks under the rug.... Theoretically, restarting the JVM should cause a bigger "pause" than GC

It does make sense, and no, it's not about memory leaks. Of course restarting the JVM takes longer than a GC, but during that time the load balancer takes care not to send requests to that machine, so there is no downtime. Whereas during an expensive GC, a request might get slowed down, or not get serviced at all - that's what this aims to avoid.

It is probably a bit better now, but around three or four years ago the Oracle HotSpot JVM couldn't really get past 8 or 9 gigabytes of heap. Around that point, GC times start to be measured in seconds, and then quickly start to be measured in minutes. At that point, things like network sockets tend to get reset during the outage. If you push the environment even further, say past around 12 gigabytes of heap, the GC can actually fail to complete. This means the GC times go from double-digit minutes to forever.

As for solutions...

1. I have seen cases where the GC has been optimized like crazy, to try to get as much GC done by the new generation collector versus the old generation collector.

2. I have seen cases where the amount of memory for the heap is ridiculous. The goal is to provide as much memory as possible, so that a GC never needs to happen -- until the market closes, i.e. when trading stops. In my opinion, this is a bad idea, because if you guess wrong and a GC does happen, it can occur right before the market closes, which is the worst possible time... so if you do this, you really need a ridiculous amount of memory.

3. And of course, the recommended option was to use the Azul JVM, with its pauseless GC. Back then (full disclosure: I worked for Azul at the time) the JVM ran in an appliance, which meant lots of \$\$\$, so you had to really, really need it to justify it. These days they do have a software version, but I have never used it.

Henry

Winston Gutkowski
Bartender
Posts: 10527
64
Henry Wong wrote:1. I have seen cases where the GC has been optimized like crazy, to try to get as much GC done by the new generation collector versus the old generation collector.

I'm certainly no expert on this (as you may have gathered), but these techniques seem awfully "targeted" to me.

It seems to me that some form of highly compressed data format, or a simple (???) caching proxy might solve just as many problems as trying to target the processing module.

Or am I out to lunch here?

Winston

Chris Hurst
Ranch Hand
Posts: 443
3
Surprised no one has mentioned off-heap storage, as that usually comes up about now also ;-)

With regard to compression, you would have to look at the CPU load of compression/decompression and whether you could do it with zero garbage creation.
Caching can also go wrong if you're not careful ;-) I've met some anti-patterns.

Tim Cooke
Sheriff
Posts: 3166
138
Off-heap storage would help if you'd decided to implement some sort of cache for remembering divisions you'd already done. In which case it would make sense to reduce the fraction to its lowest terms by calculating the GCD of the integer pair, which in itself could be a candidate for caching. As you allude to, rolling your own cache is fraught with danger, so it would be best to use an off-the-shelf solution, like Ehcache's BigMemory product for off-heap caching. Or you could use the in-memory cache solution but configure it to cache at most a predefined number of items.
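The GCD reduction mentioned above is just Euclid's algorithm; reducing each pair to lowest terms means equivalent ratios such as (10, 4) and (5, 2) share a single cache key (and the pair is divisible exactly when the reduced denominator is 1). A minimal sketch:

```java
public class DivKey {
    // Euclid's algorithm: gcd(a, b) for a, b > 0.
    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Reduce (n1, n2) to lowest terms and pack into one long,
    // so equivalent ratios share a single cache key.
    static long key(int n1, int n2) {
        int g = gcd(n1, n2);
        return ((long) (n1 / g) << 32) | (n2 / g);
    }

    public static void main(String[] args) {
        System.out.println(gcd(1071, 462));          // 21
        System.out.println(key(10, 4) == key(5, 2)); // true: both reduce to 5/2
    }
}
```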

With my Moderator hat on I also feel obliged to mention that this thread is essentially a duplicate of the OP's other thread here.

Henry Wong
author
Marshal
Posts: 21719
85
• 1
Tim Cooke wrote:
With my Moderator hat on I also feel obliged to mention that this thread is essentially a duplicate of the OP's other thread here.

Well, I wouldn't exactly call it a duplicate, as they seem to have become separate discussions (not to mention that they are a year apart). On the other hand, we really would appreciate it if the OP returned to give more clarification -- because, quite frankly, from our reading of the OP's statements (and some testing), he is not correct. And from the other topic, he has had this misconception for more than a year now. Perhaps the OP can clarify what is meant by his conclusions, or explain how he arrived at them?

Henry

Guillermo Ishi
Ranch Hand
Posts: 789
A native call to something written in assembly, but I don't know if the overhead would kill you. If it were C, you could just write inline assembly. If the divisor is a constant, or there are limits on it, there might be some sneaky way. Shifting is very fast, at least in C, as are logical operators. There might be some non-computational way to get the same or an equivalent result, e.g. lookup tables. Or you could change your strategy and requirements so you don't need to know whether there's a remainder at all. Or you could use pre-processing, when time isn't an issue, to take care of the problem.
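For the record, one known "sneaky way" when the divisor is an odd compile-time constant is the exact-division trick (due to Granlund and Montgomery; modern compilers apply it automatically): n is divisible by d exactly when n * inverse(d) (mod 2^64) <= (2^64 - 1) / d, replacing the divide with one multiply and one compare. A sketch for d = 7, treating longs as unsigned:

```java
public class DivBySeven {
    // Multiplicative inverse of 7 modulo 2^64, via Newton iteration:
    // x := x * (2 - d*x) doubles the number of correct low bits each step.
    static final long INV7 = inverse(7);
    // floor((2^64 - 1) / 7), computed with unsigned division.
    static final long LIMIT = Long.divideUnsigned(-1L, 7);

    static long inverse(long d) {     // d must be odd
        long x = d;                   // d*d == 1 (mod 8): 3 correct bits to start
        for (int i = 0; i < 5; i++) {
            x *= 2 - d * x;           // 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits
        }
        return x;
    }

    // n (treated as unsigned) is divisible by 7 iff n * INV7 <= LIMIT,
    // because multiplying by INV7 maps multiples of 7 onto 0..LIMIT
    // and everything else above LIMIT.
    static boolean divisibleBy7(long n) {
        return Long.compareUnsigned(n * INV7, LIMIT) <= 0;
    }

    public static void main(String[] args) {
        for (long n = 0; n < 1000; n++) {
            if (divisibleBy7(n) != (n % 7 == 0))
                throw new AssertionError("wrong at " + n);
        }
        System.out.println("ok");
    }
}
```

For a non-constant divisor, though, this buys nothing: computing the inverse costs more than the division it replaces, which is why it is a compile-time trick.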

Also - the speed of this time-critical thing will vary from machine to machine. Best to eliminate the dependence on speed somehow, I think.