Method Inlining

 
Mark Williams
Ranch Hand
I am curious about the Java HotSpot compiler's ability to inline method calls. From what I understand, Java 5 can automatically inline method invocations that account for a large percentage of the program's execution time. If this is correct, are there any caveats to the process?

I read a very old article which stated that only certain methods could be inlined (static, private, final, etc.). Is this still true with more recent Java versions (Java 5, to be specific)? I have an object which calls a static method belonging to a class in a completely different package. I am wondering if I would see any performance gain by moving the static method into the class that is making the call.

Thanks.
 
Campbell Ritchie
Sheriff
The reason you can inline private, static and final methods is that they are certain to remain unchanged in all objects. If, however, you are using inheritance, a public instance method will be polymorphic: its actual implementation might differ from call to call because you might be calling it on subclass instances, and there is no way the compiler can predict that.
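To picture that, here is a minimal sketch (the class names below are made up purely for illustration):

class MathUtil {
    // static: there is exactly one possible body, so the call target is known
    static int square(int x) {
        return x * x;
    }
}

class Shape {
    // public instance method: subclasses may override it
    public double area() {
        return 0.0;
    }
}

class Circle extends Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    // which body runs depends on the runtime type of the object
    public double area() {
        return Math.PI * radius * radius;
    }
}

class Demo {
    static double describe(Shape s) {
        // s.area() might be Shape's or Circle's version, so it cannot be
        // inlined statically; MathUtil.square() always means the same code.
        return s.area() + MathUtil.square(3);
    }
}

(The JIT can still inline the polymorphic call speculatively once it has watched which types actually reach the call site, but that is its decision, not yours.)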
Doing your own inlining is hard work and may cause maintenance problems; if you inline the same method in several places, you would have to go to all those places if you ever alter that method. Also your .java files will become bloated. So stick to method calls.

If you are looking at old articles (it might be worthwhile quoting the source) it is very likely that things have changed since then.

By the way: HotSpot is the JVM, not the compiler.
 
Mark Williams
Ranch Hand
This particular method that I am interested in inlining gets invoked tens of thousands to possibly hundreds of thousands of times every time my application executes a SQL statement during an online transaction. Now, I know that reducing the number of calls would be the best approach, and we have somebody looking into this possibility, but there are still a few statements for which we absolutely must live with the current number of calls.

Your point is well taken about the maintenance headaches that inlining could introduce, but at this time there is only one place in the app where we would want to inline this call. That would limit the duplicate code to just two instances of the function.

The first place I read about inlining is this very old article: http://www.javaworld.com/javaworld/jw-04-1997/jw-04-optimize.html?page=3

Also, on a less important note, I realize that HotSpot is the JVM, but it apparently does have a compiler component:
Java Hotspot Compilers

I am guessing this compiler is a feature used by the JVM to automatically/dynamically compile and optimize heavily used bytecode into native machine code. Please correct me if I am misunderstanding. I guess my question is really about how I can make sure my Java source is written in a way that will see maximum optimization through this process.
 
Campbell Ritchie
Sheriff
Yes, the HotSpot JVM does incorporate a compiler, the JIT (just-in-time) compiler. What that does is look for repeatedly-invoked code and recompile the bytecode to machine code to enable faster execution. But most people use the term "compiler" to mean the javac tool, so sorry for the confusion on that point.
You quoted a good article, but it is unfortunately more than 12 years old, so it is badly out of date. Things have changed since then, particularly since JVMs now usually incorporate a JIT element. And javac might optimise things like i * 2 into i << 1 anyway; you can tell by looking at the bytecode.
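If you want to check for yourself, something like this works (the class name is only an example):

class ShiftCheck {
    static int doubleIt(int i) {
        return i * 2;
    }
}

Compile it and disassemble the result:

javac ShiftCheck.java
javap -c ShiftCheck

The disassembly shows whether the multiplication came out as imul, ishl or iadd; and whatever javac emitted, the JIT may still rewrite it at run time.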

Even two instances of inlined code might be a maintenance problem. You could consider putting the SQL into a text file and reading it in when the application is initialised; instead of inlining, you have a text file which only requires maintenance in one place.
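A rough sketch of the sort of thing I mean, using a properties file as the text file (the file name and class are only examples):

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class SqlStatements {
    private static final Properties SQL = new Properties();

    static {
        // Load the statements once, when the class is first used.
        InputStream in = SqlStatements.class.getResourceAsStream("sql-statements.properties");
        if (in == null) {
            throw new ExceptionInInitializerError("sql-statements.properties not found on the classpath");
        }
        try {
            SQL.load(in);
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        } finally {
            try { in.close(); } catch (IOException e) { /* nothing useful to do */ }
        }
    }

    static String get(String key) {
        return SQL.getProperty(key);   // e.g. get("find.customer.by.id")
    }
}

Then the SQL lives in one file, and the Java code only ever refers to it by key.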

If you haven't profiled the application, you don't know where it is running slowly, and may in fact optimise the wrong parts.

I would suggest you look at this article by Brian Goetz which gives a different perspective on optimisation and recommends "dumb code".
 
Mark Williams
Ranch Hand
Thanks for the reply, Campbell.
 
Campbell Ritchie
Sheriff
You're welcome and I hope it was helpful
 
Pat Farrell
Rancher
In reality, inlining code is critical to performance on modern CPUs. It can be generalized to include loop unrolling, converting

for (int i = 0; i < 50; i++)
    x += y[i];

into fifty lines of

x += y[0];
x += y[1];
x += y[2];

and so on. For nearly 20 years, even PC CPUs have been heavily pipelined, and to keep the pipeline working at optimal speed you need straight-line code: no loops, no tests, no subroutine calls. Originally this meant that the compiler had to do the optimization, but with cool stuff like the JVM, the runtime can do it, and speeds can be as fast as optimized C or assembly.

 
Campbell Ritchie
Sheriff
Thank you, Pat. Of course, if the runtime can do it, then you needn't do the inlining yourself.
 
Tim Holloway
Bartender
It's not easy these days to determine what will be inlined. And inlining is usually a trade between storage and speed, since the inlined method is going to be copied to wherever there would originally have been a call. There may be optimization settings that affect what gets inlined. Hey, it could have been worse. Method inlining used to be something you had to explicitly request back in the days when C++ was new and shiny. Technically, things like code simplification (where, BTW, i*2 is actually more likely to generate "i + i"), code hoisting, and loop unrolling aren't really "inlining"; that name is generally reserved for method inlining.

Chances are pretty low that the bytecode generated for one class's methods will be inlined into another class's methods, since classes are supposed to be mutually independent as far as each other's internals go. On the other hand, we live in exciting times. A JIT bytecode compiler may inline code from another class, and, for that matter, the JVM may monitor code and switch it to inlined machine code based on recorded usage.

The one rule that you can pretty much count on is that the best candidates for inlining are short, simple (after optimization) and relatively self-contained. Longer methods are not only harder to analyze, but the overhead of the call is typically swamped by the work of the method body itself. Plus, since inlining consists of making copies of the code over and over, it's going to take a LOT more memory to inline a large method than a small one.
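For example, a tiny accessor like this (names invented for illustration) is the textbook inlining candidate:

class Account {
    private long balanceInCents;

    // Short, simple, self-contained: once this gets hot, the JIT can replace
    // the call with a direct field read.
    long getBalanceInCents() {
        return balanceInCents;
    }

    void deposit(long cents) {
        balanceInCents += cents;
    }
}

If you are curious what HotSpot actually decides, recent JVMs can be run with flags along the lines of -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining (and thresholds such as -XX:MaxInlineSize), though the exact flags and their defaults vary by version, so check your JVM's documentation rather than taking my word for it.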

 
Pat Farrell
Rancher
Tim Holloway wrote:It's not easy these days to determine what will be inlined. And inlining is usually a trade between storage and speed, since the inlined method is going to be copied to wherever there would originally have been a call. There may be optimization settings that affect what gets inlined. Hey, it could have been worse. Method inlining used to be something you had to explicitly request back in the days when C++ was new and shiny. Technically, things like code simplification (where, BTW, i*2 is actually more likely to generate "i + i"), code hoisting, and loop unrolling aren't really "inlining"; that name is generally reserved for method inlining.


It's never been easy, and I expect it will never become easy, to optimize code, which is why Donald Knuth's line that "premature optimization is the root of all evil" gets quoted so often.

Actually, i*2 is usually optimized to i << 1, as shifting is usually faster than any arithmetic.

Trading off storage and speed is one of the classic optimization trade-offs, but when the JVM does it, it's just memory in the computer and maybe cache, rather than expensive disks. And with disks so cheap these days, even saving disk space is silly for most applications.

I agree that it was far more important to worry about this back when C++ was new. But as with the switch from manual overlays to virtual memory, computers are better at most of these decisions than even the best humans. Thus worrying about inlining is really not worth much brain power.


 
Tim Holloway
Bartender
Don't confuse what I said about "it's never easy to determine what will be inlined" with how easy it is for a compiler to optimize.

The ultimate in optimization is selection of a suitably efficient algorithm by the application programmer.

Knuth wasn't referring to machine-based optimization, nor do we, when we repeat him. For that matter, I doubt he'd even claim to mean that in the algorithm-selection sense. Where the real trouble comes is when you start worrying about how many nanoseconds a given instruction takes before you have a properly clean and functional algorithm. "Penny wise and pound foolish", as it were.

Optimization theory is an open-ended problem. Although specifics can be optimized, the larger space in which computers must work is far too complex for them to consider every contingency. There is, after all, a point of diminishing returns. Thanks to years of experience and exponentially more powerful hardware, however, we can now expect an impressive array of optimizations to be applied to our code. Which is what I was referring to. I've gradually given up on worrying about optimization at the lower levels. If there's an actual problem, then I worry, and not before. The days when I had to explicitly request inline methods (C++) are now as much ancient history as the even earlier days when I had to explicitly request use of register-based operations.

Actually, I've worked at low levels with a lot of hardware. Usually, but not always, addition and shifting take a single machine cycle, so they're interchangeable. Provided you do an arithmetic shift! Otherwise you'll have problems shifting the sign bit out, and also with failing to detect overflow properly. A more common reason for preferring a shift over an add is that the optimization algorithm in question may also be handling "*4", "*8", and so forth, or even "*12" and "*10", since multiplication is almost invariably more expensive than either adds or shifts. Especially since these days shifting is normally done as a parallel operation using barrel shifters instead of by "brute force" bit-shoving.
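In Java terms, for what it's worth, the distinction shows up most clearly on negative values (a toy example):

class ShiftDemo {
    public static void main(String[] args) {
        int i = -3;

        // Multiply and left shift agree in two's complement (overflow aside):
        System.out.println(i * 2);    // -6
        System.out.println(i << 1);   // -6

        // For right shifts the arithmetic-versus-logical distinction matters:
        System.out.println(i / 2);    // -1  (integer division truncates toward zero)
        System.out.println(i >> 1);   // -2  (arithmetic shift keeps the sign bit)
        System.out.println(i >>> 1);  // 2147483646  (logical shift drops the sign)
    }
}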

A targeted compiler would thus be more likely to select shifting. However, I'm not so certain in the case of gcc. Gcc is designed to compile for a wide variety of backends, not all of which may operate with the same low-level performance. Since a lot of folding is done on the abstract tree at the front end of the process, it may be doing the *2 optimization while the arithmetic-versus-logical distinction is still paramount. I haven't looked lately.

 
Pat Farrell
Rancher
Tim Holloway wrote:Knuth wasn't referring to machine-based optimization, nor do we, when we repeat him. For that matter, I doubt he'd even claim to mean that in the algorithm-selection sense. Where the real trouble comes is when you start worrying about how many nanoseconds a given instruction takes before you have a properly clean and functional algorithm. "Penny wise and pound foolish", as it were.


For sure, understanding the properties of algorithms was the heart of Knuth's The Art of Computer Programming series.

The reality is that nearly all questions in this forum section are premature. I've been following it for as long as I've been on the Ranch, and most of the questions are worrying about nanoseconds. Rarely does someone ask about an implementation that is O(n^4) where we can suggest changes to make it O(n^2). Or, when they do, the N under consideration is so small as to make the whole issue moot.
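Just to illustrate the kind of change that actually moves the needle, here is a contrived duplicate-detection example (not taken from any real question here):

import java.util.HashSet;
import java.util.Set;

class DuplicateCheck {
    // O(n^2): compares every pair of elements.
    static boolean hasDuplicateSlow(int[] values) {
        for (int i = 0; i < values.length; i++) {
            for (int j = i + 1; j < values.length; j++) {
                if (values[i] == values[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    // O(n): one pass with a set. This sort of rewrite dwarfs anything that
    // inlining or bit-twiddling can buy once n gets large.
    static boolean hasDuplicateFast(int[] values) {
        Set<Integer> seen = new HashSet<Integer>();
        for (int value : values) {
            if (!seen.add(value)) {
                return true;
            }
        }
        return false;
    }
}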

Nearly all real-world optimization problems are things like doing too many complex joins in SQL, where making the Java code take zero time would not improve the actual performance.

 
Tim Holloway
Bartender
Pat Farrell wrote:
The reality is that nearly all questions in this forum section are premature. I've been following it for as long as I've been on the Ranch, and most of the questions are worrying about nanoseconds. Rarely does someone ask about an implementation that is O(n^4) where we can suggest changes to make it O(n^2). Or, when they do, the N under consideration is so small as to make the whole issue moot.

Nearly all real-world optimization problems are things like doing too many complex joins in SQL, where making the Java code take zero time would not improve the actual performance.



Yup. And we spend a lot of time pointing that out. Actually, one of the worst performance problems I ever saw took down a multi-processor IBM mainframe every day at 4 pm. It's not easy to knock out an IBM mainframe. It's even harder to explain to management why it happened once, let alone why it kept happening for several weeks. And when I say "knock out", I mean all the way down: reboot the whole processor complex. Or IPL, if you prefer IBM-ese.

The offence in question wasn't lack of inlining or too much code in a loop. It wasn't even (directly speaking) a poor algorithm choice. It was a single configuration setting.

Always remove the camels from the water barrel before straining out the gnats.
 