In the last couple of weeks, our production system's JVM has twice gone AWOL overnight. The java process is simply not there anymore, and there is nothing to be found in any of the application log(4j) files.
In the bin directory, we found a couple of those hs_err_pid.log files, which contain all sorts of information about the state of the JVM when it decided to die.
(more from me, after this dump)
and looking up that thread, I find it is: (the middle one)
The above snips are from one of the dump files, and are representative of those found in the other. In both, the current VM operation is "full generation collection" and the current thread is "GC Daemon". So I assume the JVM is dying when it attempts a full GC.
Question 1: Is that a reasonable deduction to make (that it dies attempting a full GC)?
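Since the deduction rests on both dumps naming the same current thread and VM operation, a small scan over the hs_err files makes that comparison repeatable. This is only a sketch: the class name `HsErrSummary` is hypothetical, and the two line prefixes it looks for (`Current thread` and `VM_Operation`) are assumptions about how the dump labels those entries, so adjust them to whatever your files actually contain.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class HsErrSummary {
    // Assumed line prefixes; check them against your own hs_err files.
    static final String[] PREFIXES = { "Current thread", "VM_Operation" };

    /** Returns the (trimmed) lines that start with one of the assumed prefixes. */
    static List<String> interestingLines(List<String> lines) {
        List<String> hits = new ArrayList<String>();
        for (String line : lines) {
            String trimmed = line.trim();
            for (String prefix : PREFIXES) {
                if (trimmed.startsWith(prefix)) {
                    hits.add(trimmed);
                    break;
                }
            }
        }
        return hits;
    }

    /** Reads a whole text file into a list of lines. */
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of one dump file.
        for (String path : args) {
            System.out.println(path);
            for (String hit : interestingLines(readLines(path))) {
                System.out.println("  " + hit);
            }
        }
    }
}
```

Run it with the paths of the dump files as arguments. If every file reports the same thread and the same VM operation, the "dies during a full GC" reading is at least consistent across crashes.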
What changed in the past few weeks? No JVM updates. No hardware updates. No OS config changes. A small code update (we patched a few bugs).
My thoughts so far: 1) the small code updates we did, while not *in themselves* buggy, now exercise a previously existing condition in our JVM/environment. For example, if the new code is slightly more efficient with memory, then GCs don't happen as often as they used to, and perhaps that causes problems "down the road".
2) hardware? (is one of our RAM sticks dying?)
3) a genuine bug in the JVM (ha!)
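The first idea above (that GCs now happen less often than before the patch) can be checked rather than guessed at by logging collector counts from inside the application. A minimal sketch, assuming the standard `java.lang.management` API is available (Java 5 and later); the class name `GcCounts` is hypothetical, and the collector names it prints depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcCounts {
    /** Returns one line per collector: name, collections so far, total time spent. */
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())   // -1 if undefined
              .append(", timeMs=").append(gc.getCollectionTime())   // -1 if undefined
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Writing `GcCounts.snapshot()` to the application log at a fixed interval gives a rough collection rate that can be compared between the old and the patched build.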
Question 2: What else should I be looking at or thinking of?
[ August 20, 2008: Message edited by: Mike Fourier ]
Bottom line is that whatever Java code you execute, the JVM should never crash. If it does, it is a JVM bug. So log a defect with Sun with all the details, and hopefully they will look into it.
To get going in production, try upgrading or downgrading the JVM to the next or previous update and see if that fixes the issue. I think you would waste more time debugging why it crashed than trying out newer or older JVM updates.
Tenured gen is only 11% used, so I'm not sure why a full GC is being called for. Perhaps, though, the 11% represents what the tenured gen had been drained to at the time the JVM fell over (that is: it *was* higher, like 90% used, the VM was able to get it down to 11%, and then it hit a fatal error).
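The dump only shows occupancy at the moment of the crash, so it cannot say whether tenured was at 90% a few seconds earlier. One way to find out is to log heap pool usage from the running application and look at the last entries before the next crash. A rough sketch, again assuming the standard `java.lang.management` API; the class name `HeapPools` is hypothetical, and the pool names (including whichever one corresponds to the tenured generation) vary by JVM and collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class HeapPools {
    /** Returns one line per heap pool with bytes used and, where known, percent of max. */
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            sb.append(pool.getName()).append(": used=").append(usage.getUsed());
            long max = usage.getMax(); // -1 when the maximum is undefined
            if (max > 0) {
                sb.append(" (").append(usage.getUsed() * 100 / max).append("% of max)");
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Called from a timer and written to the application log, this leaves a trail of tenured occupancy leading up to the crash, which would confirm or rule out the "it was at 90% and got drained to 11%" guess.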