Detecting Crashes in JBoss 4.0.2

 
Greenhorn
Posts: 9
I'm trying to figure out a way to detect a web crash for jboss-4.0.2. The problem we're having is that the web layer just stops responding, yet the process continues to run and may even be processing JMS messages in the background. The web layer becomes more or less useless, or too sluggish to use, until the application server is restarted. Often this produces no log entries; it just becomes hosed.

I am looking into several different approaches; but if there is one that is already proven, that would certainly be preferred. Does anyone know of something that I'm not thinking of which might work for this problem?

CURL

One I am contemplating is the use of CURL to log into the application. I haven't tried all of the options for this command, and I'm not sure this is going to be very straightforward. When the app has crashed, CURL just sits there waiting for a response, wasting time. I need to know within a few seconds whether someone can log in.

Here's a sample of what I have tried with CURL:
curl -d "login=admin&password=test" http://localhost:8080/web/login.do
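
One way to keep CURL from waiting forever on a hung server is to bound the request time. The following is a minimal sketch under stated assumptions, not a tested monitor: it reuses the URL and form fields from the command above, the 3 and 5 second limits are arbitrary, and the function names `classify_probe` and `probe_login` are hypothetical. It relies on curl's `--connect-timeout` and `--max-time` options and on curl exit code 28, which means the operation timed out.

```shell
#!/bin/sh
# Sketch: bounded login probe. URL and credentials are the ones from the post;
# the timeout values and function names are illustrative assumptions.

# Map a curl exit code and HTTP status to a coarse verdict.
classify_probe() {
    curl_exit=$1
    http_code=$2
    if [ "$curl_exit" -eq 28 ]; then
        echo "HUNG"        # curl gave up after --max-time / --connect-timeout
    elif [ "$curl_exit" -ne 0 ]; then
        echo "DOWN"        # e.g. exit 7: could not connect at all
    elif [ "$http_code" -ge 200 ] && [ "$http_code" -lt 400 ]; then
        echo "UP"
    else
        echo "ERROR"       # server answered, but with a 4xx/5xx status
    fi
}

# Post the login form with hard time limits and print the verdict.
probe_login() {
    url=${1:-http://localhost:8080/web/login.do}
    http_code=$(curl -s -o /dev/null -w '%{http_code}' \
        --connect-timeout 3 --max-time 5 \
        -d "login=admin&password=test" "$url")
    classify_probe $? "$http_code"
}
```

Running `probe_login` from cron and alerting on anything other than `UP` would give an answer within about five seconds. Note that a rejected login may still come back as HTTP 200 with an error page, so a real check would probably also need to inspect the response body.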

CURL and log monitoring

A hybrid that may make the CURL option workable would be to log in with CURL every minute or so and check the logs at the same interval. If too much time has passed since the logs last recorded a login, there's a problem.
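
The log half of this hybrid could be sketched roughly as follows. Everything specific here is an assumption: that each log line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp, that a login writes a line containing a known marker string, and that GNU `date` (with `-d`) is available. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether the newest login entry in a log is too old.
# Assumes log lines start with "YYYY-MM-DD HH:MM:SS,mmm" and that GNU date
# is available; the marker text and file path are supplied by the caller.

# Print the epoch time of the last line containing the marker; fail if none.
last_login_epoch() {
    logfile=$1
    marker=$2
    line=$(grep -F -- "$marker" "$logfile" | tail -n 1)
    [ -n "$line" ] || return 1
    # First 19 characters are the date and time without milliseconds.
    stamp=$(printf '%s\n' "$line" | cut -c1-19)
    date -d "$stamp" +%s
}

# Succeed (exit 0) when the last login is older than max_age seconds.
login_is_stale() {
    last=$1
    now=$2
    max_age=$3
    [ $((now - last)) -gt "$max_age" ]
}
```

A caller would combine the two, for example `login_is_stale "$(last_login_epoch server.log 'some marker')" "$(date +%s)" 180`, and treat a stale result as a sign that the CURL logins are no longer reaching the application.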

Twiddle

Another option that looks promising is twiddle, the command-line JMX client that ships with JBoss. Perhaps some of the information it exposes can tell me that the web layer isn't responding.

for example:



I would love it if the above script were all that was necessary, but it's not. It still produces the same output even when the web layer becomes hosed.

I have found that this will give me more info, but I'm not sure what to look for. There are a few other properties that can be loaded as well, but it's difficult to know which will tell me that there is a problem. Since I can't yet reliably reproduce the problem in a test environment, I have little opportunity to check the values and compare them to normal ones.



 
author
Posts: 5856
Pretty much anything that can request a URL would work. I've even seen JMeter used for stuff like this. However, it might be better if you tried to figure out the cause of the "hang". Things that I have seen (and how I figured out the root cause) are:

a) An infinite loop in the application code. The developers swore to me (or actually, to the person I was interacting with) that there was no such infinite loop. I had them take several JVM thread dumps, several seconds apart, and look for threads that were always busy in the same location. They found the loop. Such an issue can "steal" threads from your thread pool (because the threads are never released). And users often won't complain - all they'll see is that their request is taking a long time to complete, so they'll try it again and maybe this time they won't hit the combination of factors that causes a loop.

b) Heap space issues. Gather garbage collection statistics and use them to right-size your heap. If you end up filling up the heap, then the JVM will constantly perform major collections, which slows things down to a crawl.

c) Poor database access schemes, poorly written queries or poorly planned database updates. You need to gather database statistics to track these down.
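
The thread-dump approach in (a) can be sketched as a small script. This is only an illustration under assumptions: it uses `jstack` from the JDK by default, and the function name, file naming, and default count and interval are made up for the example. The `DUMP_CMD` variable is a hypothetical hook for substituting another dump command.

```shell
#!/bin/sh
# Sketch: capture several thread dumps a few seconds apart for comparison.
# DUMP_CMD defaults to jstack (shipped with the JDK); the function name,
# file naming and default counts are illustrative assumptions.

take_thread_dumps() {
    pid=$1
    count=${2:-5}
    interval=${3:-10}
    outdir=${4:-.}
    i=1
    while [ "$i" -le "$count" ]; do
        # One file per dump so they can be diffed or grepped afterwards.
        ${DUMP_CMD:-jstack} "$pid" > "$outdir/threaddump-$pid-$i.txt"
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}
```

On a JVM that predates `jstack`, sending SIGQUIT with `kill -3 <pid>` produces the same kind of dump, but it is written to the JVM's standard output rather than to a file of your choosing. Threads that show the same stack in every dump are the ones worth a closer look.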
 
Chris Case
Greenhorn
Posts: 9

Peter Johnson wrote: Pretty much anything that can make a URL would work. I've even seen JMeter used for stuff like this. However, it might be better if you tried to figure out the cause for the "hang". Things that I have seen (and how I figured out the root cause) are:



I have a few ideas of what could be causing the problem; but getting it all nailed down and validated is going to take some time. Since I also realize that problems like this may occur in the future, I want to have something to help us monitor it, collect crash statistics/logs, alert us and possibly issue a restart.

I suppose another key area I'd like to improve is developing a way to simulate production-level system activity, to hopefully bring these issues to the forefront before they manifest on the main system. This seems like it would require a fair amount of effort to accomplish with realistic tests. Even still, you have those edge cases where people are using the system in ways you haven't conceived of; perhaps a method of recording production activity for a period of time so it can be "replayed" in tests, would be useful.

Peter Johnson wrote: a) An infinite loop in the application code. The developers swore to me (or actually, to the person I was interacting with) that there was no such infinite loop. I had them take several JVM thread dumps, several seconds apart and look for threads that were always busy in the same location. They found the loop. Such an issue can "steal" threads from your thread pool (because the threads are never released). And users often won't complain - all they'll see is that their request is taking a long time to complete so they'll try it again and maybe this time they won't hit the combination of factors that causes a loop.



I believe the most likely explanation is an infinite loop. I have seen occasional log messages indicating a stack overflow when our struts action.findForward() appears to be redirecting indefinitely between two pages. If this is the cause, however, it isn't always being logged.

Peter Johnson wrote: b) Heap space issues. Gather garbage collection statistics and use them to right-size your heap. If you end up filling up the heap, then the JVM will constantly perform major collections, which slows things down to a crawl.



We've experienced these in the past; fortunately, they tend to generate log messages. The server hardware has a large amount of memory, far in excess of what could be maxed out under most conditions, so I doubt this is the case. Still, it is worth looking at those garbage collection statistics again.

Peter Johnson wrote: c) Poor database access schemes, poorly written queries or poorly planned database updates. You need to gather database statistics to track these down.



Fortunately, this is an area we are typically okay on. I have been tuning the database part of the application for some time and using monitoring tools such as innotop to monitor the system. The application has overcome many hurdles related to a design that wouldn't scale well at first.

I'm thinking there is an issue with infinite loops, and fortunately we have a later version of the module that gets us into these loops. The later version, instead of forwarding between pages, uses more pop-up modal windows, which reduces the complexity. I'm happy to say that we are moving in the direction of heavier use of modal windows and away from all of the forwarding logic, which can get so cumbersome on complex screens.
 
Chris Case
Greenhorn
Posts: 9
Well, I did eventually learn how to discover a crash; better yet, I learned why the server crashed and I think I have it fixed now. I did a thread dump during a crash and saw many threads with threadState BLOCKED and WAITING. When I examined a thread with the WAITING threadState, I saw that it was waiting for a connection from Hibernate's c3p0 connection pool. This pool was too small and was creating a bottleneck whenever usage spiked.

For the full detail, please refer to this thread:

https://coderanch.com/t/559684/EJB-JEE/java/Separating-JMS-Producer-Consumer-JBoss
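
For anyone repeating this kind of diagnosis, tallying thread states in a saved dump might look like the sketch below. It is an assumption-based example, not the exact procedure used here: it simply counts occurrences of the `java.lang.Thread.State` names in a text file, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: tally thread states in a saved thread dump.
# Works on any text dump that names states with the java.lang.Thread.State
# constants; the function name is an illustrative assumption.

count_thread_states() {
    dumpfile=$1
    # -o prints each match on its own line; -w keeps WAITING from matching
    # inside TIMED_WAITING because "_" counts as a word character.
    grep -o -w -E 'RUNNABLE|BLOCKED|TIMED_WAITING|WAITING' "$dumpfile" |
        sort | uniq -c | sort -rn
}
```

A large share of BLOCKED or WAITING threads is only a hint. The stack of each waiting thread still has to be read to see what it is waiting for, which in this case was a pooled database connection.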
 
Chris Case
Greenhorn
Posts: 9
As for detecting a crash, the following script is what I'm going to use to detect this kind of a crash in the future:

 