I'm facing what seems to be silent dropping of an HTTP connection to a Tomcat server while serving a long-running request from a browser.
I was wondering if anybody else has run into such a problem, and how they solved it.
The situation is like this:
My webapp uses Struts 1.2 / JSP / jQuery, and runs on Tomcat 6.0.29 on a 64-bit remote Ubuntu VPS.
It involves video conversion - users upload video files (typically 5-20 MB), and the app converts them to FLV.
The upload form is submitted from a JSP page, using the jQuery ajaxForm plugin.
When the user clicks Upload, an ajax request is POSTed, encoded as "multipart/form-data".
A Struts Action receives the uploaded file, converts it to FLV and executes other logic involving DB operations.
During processing, the user just sees an animated busy icon. No HTTP response is sent by the server while processing is going on.
Once processing is complete, success, failure or error messages are returned to the browser in JSON ("application/json") format with an HTTP SC_OK (200) status.
The ajaxForm success() or error() handler then executes and shows the received information.
Now the problem:
On a single machine or LAN, everything works fine, though the entire processing may sometimes take as long as 15 minutes.
But on the remote Ubuntu VPS, after some time (~2 minutes) the HTTP connection drops silently. This is revealed by netstat -an.
No exceptions are thrown - neither in the app code nor in the Tomcat logs.
I'm catching Throwables, not just checked Exceptions, and logging them - so there's definitely no exception from the app.
Moreover, the processing on the server completes successfully (revealed by the logs).
Even the response.getWriter().write() for the JSON response goes through without exception.
But the client machine never receives it (revealed by Wireshark). So the browser is not notified about the response at all.
The user ends up seeing the animated busy icon for almost 30 minutes, until some browser timeout kicks in and ajaxForm's error() handler is called with error=timeout.
Initially, I assumed it was an issue with this particular server. But surprisingly, the exact same problem occurs on another remote server too: an Amazon EC2 AMI running Amazon 64-bit Linux and Tomcat.
In order to systematically isolate the issue, I did some experiments:
- Is it a Struts issue? To find out, I wrote another simple webapp with just one servlet, which simulates a long-running operation by sleeping for about 15 minutes without sending any HTTP response, then wakes up, sends one response and ends. The same problem is seen here too, so the issue is not with Struts or file upload.
- Is it Tomcat configuration? I experimented with Tomcat's Connector properties in conf/server.xml, trying all combinations and ranges of connectionUploadTimeout, disableUploadTimeout, connectionTimeout and keepAliveTimeout - all to no avail.
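For reference, these attributes sit on the HTTP Connector element in conf/server.xml. A sketch of the kind of thing I tried (the values here are purely illustrative, not a recommendation):

```xml
<!-- conf/server.xml: the HTTP Connector with the timeout attributes I varied.
     Values are illustrative only. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           keepAliveTimeout="60000"
           disableUploadTimeout="false"
           connectionUploadTimeout="900000"
           redirectPort="8443" />
```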
- The problem is seen on all 3 browsers I tested (IE, FF, Chrome), so it can't be a browser issue.
- I experimented with the client-side keep-alive values (net.http.keep-alive.timeout in FF's about:config). Again, no success.
- One other clue: though it comes from the Tomcat JK connector docs, which don't apply to my setup, the problem it describes sounds similar:
"...One particular problem with idle connections comes from firewalls, that are often deployed between the web server layer and the backend. Depending on their configuration, they will silently drop connections from their status table if they are idle for too long..."
- The response Transfer-Encoding is set to "chunked" by default by Tomcat itself (revealed by Wireshark). So it can't be a problem of the client thinking the response is over.
My current workaround
The only solution I've found so far is to write small bits of response periodically - about every 30 secs - while processing is going on, just to keep the connection alive.
It's unnecessary for functionality and smells like a hack to me!
Worse, I have to bring in some kind of timer thread to write the periodic response while a worker thread does the processing - so I'll be creating threads in the Tomcat environment!
Overall, I'm not comfortable with this solution; it seems more of a workaround than a fix.
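To make the workaround concrete, here's a minimal sketch of the timer idea in plain Java. A StringWriter stands in for the servlet's response.getWriter(); the class names and the period are my own, and in the real app the period would be 30 seconds:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Periodically writes a filler byte so intermediaries never see an idle connection.
class KeepAliveWriter {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private volatile ScheduledFuture<?> task;

    // Write a single space to 'out' every 'periodSeconds' seconds.
    void start(final Writer out, long periodSeconds) {
        task = timer.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    out.write(' ');   // harmless filler the client-side handler must tolerate
                    out.flush();
                } catch (IOException e) {
                    task.cancel(false); // client is gone; stop writing
                }
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        if (task != null) task.cancel(false);
        timer.shutdown();
    }
}

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();  // stand-in for response.getWriter()
        KeepAliveWriter ka = new KeepAliveWriter();
        ka.start(out, 1);                       // 1-second period, just for the demo
        Thread.sleep(2500);                     // simulate ~2.5s of "processing"
        ka.stop();
        System.out.println("filler bytes written: " + out.toString().length());
    }
}
```

Note the caveat: the client-side JSON handling has to tolerate the leading filler characters before the final payload.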
And now my questions:
1) I don't think this use case of a long-running operation is all that rare. Have any of you run into this issue, and how did you solve or work around it?
2) Is it really necessary to send a response periodically just to keep the connection alive?
3) The timer thread to send periodic responses and the worker thread to do the processing are just an idea that happens to be simple to implement right now. But what would be a good solution from your point of view? This app has a roadmap to eventually become a 3-tier one, where processing will be handed off to an EJB in an app server - but at this point, everything has to run in Tomcat due to other constraints. And even with a separate tier, the problem of the long-running operation in the web tier still remains. So what would you suggest as a good solution?
All suggestions are welcome.
Thanks in advance,
For any request that's routinely going to take more than about 30 seconds, you should consider running a background processor to handle those kinds of requests. You can make this simple or complicated. I often have a synchronized work queue that the HTTP request handler drops scheduling requests into, along with a status query manager so that subsequent HTTP requests can see when the work is done.
It is CRITICALLY important that any threads you spawn for this sort of processing NOT be created in the HTTP request handler. The request handler isn't supposed to spawn threads, because it's a pooled resource and you can't cleanly return the request to the pool while a thread is running. However, the init() method of a servlet is a good place to start an "engine" thread that reads the request queue and dispatches work.
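A bare-bones sketch of that queue-plus-status-manager idea in plain Java (class names like VideoJob and ConversionEngine are mine; in a real webapp the engine thread would be started once from Servlet.init() or a ServletContextListener, never from a request handler):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// One conversion job; in the real app this would carry the uploaded file's location.
class VideoJob {
    final String id;
    VideoJob(String id) { this.id = id; }
}

enum JobStatus { QUEUED, RUNNING, DONE, FAILED }

// Work queue the request handler drops jobs into, plus a status map
// that later HTTP requests can poll.
class ConversionEngine implements Runnable {
    private final BlockingQueue<VideoJob> queue = new LinkedBlockingQueue<VideoJob>();
    private final Map<String, JobStatus> status = new ConcurrentHashMap<String, JobStatus>();

    // Called from the HTTP request thread: enqueue and return immediately.
    void submit(VideoJob job) {
        status.put(job.id, JobStatus.QUEUED);
        queue.add(job);
    }

    // Called from the HTTP status-poll request.
    JobStatus statusOf(String jobId) { return status.get(jobId); }

    // The "engine" thread body: started once, e.g. from Servlet.init().
    public void run() {
        try {
            while (true) {
                VideoJob job = queue.take();
                status.put(job.id, JobStatus.RUNNING);
                try {
                    // ... long-running FLV conversion + DB work would go here ...
                    status.put(job.id, JobStatus.DONE);
                } catch (Throwable t) {
                    status.put(job.id, JobStatus.FAILED);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdown signal
        }
    }
}

public class EngineDemo {
    public static void main(String[] args) throws Exception {
        ConversionEngine engine = new ConversionEngine();
        Thread engineThread = new Thread(engine, "conversion-engine");
        engineThread.setDaemon(true);
        engineThread.start();

        engine.submit(new VideoJob("job-1"));   // returns immediately
        Thread.sleep(200);                      // give the engine a moment
        System.out.println("job-1 status: " + engine.statusOf("job-1"));
    }
}
```

The upload request then returns SC_OK right away with the job id, and the browser polls a cheap status endpoint instead of holding one connection open for 15 minutes.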
Both answers helped me think deeper about my design and realize it was inefficient.
In my first solution, the request worked like a blocking call, sending little bits of response but not returning the final SC_OK response till the operation ended after 10-15 minutes.
Reading your points, I realized I have only one servlet (the Struts controller servlet), and it has a limited pool of request threads to serve all actions, not just the video conversion action.
Which means I was locking up a request thread for a long time, preventing other clients from making requests.
A bad idea, and an inefficient use of resources too!
So now I've switched to a design with a status query manager, where the client side is responsible for periodically polling the server for the status of the operation.
Plus, instead of spawning worker threads blindly, I switched to a thread pool and put a cap on the work queue size. If the system can't handle a new request because all threads are busy, I felt it's more graceful to simply tell the new user upfront that it can't handle the request at this point, rather than make life difficult for all users.
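The bounded pool is only a few lines with java.util.concurrent. A sketch with arbitrary sizes (2 workers, a queue of 4); the RejectedExecutionException is exactly the point where the app would tell the user it's busy:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolDemo {
    public static void main(String[] args) {
        // 2 worker threads, and at most 4 conversions waiting in the queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(4),
                new ThreadPoolExecutor.AbortPolicy()); // reject instead of queueing forever

        int accepted = 0, rejected = 0;
        for (int i = 0; i < 10; i++) {
            try {
                pool.execute(new Runnable() {
                    public void run() {
                        try { Thread.sleep(1000); }      // stand-in for a long conversion
                        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    }
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++;  // here the webapp would respond "busy, please try later"
            }
        }
        // With 2 threads plus a queue of 4, only 6 of the 10 fast submissions fit.
        System.out.println("accepted=" + accepted + " rejected=" + rejected);
        pool.shutdownNow();
    }
}
```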
Hopefully, this info will help others who run into this problem.
I'm still surprised that the long-running operation worked fine on a LAN but not over the Internet, though the Tomcat connector settings were the same in both environments.
The possible differences I can think of between these 2 environments, leading to the error:
1) The LAN server was on Windows, while both my Internet servers were on Linux. Perhaps the OS socket handling, timeouts, etc. are different.
2) There was no proxy server, NAT or firewall in the LAN environment - perhaps one or more of these had a role.
Actually, the timeout on a web request is determined by the client and how long it's willing to wait for a response. In my experience, it used to be a lot shorter, but these days, 5-10 minute allowances seem to be common. The cynic in me says that's because too many sloppy webapps are being hacked out. The user in me notes that I pay my telephone bills by check instead of online because a certain very famous telephone company's java-based website has such atrociously slow performance I can no longer stand the delays involved in logging in and paying online.
Beyond that, I can't say why one environment is more fortunate than another, but if you're pushing those limits, you're already taking too long.
Using a thread pool is one very good way to handle your problem - especially if you anticipate a lot of worker processes or a lot of work that can be done in parallel. The only caveat is that - as I've said before - you don't want to create those threads in an HTTP request process. The worst-case scenario would probably be that after the HTTP request thread was returned to the Tomcat pool and handed out to a new HTTP request, that new request might bomb, terminating the thread, including its invisible sidekick threads.