
multithreaded servlet

 
Joe Areeda
Ranch Hand
Posts: 333
2
Java Netbeans IDE Tomcat Server
I'm working on improving the performance of several servlets.

What we have is more like an enterprise app with a web interface than a web site as some operations take minutes. Our users are more than happy with that kind of response because they understand what is being done and the alternatives are comparable in speed but not as easy to access.

I've described the app before and the Ranch has been very helpful. The short version is: we have multiple backend servers spread across the country with a 2PB data store, soon to grow at about a PB/year. This app figures out which server is "best" for a request, transfers the data, and then analyzes and plots it, downloads it, or plays it as audio.

The simplest operation I would like to parallelize is data discovery. Right now we have a local database that keeps track of which backend servers have which channels but not what data is available for those channels. So the process is: guess which backend is best, try it, and if it fails try the next best...

The backend servers have just implemented a "check if data is available without transferring anything" operation. This process takes maybe 2 seconds, which BTW is a big improvement.

What I'd like to do is to ask all of them at once "what do you have for me". The process is mostly I/O wait time so server load (our app) is not an issue. Only the single method is multithreaded, the servlet itself is one thread controlled by the container (per instance of course).
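
For readers following along, the "ask all of them at once" idea could look roughly like the sketch below. It uses the standard `ExecutorService.invokeAll` call with a timeout. The class name `ParallelAvailability`, the method `backendsWithData`, and the `Callable<Boolean>` probes are hypothetical stand-ins for the real availability check, which the thread does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: probes every backend in parallel and reports which ones have data. */
final class ParallelAvailability {

    /**
     * @param probes        backend name -> availability check (assumed; stands in for the real protocol call)
     * @param timeoutMillis upper bound for the whole fan-out
     * @return names of the backends that answered "true" before the timeout
     */
    static List<String> backendsWithData(Map<String, Callable<Boolean>> probes, long timeoutMillis) {
        List<String> available = new ArrayList<>();
        if (probes.isEmpty()) {
            return available;
        }
        List<String> names = new ArrayList<>(probes.keySet());
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (String name : names) {
            tasks.add(probes.get(name));
        }

        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            // Blocks until every probe has finished or the timeout expires;
            // probes still running at the timeout are cancelled.
            List<Future<Boolean>> futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            for (int i = 0; i < futures.size(); i++) {
                Future<Boolean> f = futures.get(i);
                if (f.isCancelled()) {
                    continue; // timed out: treated as "no data"
                }
                try {
                    if (Boolean.TRUE.equals(f.get())) {
                        available.add(names.get(i));
                    }
                } catch (ExecutionException e) {
                    // A failed probe is treated the same as "no data".
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // return whatever was collected so far
        } finally {
            pool.shutdownNow(); // no worker thread outlives this call
        }
        return available;
    }
}
```

The calling thread blocks until the method returns and the pool is shut down in `finally`, so no worker is left running after the request. Creating a pool per call is a simplification; a shared, bounded pool is another option.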

I've read that multithreaded servlets are a big no-no but it's not clear to me whether this scenario could cause problems or violates any of the best practices.

I appreciate any comments and all suggestions.

Best,
Joe
 
Joe Areeda
After posting this, I see I forgot to mention the alternative.

This app already calls many external programs to do the heavy lifting, most of them in languages other than Java.

Rather than a simple multi-threaded method, this could be done with an external executable. However, I don't see the advantages beyond satisfying the "don't play with threads in a servlet" requirement/recommendation. The disadvantage is that it's a lot more work, and if it's done in Java (easy) vs. C++ (harder but not unreasonable) there's the JVM startup time and memory requirements.

Joe
 
Stefan Evans
Bartender
Posts: 1822
10
>Only the single method is multithreaded, the servlet itself is one thread controlled by the container (per instance of course).

The "per instance of course" comment worries me a little.
As I understand it, Servlets are multithreaded by default. An instance of a servlet can have multiple threads running through it at one time, which is why you don't use instance variables in them.
Does this mean you have implemented SingleThreadModel on your servlets?
Perhaps you mean one thread per request?


In terms of your plan, I'm wondering how you envision this working.
You poll all of the servers asking "what have you got"

Do you wait for all the responses to come back before continuing?
What if one of them fails to respond?
What about errors/exceptions?


What you have described sounds justifiable to me.
It might be better implemented as a separate service that your servlets invoke. That keeps the puritans happy by wrapping the threads in another layer.
The servlet shouldn't have to care that the service is multithreaded as long as it gets the information it asks for.
You might consider using the Servlet 3.0 async calls if you aren't already.

my 2 cents in any case
 
Joe Areeda
Thanks Stefan,

Stefan Evans wrote:>Only the single method is multithreaded, the servlet itself is one thread controlled by the container (per instance of course).

The "per instance of course" comment worries me a little.
As I understand it, Servlets are multithreaded by default. An instance of a servlet can have multiple threads running through it at one time, which is why you don't use instance variables in them.
Does this mean you have implemented SingleThreadModel on your servlets?
Perhaps you mean one thread per request?

I am currently reading Java Web Services: Up and Running by Martin Kalin. He says "A web server such as Tomcat can arbitrarily many instances of a servlet". But you are right, it is one thread per request; I was unclear.

Stefan Evans wrote:
In terms of your plan, I'm wondering how you envision this working.
You poll all of the servers asking "what have you got"

Do you wait for all the responses to come back before continuing?
What if one of them fails to respond?
What about errors/exceptions?

The plan is to either wait until they all respond or, as soon as one succeeds, kill the others. There is a timeout on each request. Errors and exceptions at this level are treated the same as data not available.
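
The "when one succeeds we kill the others" variant maps fairly directly onto the standard `ExecutorService.invokeAny` call. The sketch below is only an illustration under assumptions: the class `FirstBackendWins`, the method `firstWithData`, and the convention that a probe returns its backend name on success and throws when it has no data are all hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: returns the first backend that reports data and cancels the rest. */
final class FirstBackendWins {

    /**
     * @param probes        each probe returns its backend name if data is available and
     *                      throws otherwise (assumed convention, not the real protocol API)
     * @param timeoutMillis upper bound for the whole race
     * @return the first backend to answer successfully, or empty if none did in time
     */
    static Optional<String> firstWithData(List<Callable<String>> probes, long timeoutMillis) {
        if (probes.isEmpty()) {
            return Optional.empty();
        }
        ExecutorService pool = Executors.newFixedThreadPool(probes.size());
        try {
            // invokeAny returns the result of one probe that completed without throwing;
            // probes that have not completed by then are cancelled.
            return Optional.ofNullable(pool.invokeAny(probes, timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (ExecutionException | TimeoutException e) {
            // Every probe failed, or none finished in time: same as "data not available".
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } finally {
            pool.shutdownNow(); // interrupts any probe that is still running
        }
    }
}
```

Cancellation here means the worker threads are interrupted, so it only helps if the underlying network call responds to interruption or has its own timeout.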
Stefan Evans wrote:
What you have described sounds justifiable to me.
It might be better implemented as a separate service that your servlets invoke. That keeps the puritans happy by wrapping the threads in another layer.
The servlet shouldn't have to care that the service is multithreaded as long as it gets the information it asks for.
You might consider using the Servlet 3.0 async calls if you aren't already.

my 2 cents in any case


Thanks again. I am not using the async calls, I'll have to read up on them.

Best,
Joe
 
Tim Holloway
Bartender
Posts: 18418
60
Android Eclipse IDE Linux
Servlets do not run as threads, and therefore are forbidden from spawning threads.

The thread that runs a servlet's code is loaned to the servlet for the lifetime of a single request/response cycle and then immediately returned to the appserver's thread pool. Should you return a thread to the pool with child threads hanging off of it, you violate the symmetry of the threads in the pool. Worse, you are leaving unpredictable appendages attached and can potentially crash the appserver itself.

Because the servlet processing thread is loaned and not owned, you should complete your servlet request processing as soon as possible and return. If you do not, you risk exhausting the thread pool and thereby trashing performance. So making a servlet run as a long-running child-thread owning process is a Very Bad Idea indeed.

Conversely, since the threads are loaned out on a per-request basis, you don't need to do anything to enjoy the benefits of multi-threading, since that capability is inherent in the architecture and provided by the server. All that's required of you is to observe thread-safety protocols.
 
Joe Areeda
Hi Tim,

Thanks for the clear description of what I would call a best practice. I'm not sure about this yet and will proceed with caution in my test.

I think the best way to look at this is from cost benefit perspective.

On one hand, the worst case looks like this: despite my best efforts, a sub-thread never exits. In the single-thread model this would block the servlet's thread.

In the best case, when no data is available, we return in a couple of seconds instead of tens of seconds.

The system testing of this app is fairly complete. If this gets implemented against best practice we will beat it to death.

Best,
Joe
 
Tim Holloway
Worst-case is that your child thread is running asynchronously, the parent (servlet) thread gets returned to the pool, the thread gets handed out again, things start snarling up and the server crashes completely at unpredictable moments and in unpredictable ways (Actually, Murphy predicts pretty accurately: whenever and however it costs most in terms of time, money or credibility).

Second-worst case is that if you're hung on something synchronous, you'll exhaust the thread pool thereby locking up the server. Not just YOUR app, but EVERY app in the server and the only way to recover will be to restart the server, suffer downtime and possibly lost work and data corruption.

These are extremely expensive consequences, so we urge you not to consider such an approach.

Also, a servlet should be responding in milliseconds, not seconds. After 3 seconds, people start to twitch. After 10, they start considering doing foolish things. After about 30, they're already doing foolish things. And top of the list of foolish things is randomly clicking buttons, which can result in even more server load/application slowdown.

There are acceptable ways to run threads in a webapp. Just don't spawn them from servlets. And remember that multi-threading isn't magic. There's about a 10% overhead for multi-threading, so unless you can get some serious multi-core response and/or the threads spend a lot of time waiting for resources, a multi-threaded solution may perform more poorly than a single-threaded one. The reason that servlets run under separate threads has as much or more to do with isolating them from each other's resources as it does for performance.
 
Joe Areeda
I do appreciate your comments, Tim, but I don't think you quite get what I'm doing.

1. The threads will not be asynchronous to the servlet. A single method called by the servlet can either run through a list of (i/o wait intensive) network connections testing for data availability or test them all at once, wait for all of them to complete then return. Either way, if something bad happens in this method the servlet will hang and eat up one of the threads from the pool. It's been running for about 3 years and that has happened once or twice. To my chagrin, I have no idea why.

2. This app has some navigation pages and some analysis functions. I agree with your timing suggestions on the navigation, and those pages all have sub-second response, even the database-intensive ones.

The analysis functions require network data transfers and a decent load for calculation and plotting. These are almost completely done in external programs written by me and others in various languages. Most of them are published so users can run them on their workstations or on one of the clusters. The long response times are almost all in the external programs and most of that time is spent in data transfers.

People use the app because a) the interface is easy to use, b) they don't have to install anything and c) they can use their phone or tablet if off site. In many cases the app is faster than running locally because the data center Internet connection is better.

I know this is atypical and I've struggled with the rule of thumb that every web page should load in less than a second. I collaborate with almost all the users of the app and their comments, suggestions and complaints almost never mention response time negatively.

Best,
Joe
 
Tim Holloway
Joe Areeda wrote:if something bad happens in this method the servlet will hang and eat up one of the threads from the pool. It's been running for about 3 years and that has happened once or twice. To my chagrin, I have no idea why.


Well, I did say results would be unpredictable. I worked with people who had a system that would do that, although more like about 3 times a week. They never had time to do a post-mortem/repair but always had time to reboot. The problem with "once or twice over 3 years" is that very suddenly things can flip to "once every 4 or 5 minutes". I've had that happen to me. For example, an OS update almost earned me a 3AM all-expenses-paid trip to Chicago because someone's data compression library had been doing something "clever" for years.

OK, going back to the head of the thread and cross-correlating. It appears that you have a collection of content servers and you have been polling them for available content. Polling was done on-request per-user and not in conjunction with a master (shared) content list. This sounds like something that an outfit such as Netflix might do, although I'd expect that Netflix is also doing things that allow stuff like migrating content nearer to consumers.

First question would be whether or not you would be better served by having a master (non-servlet) thread handle the polling at regular intervals and leaving the servlets the less-intensive task of checking against the accumulated results (which could be cached locally and/or stored in a local database).

Second question would be whether it's feasible to avoid polling in part or in whole by making the content servers post content changes as they happen. For example, as a callback process.
 
Joe Areeda
Tim,
I really appreciate the discussion, I rarely get such intelligent comments when talking to myself about this kind of stuff.

Tim Holloway wrote:Well, I did say results would be unpredictable. I worked with people who had a system that would do that, although more like about 3 times a week. They never had time to do a post-mortem/repair but always had time to reboot. The problem with "once or twice over 3 years" is that very suddenly things can flip to "once every 4 or 5 minutes". I've had that happen to me. For example, an OS update almost earned me a 3AM all-expenses-paid trip to Chicago because someone's data compression library had been doing something "clever" for years.

I'm a big fan of post mortems and am learning to do them better. My suspicion is that the holdup is in the network transfer libraries. It's a proprietary protocol. I've been working closely with that team identifying and fixing problems. I've also seen rare problems turn frequent when making some unrelated change.

Tim Holloway wrote:OK, going back to the head of the thread and cross-correlating. It appears that you have a collection of content servers and you have been polling them for available content. Polling was done on-request per-user and not in conjunction with a master (shared) content list. This sounds like something that an outfit such as Netflix might do, although I'd expect that Netflix is also doing things that allow stuff like migrating content nearer to consumers.

First question would be whether or not you would be better served by having a master (non-servlet) thread handle the polling at regular intervals and leaving the servlets the less-intensive task of checking against the accumulated results (which could be cached locally and/or stored in a local database).

Second question would be whether it's feasible to avoid polling in part or in whole by making the content servers post content changes as they happen. For example, as a callback process.

Actually, I've been pushing for something like that without much luck. Part of it is we're near the end of a 5 yr gazillion dollar upgrade to the instruments and there are higher priority things before we go live in the fall. (details at http://ligo.org)

This app was the first that took the responsibility for finding the data, before that you had to know which server to contact or which cluster to log into. Data is continuously acquired in real time and needs to be copied before being accessible but there can be gaps. Just saying it's not an easy problem.

Best,
Joe
 
Tim Holloway
Well, if there's a gazillion dollars involved, that means probably at least one MajorVendor and when you tell them you're violating J2EE specs, they're likely to say you voided the warranty.

It's probably actually easier/cleaner to set up and run a master poller thread from a ServletContextListener than it is to code that sort of stuff in servlet code anyway. The general problem set isn't that uncommon, regardless of whether the scope or protocols are.

Something of this nature can be built in large part using off-the-shelf components and common design patterns and implemented as a JavaBean. Meaning that the Master Poller could be made to run stand-alone or under some other sort of service instead of being wholly dependent on servlet architecture. It's also easier to test and tune stuff built that way.
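
To make the Master Poller idea a bit more concrete, here is a minimal sketch of one possible shape, assuming a plain Java class with no servlet dependencies. The names `MasterPoller`, `pollOnce`, and `hasData`, and the `Callable<Boolean>` probes, are hypothetical. In a webapp, a ServletContextListener's `contextInitialized` and `contextDestroyed` methods could call `start` and `stop`, and servlets would only read the cached results.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-alone poller: checks each backend periodically and caches the answers. */
final class MasterPoller {
    // backend name -> availability check (assumed; stands in for the real protocol call)
    private final Map<String, Callable<Boolean>> probes;
    // last known answer per backend; safe to read from many request threads
    private final Map<String, Boolean> lastKnown = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    MasterPoller(Map<String, Callable<Boolean>> probes) {
        this.probes = new LinkedHashMap<>(probes); // defensive copy
    }

    /** Starts polling immediately, then again periodMillis after each round finishes. */
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** One polling round. Never throws, because an exception would stop the schedule. */
    void pollOnce() {
        for (Map.Entry<String, Callable<Boolean>> entry : probes.entrySet()) {
            boolean hasData;
            try {
                hasData = Boolean.TRUE.equals(entry.getValue().call());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutting down: abandon this round
                return;
            } catch (Exception e) {
                hasData = false; // a failed probe is recorded as "no data"
            }
            lastKnown.put(entry.getKey(), hasData);
        }
    }

    /** Cheap lookup for request threads; unknown backends report false. */
    boolean hasData(String backend) {
        return lastKnown.getOrDefault(backend, false);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Because the class has no container dependencies, it can be exercised from a unit test or a stand-alone `main` by calling `pollOnce` directly, which is in line with the point about such code being easier to test and tune.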
 
Joe Areeda
Tim Holloway wrote:Well, if there's a gazillion dollars involved, that means probably at least one MajorVendor and when you tell them you're violating J2EE specs, they're likely to say you voided the warranty.

I wish there were a major vendor that imposed standards, but this is a multi-university collaboration supported by grants. It's a lot looser than that. Most of the code is written and maintained by physicists. That presents its own challenges.

Tim Holloway wrote:It's probably actually easier/cleaner to set up and run a master poller thread from a ServletContextListener than it is to code that sort of stuff in servlet code anyway. The general problem set isn't that uncommon, regardless of whether the scope or protocols are.

Just to be clear, this code won't be in the actual servlet. It will be in the class library that wraps the network protocol API.

Tim Holloway wrote:Something of this nature can be built in large part using off-the-shelf components and common design patterns and implemented as a JavaBean. Meaning that the Master Poller could be made to run stand-alone or under some other sort of service instead of being wholly dependent on servlet architecture. It's also easier to test and tune stuff built that way.

I will think more about this but I'm having trouble wrapping my mind around the Master Poller concept. If I understand what you mean it's a task of some sort that keeps track of all the data that's available on all the servers.

I did try something like that in the beginning of the project, but doing it on the client side is very slow. I have proposed a design that leverages the server's data structures to create a MySQL database. The idea is that each server would maintain a db of what it knows about and replicate it to my project, located centrally. I'm not sure if and when this project will be funded.

Best,
Joe
 