Up until now, I have only dealt with systems that use a single server, but I am now considering a system that is load balanced with Apache mod_jk across several servers that run Tomcat.
I need some help understanding how this works, so please let me know if I have the right idea. Thanks!
So I have 3 machines; let's call them apache0, tomcat0 and tomcat1. Apache0 runs mod_jk and is the only one accessible to the public via HTTP requests. A request comes into apache0 and is forwarded to one of the Tomcat servers to be serviced. (And here is where I am a little unsure)... does the response then pipe back through Apache to be sent to the client? If this is the case, what happens to that Apache service thread during this whole process? Since Apache needs to "wait" for Tomcat to send back the response, does Apache block on I/O in that thread (or does it do something similar to Tomcat's NIO connector)? Does this waiting time take a lot of resources for Apache (my assumption is no)?
On initialization, mod_jk opens a pool of TCP connections to the Tomcat AJP connector port (usually 8009).
When a request arrives at Apache that matches a JkMount path, mod_jk does the following on the same thread as Apache (because mod_jk is an in-process module):
- It forwards the request to Tomcat over one of the pooled TCP connections, encoded in the AJP protocol.
- It then waits in a while loop on the same thread for AJP response packets from Tomcat.
- Every time a packet arrives, it is processed appropriately. The packet may be a "send headers" packet, a "send response chunk" packet, etc.
- If the packet is a headers packet or a response chunk packet, mod_jk immediately writes it back to the client via the Apache response writer, in the same thread.
- If the packet has the end-of-response flag, mod_jk breaks out of the reading loop and returns the connection to the pool for reuse.
So yes, it is blocking I/O, because it's requesting and waiting for response on the same thread.
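To make the loop above concrete, here is a minimal Python sketch of the same pattern. It is not mod_jk's code and it does not speak real AJP13: the packet types, the 3-byte framing and the names `relay_response` and `fake_backend` are all made up for illustration. The point is only that the calling thread sits in `recv` until the backend says it is done.

```python
import socket
import struct
import threading

# Illustrative packet types. The framing used here (1-byte type, 2-byte length)
# is a simplification for this sketch, not the real AJP13 wire format.
SEND_HEADERS, SEND_BODY_CHUNK, END_RESPONSE = 1, 2, 3


def recv_exact(sock, n):
    """Block until exactly n bytes have been read from sock."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("backend closed the connection")
        buf += chunk
    return buf


def send_packet(sock, ptype, payload=b""):
    """Write one framed packet: type, payload length, payload."""
    sock.sendall(struct.pack("!BH", ptype, len(payload)) + payload)


def relay_response(backend, client_out):
    """Read packets on the calling thread and pass them on until END_RESPONSE."""
    while True:
        ptype, length = struct.unpack("!BH", recv_exact(backend, 3))
        payload = recv_exact(backend, length)
        if ptype in (SEND_HEADERS, SEND_BODY_CHUNK):
            client_out.append(payload)  # stand-in for writing to the client
        elif ptype == END_RESPONSE:
            break  # the connection could now go back to the pool


def fake_backend(sock):
    """Hypothetical stand-in for Tomcat: headers, two chunks, then the end marker."""
    send_packet(sock, SEND_HEADERS, b"HTTP/1.1 200 OK\r\n\r\n")
    send_packet(sock, SEND_BODY_CHUNK, b"hello ")
    send_packet(sock, SEND_BODY_CHUNK, b"world")
    send_packet(sock, END_RESPONSE)


def demo():
    proxy_side, backend_side = socket.socketpair()
    worker = threading.Thread(target=fake_backend, args=(backend_side,))
    worker.start()
    out = []
    relay_response(proxy_side, out)  # blocks this thread until END_RESPONSE
    worker.join()
    proxy_side.close()
    backend_side.close()
    return b"".join(out)
```

Calling `demo()` returns the headers followed by the two body chunks, in the order the fake backend sent them.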
That thread is the thread context assigned by the overall Apache request processor. Depending on Apache's active MPM (multi-processing module) strategy, that thread could be the single thread of a process in a multi-process configuration, or a pooled thread in a multithreaded multi-process configuration.
With default options, the TCP connections and socket buffers created by mod_jk are the additional resources used when a request is handled through mod_jk rather than by plain Tomcat. I would not say this is much... at least in comparison to the real performance heavyweight, the Apache process itself.
You've not mentioned which OS you're on. But if it's *nix, the traditional default (the prefork MPM) dedicates a whole process, not just a thread, to each request. Apache optimizes this by preforking a set number of worker processes and reusing them, but it's still one process per concurrent request. This could be a bottleneck compared to a multithreaded strategy, depending on your request load, especially since in this case Apache is being run as a simple LB.
The solution is to make Apache run as a multithreaded multi-process hybrid. For that, you need to compile Apache from source with the worker MPM selected (the --with-mpm=worker configure flag).
Also ensure that the Apache process/thread limits and the mod_jk TCP connection pool size match the total number of AJP connector threads across the Tomcat instances.
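The matching of limits is just arithmetic, which a small Python sketch can spell out. The function name `check_capacity` and all numbers are made up for illustration. Which settings these map to in your setup (for example ServerLimit and ThreadsPerChild on the Apache side, maxThreads on the AJP connector) is an assumption you should check against your own versions.

```python
def check_capacity(processes, threads_per_process, ajp_threads_per_tomcat):
    """Compare Apache's maximum concurrency with the total AJP threads behind it.

    Returns (front, back, fits), where front is the most requests Apache can
    have in flight and back is the sum of AJP connector threads on all Tomcats.
    """
    front = processes * threads_per_process
    back = sum(ajp_threads_per_tomcat)
    return front, back, front <= back


# Hypothetical numbers: 16 processes x 25 threads in front of two Tomcats
# with 200 AJP threads each.
print(check_capacity(16, 25, [200, 200]))
```

If the front number is larger than the back number, Apache can accept more concurrent requests than the Tomcats have threads to serve, and the excess will queue or fail at the backend.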
mod_jk is somewhat like a reverse proxy. It's suitable if the fronting apache is working not only as a loadbalancer, but also as a web server.
But if you want just a LB, you might want to evaluate a dedicated reverse proxy like apache traffic server, squid or nginx.
If you have the time, fire away at different configurations using JMeter scripts and see which one holds out the longest.
Thanks for your answer, it was very helpful. To fill in some of the info I left out: I am using Ubuntu Server, and I will likely serve some static content off of Apache, so it would be convenient to use Apache as both a load balancer and a static content server (as opposed to a mere reverse proxy).
Let me get one more clarification if possible:
So, Apache sends the request to a Tomcat instance. The Apache process blocks and waits until it receives the response from Tomcat, which it then sends to the client. While waiting, Apache is in a loop. I assume that this loop does not take much CPU (correct?), and this is one of the reasons why it can scale: we can have many Apache processes because each of them is doing very little (simply waiting for responses from the Tomcat servers and then sending them back to the client)... is this correct?
The mod_jk loop does not take much CPU at all, because it spends most of its lifetime blocked on the socket read. It's not CPU bound... it won't contribute to CPU utilization. Note that the Linux load average counts runnable processes plus those in uninterruptible sleep (typically disk I/O), so processes that are merely waiting on a socket read should not, by themselves, push the load figure up.
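If you want to convince yourself that a blocked read costs almost no CPU, a small Python experiment shows it. This is only an illustration of blocking socket reads in general, not of mod_jk itself. The function name `measure_blocked_read` and the 0.3 second delay are made up.

```python
import socket
import threading
import time


def measure_blocked_read(delay=0.3):
    """Block on a socket read for about `delay` seconds; report wall and CPU time."""
    reader, writer = socket.socketpair()

    def late_writer():
        # Stand-in for a slow backend: respond only after `delay` seconds.
        time.sleep(delay)
        writer.sendall(b"x")

    t = threading.Thread(target=late_writer)
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the whole process
    t.start()
    data = reader.recv(1)  # the calling thread sleeps in the kernel here
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    t.join()
    reader.close()
    writer.close()
    return data, wall_used, cpu_used


data, wall_used, cpu_used = measure_blocked_read()
print(f"wall: {wall_used:.3f}s, cpu: {cpu_used:.3f}s")
```

The wall-clock time should be roughly the delay, while the CPU time stays a small fraction of it, because the waiting thread is not scheduled until data arrives.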
More than CPU, you should worry about memory. Every process and thread holds memory for as long as it exists, even while it is only waiting, so this setup tends to be memory bound. Ensure you have the requisite RAM.
As you plan to serve some static content off Apache, I suppose it's fine to start off with Apache + Tomcat as a conventional reference configuration.
If you run into capacity problems sometime in the future, check out replacing Apache with lighttpd or nginx to solve them.
They too are capable of serving static content and load balancing, but they are lightweight compared to Apache. That doesn't mean Apache is "bad"; it's just more functionally capable than them, and often you don't need all that functionality. Because they use a completely asynchronous, event-driven model as opposed to Apache's thread/process model, they scale better for throughput.
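To illustrate the difference in models, here is a minimal Python sketch of the event-driven idea: one thread waits on many backend connections at once instead of parking one thread per connection. It is a toy using the standard `selectors` module, not how nginx or lighttpd are actually implemented. The function name `multiplex` and the `resp-N` payloads are made up.

```python
import selectors
import socket


def multiplex(n):
    """Wait for n backend responses using a single thread and one selector."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n)]

    # Register the proxy-side end of every connection for read readiness.
    for i, (front, _back) in enumerate(pairs):
        front.setblocking(False)
        sel.register(front, selectors.EVENT_READ, data=i)

    # Stand-in for the backends answering (here: immediately, in order).
    for i, (_front, back) in enumerate(pairs):
        back.sendall(f"resp-{i}".encode())

    results = {}
    while len(results) < n:
        # One blocking call covers all connections; no thread per request.
        for key, _events in sel.select(timeout=1):
            results[key.data] = key.fileobj.recv(1024)
            sel.unregister(key.fileobj)

    sel.close()
    for front, back in pairs:
        front.close()
        back.close()
    return results


print(multiplex(3))
```

The trade-off is that the per-connection state has to be tracked explicitly (here, the `data=i` tag) instead of living on a thread's stack.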
Thanks again for helping me understand this. I think Apache with mod_jk in front of external Tomcat servers will be a fine design for now, and we can just add more Tomcats as needed. If it seems to be taking some hits, I will also explore some of the other solutions you have mentioned. The only concern with those is that I will need sticky sessions. mod_jk seems to handle this well out of the box. I could not quickly find info on this feature for the other LBs you mentioned, but I am sure I could write some kind of rule to handle it. Anyway, I think it will be a long time before something like that is necessary.
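For reference, the sticky-session idea itself is simple enough to sketch in Python. With mod_jk, Tomcat appends its jvmRoute to the session ID after a dot, and the balancer routes on that suffix. The sketch below only shows that rule: the name `make_router` is made up, and the plain round-robin fallback is a stand-in, not mod_jk's actual balancing method.

```python
import itertools


def make_router(workers):
    """Return a route(session_id) function with sticky-session behaviour."""
    round_robin = itertools.cycle(workers)

    def route(session_id=None):
        # A session ID such as "ABC123.tomcat1" carries the route after the dot.
        if session_id and "." in session_id:
            suffix = session_id.rsplit(".", 1)[1]
            if suffix in workers:
                return suffix  # stick to the Tomcat that owns the session
        # No session yet, or unknown route: fall back to round robin.
        return next(round_robin)

    return route


route = make_router(["tomcat0", "tomcat1"])
print(route("ABC123.tomcat1"))  # sticky: goes to tomcat1
print(route())                  # no session: next worker in rotation
```

Any balancer that can route on a cookie or session ID suffix can apply the same rule.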