Forum:

OO, Patterns, UML and Refactoring

what is the best way to support High availability

Ranch Hand

Posts: 620

posted 17 years ago

Hello all,
I have an application that makes remote invocations, sometimes using CORBA and sometimes
web services, but that is not the important issue. The issue is that I need to support high availability with the application server:
how to detect when the application server is down, and how to connect to the second one when a
remote invocation is performed. Is there any well-known pattern or way to do this?
I know that the simplest form is a round-robin algorithm.

Stan James

(instanceof Sidekick)

Posts: 8791

posted 17 years ago

Have you looked at hardware devices and virtual IP addresses? We use an Arrowpoint device. Clients have the IP of the Arrowpoint, and it has the IP addresses of the real servers. You can set it up for load balancing or failover with different algorithms. We have another layer of virtual addressing ... a geographic switch routes to a site, then Arrowpoint routes to a server.

Rolling your own is pretty tricky, but doable I suppose. You'd have to detect a server out of action, close any connections you had pooled to the dead one, connect to the standby. It's hard to avoid false failure detection if there's just a network burp or something minor. It may be critical to keep a whole cluster of clients pointed to the same service. I'd try not to write this myself.
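To make the "false failure detection" point concrete, here is a minimal sketch in Java of one common way to soften it: only fail over after several consecutive failures, so a single network burp does not trigger a switch. The class name `FailoverSelector`, the threshold parameter, and the server names are all made up for illustration; closing pooled connections to the dead server is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: switch to the next server only after N consecutive failures. */
class FailoverSelector {
    private final List<String> servers;   // primary first, then standbys (hypothetical names)
    private final int failureThreshold;   // consecutive failures tolerated before switching
    private int current = 0;
    private int consecutiveFailures = 0;

    FailoverSelector(List<String> servers, int failureThreshold) {
        if (servers.isEmpty() || failureThreshold < 1) {
            throw new IllegalArgumentException("need at least one server and a threshold >= 1");
        }
        this.servers = new ArrayList<>(servers);
        this.failureThreshold = failureThreshold;
    }

    /** The server that calls should currently be sent to. */
    synchronized String current() {
        return servers.get(current);
    }

    /** Any successful call resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    /**
     * Records a failed call. Returns true if this failure caused a switch,
     * in which case the caller should close connections pooled to the old server.
     */
    synchronized boolean recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures < failureThreshold) {
            return false;                  // could still be a transient glitch
        }
        current = (current + 1) % servers.size();
        consecutiveFailures = 0;
        return true;
    }
}
```

With a threshold of 3, two failures followed by a success leave the client on the primary; only three failures in a row move it to the standby. This does nothing to keep a whole cluster of clients pointed at the same server, which is part of why a ready-made solution is usually the safer route.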

A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi

ben josh

Ranch Hand

Posts: 620

posted 17 years ago

Hi, and thanks for the fast reply.
I was thinking about a less complicated scenario; I need to support it programmatically.
I know the IPs of my backup servers, so I could connect to them when my main server is down.
The question is: what is the best solution or algorithm to do so?

Frank Carver

Sheriff

Posts: 7001

posted 17 years ago

One of the main problems with remote calls and messaging (whatever the protocol) is that you usually don't know that a remote service has broken until you try something and it fails or times out. This is often too late to do anything sensible about it.

One "system integration pattern" which I have used several times is the "heartbeat". Add an extra call type to your system which has no business value (and thus can fail or delay without affecting the overall system), but serves solely to check if a destination service is available.

Then build your client software to send these heartbeat calls or messages on a regular basis. The exact frequency of the heartbeats depends on many factors such as expected load, failover time, service-level agreements, and so on. Then your system stands a fighting chance of spotting a broken or unavailable service by a timeout or connection error when sending such a heartbeat. Your system can report the error for fixing and route the next "real" call or message to an alternative server.
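As a rough illustration of the heartbeat idea, here is a minimal Java sketch. It assumes the no-business-value call is wrapped in a `probe` function that returns true when the server answered; that function, the class name `HeartbeatMonitor`, and the server names are hypothetical, and the real probe would be whatever CORBA, RMI, or web service call your system exposes for this purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Sketch: periodically probes each server and remembers which ones answered. */
class HeartbeatMonitor {
    private final List<String> servers;
    private final Predicate<String> probe;   // hypothetical: true if the heartbeat call succeeded
    private final Map<String, Boolean> alive = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    HeartbeatMonitor(List<String> servers, Predicate<String> probe) {
        this.servers = new ArrayList<>(servers);
        this.probe = probe;
        for (String s : this.servers) {
            alive.put(s, true);              // optimistic until the first heartbeat says otherwise
        }
    }

    /** Starts sending heartbeats every periodMillis milliseconds. */
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::checkOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** One heartbeat round; a probe that throws counts as a failed heartbeat. */
    void checkOnce() {
        for (String s : servers) {
            boolean ok;
            try {
                ok = probe.test(s);
            } catch (RuntimeException e) {
                ok = false;
            }
            alive.put(s, ok);
        }
    }

    /** The first server, in configured order, whose last heartbeat succeeded. */
    Optional<String> firstAlive() {
        return servers.stream().filter(s -> alive.getOrDefault(s, false)).findFirst();
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

The "real" calls would ask `firstAlive()` for a destination instead of discovering a dead server by timing out on it. The probe itself should have a short timeout of its own, otherwise a hung server stalls the whole heartbeat round.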

Does that make sense?

Read about me at frankcarver.me ~ LinkedIn ~ Frank's PhD research

ben josh

Ranch Hand

Posts: 620

posted 17 years ago

Hello, and thanks for the reply.
I will look deeper into the heartbeat pattern. Is it better than the round-robin pattern for this scenario?
Isn't it overhead to send an extra RMI call every so often as a heartbeat check?
Thanks

Frank Carver

Sheriff

Posts: 7001

posted 17 years ago

I will look deeper into the heartbeat pattern. Is it better than the round-robin pattern for this scenario?

I can't say whether it would be better in your application. That depends on so many factors. The advantage of "heartbeat" is that it is largely preventative - potential business failures are tested ahead of when they are needed, so that a working service can be used when the next real call is needed. "round-robin" is at best reactive (if a request fails, pass it to the next "robin"), and can greatly increase the time taken to process such a failed call (normal processing time + timeout to wait for a no-answer).
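For comparison, a reactive round-robin caller might look roughly like the following Java sketch. The class name `RoundRobinCaller` and the use of a plain `Function` to stand in for the remote call are assumptions for illustration only; a real remote call would also carry its own timeout, which is exactly the extra delay a failed attempt costs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Sketch: rotate the starting server per call; on failure, pass the call to the next one. */
class RoundRobinCaller {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinCaller(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers);
    }

    /**
     * Tries each server at most once, starting from the next one in rotation.
     * remoteCall is a stand-in for the real invocation and signals failure by throwing.
     */
    <R> R call(Function<String, R> remoteCall) {
        int n = servers.size();
        int start = Math.floorMod(next.getAndIncrement(), n);
        RuntimeException last = null;
        for (int i = 0; i < n; i++) {
            String server = servers.get((start + i) % n);
            try {
                return remoteCall.apply(server);
            } catch (RuntimeException e) {
                last = e;   // by now the caller has already waited for the timeout
            }
        }
        throw new IllegalStateException("all servers failed", last);
    }
}
```

Nothing here learns from a failure: every time the rotation lands on the dead server, that call pays the full timeout again before moving on, which is the cost described above.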

Isn't it overhead to send an extra RMI call every so often as a heartbeat check?

Yes, there is an extra call needed for each heartbeat. If bandwidth use is more important (or expensive) than response time/availability then "heartbeat" may not be the best choice.

Stan James

(instanceof Sidekick)

Posts: 8791

posted 17 years ago

Even the hardware solutions have trouble telling when a server is not just merely dead (not answering for the moment) but "really most sincerely dead." (Wizard of Oz was on TV the other night.) We call web services on machines that look lively enough to the load balancer but actually are hung up so they never respond. Somebody has to manually tell the load balancer when one of those is hung, then restart it, then tell the load balancer. Ugh.

You can run heartbeats or keep-alives either direction ... you can ping all the servers to see if they're still there or you can have all the servers ping you just to say "I'm here!". But if you're running hundreds or thousands of calls a second, you're going to find trouble through failed calls roughly heartbeat/2 seconds before the heartbeat tells you.

I read today about the "split brain" problem when two servers both think they are the active primary server when one should really be in standby mode. If you had some background processes it might be bad for both servers to run them at the same time.

Jeevan Philip

Ranch Hand

Posts: 41

posted 17 years ago

There is always a tradeoff when going for HA solutions. Ideally you should use a COTS product that provides built-in fault detection and re-routing of remote calls.

But if you need to roll your own, then the best approach is to use heartbeats as already mentioned. Even this will not guarantee HA, since the server can fail between the last heartbeat call and the remote call. And if you decrease the heartbeat period, it will affect performance since it introduces chattiness.

Internally, most HA systems (clusters, load balancers, etc.) do something similar, simply because there is no other way of finding out whether somebody is available without looking for him or having him tell us!

The more important thing in this solution is to have a graceful way of recovering from connection failures and transparently retrying with another server. For this, there are different approaches such as round robin, server affinity, weight-based, context-based, etc.
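As a rough sketch of what three of those selection approaches usually mean, here is some illustrative Java. This is one common reading of the terms, not a definition from any particular product, and every name in it (`ServerSelection`, the client key, the operation names) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of three ways to choose a server for a call. */
class ServerSelection {

    /** Server affinity: the same client key always lands on the same server. */
    static String byAffinity(List<String> servers, String clientKey) {
        return servers.get(Math.floorMod(clientKey.hashCode(), servers.size()));
    }

    /**
     * Weight-based: a server with weight 2 appears twice in the schedule, so it
     * receives twice as many calls when the caller rotates through the list.
     */
    static List<String> weightedSchedule(Map<String, Integer> weights) {
        List<String> schedule = new ArrayList<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                schedule.add(e.getKey());
            }
        }
        return schedule;
    }

    /** Context-based: choose by something in the request, here an operation name. */
    static String byContext(Map<String, String> routes, String operation, String defaultServer) {
        return routes.getOrDefault(operation, defaultServer);
    }
}
```

In each case the failover question stays the same: when the chosen server turns out to be dead, the strategy needs a rule for picking the next candidate, for example by removing the dead server from the list and selecting again.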

Jeevan

ben josh

Ranch Hand

Posts: 620

posted 17 years ago

Can you please explain the last three:
server affinity, weight-based, and context-based?
Thanks