Currently we have a standalone Java application that can be deployed on multiple machines and run as a server.
Let's assume I have two instances, S1 and S2, deployed on two machines.
I am planning to design another application called 'Controller'.
The Controller will have a registry that holds information about all the servers that are ready to run.
The Controller will arbitrarily select one of the servers in the registry and instruct it to run.
The Controller will send regular heartbeats to the server that is currently running. If no
heartbeat reply is received within the timeout period, the Controller assumes that the server (say, S1) is 'down'.
It updates its registry indicating that S1 is down and instructs S2 to start running.
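To make the design concrete, here is a minimal sketch of the flow described above. It is only an illustration under some assumptions: the names (`FailoverController`, `register`, `startOne`, `onHeartbeat`, `check`) are hypothetical, "arbitrarily select" is simplified to "first ready server in the registry", and the current time is passed in as a parameter instead of being read from a clock. No networking is shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the Controller's registry and heartbeat-timeout check.
class FailoverController {
    enum Status { READY, RUNNING, DOWN }

    private final Map<String, Status> registry = new LinkedHashMap<>();
    private final long timeoutMillis;
    private String active;             // server currently instructed to run
    private long lastHeartbeatMillis;  // time the last heartbeat from 'active' was seen

    FailoverController(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Add a server that is ready to run.
    void register(String server) {
        registry.put(server, Status.READY);
    }

    // Pick a READY server (here simply the first one found) and mark it RUNNING.
    // Returns null if no server is available.
    String startOne(long nowMillis) {
        for (Map.Entry<String, Status> e : registry.entrySet()) {
            if (e.getValue() == Status.READY) {
                e.setValue(Status.RUNNING);
                active = e.getKey();
                lastHeartbeatMillis = nowMillis;
                return active;
            }
        }
        active = null;
        return null;
    }

    // Record a heartbeat (or heartbeat reply) from the active server.
    void onHeartbeat(String server, long nowMillis) {
        if (server.equals(active)) {
            lastHeartbeatMillis = nowMillis;
        }
    }

    // Called periodically. If the active server has been silent for longer than
    // the timeout, mark it DOWN and fail over. Returns the active server afterwards.
    String check(long nowMillis) {
        if (active != null && nowMillis - lastHeartbeatMillis > timeoutMillis) {
            registry.put(active, Status.DOWN);
            return startOne(nowMillis);
        }
        return active;
    }

    Status statusOf(String server) {
        return registry.get(server);
    }
}
```

For example, with a 3000 ms timeout, if S1's last heartbeat was seen at t=1000 and `check` runs at t=4001, S1 is marked DOWN and S2 is started. Note that this sketch treats a single missed timeout as a failure, which is exactly the rule I am unsure about.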
My question is: what parameters/factors define whether a server is 'down'?
- The Controller may fail to receive a heartbeat because of a network issue,
i.e. the server could be running just fine while the heartbeat never reaches the Controller.
So how do I define the rules for a server-failover strategy?
- What happens if the controller itself is down?
If this is not the correct forum for this question, could someone direct me to the right one?