We have built a web service and want to know what to do when it goes down because of server issues or for any other reason.
We don't want to miss any data that a client might be trying to push into the web service during an outage. How can we capture that data and push it into the web service once it's back up, without any manual intervention?
Well, if you put something in front of the web service that grabs the request and saves it, what happens when that thing goes down? When you are talking about fault tolerance, you have to worry about every system going down. You cannot eliminate a point of failure by putting another point of failure in front of it.
The standard way of doing this is to build redundancy. Instead of deploying the web service on one machine, deploy it on 2 machines with a load balancer in front. Then, if one machine goes down, the traffic is routed to the second one. Does your web service talk to a database? Then you need to worry about the database going down. You either need to invest in something that provides fault tolerance in the database, or you need a backup database with replication between your databases. Well, what if the load balancer goes down? You need 2 load balancers, with something that switches the DNS to the other load balancer.
This is one way of building fault tolerance. There can be other ways too. The main thing is that you have to think about removing points of failure.
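To make the redundancy idea concrete, here is a minimal sketch of what the load balancer does for you: try one machine, and if it is down, route the request to the next. The two "machines" are hypothetical stand-in functions, not real servers, and a real load balancer would also do health checks and balancing.

```python
from typing import Callable, Sequence


class AllBackendsDown(Exception):
    """Raised when every redundant backend failed to handle the request."""


def send_with_failover(request: str,
                       backends: Sequence[Callable[[str], str]]) -> str:
    """Try each backend in order and return the first successful response."""
    failures = 0
    for backend in backends:
        try:
            return backend(request)
        except ConnectionError:
            # This machine is down; fall through to the next one.
            failures += 1
    raise AllBackendsDown(f"{failures} backend(s) failed")


# Hypothetical stand-ins for the two machines behind the load balancer.
def dead_machine(request: str) -> str:
    raise ConnectionError("machine 1 is down")


def healthy_machine(request: str) -> str:
    return f"stored: {request}"


print(send_with_failover("order-42", [dead_machine, healthy_machine]))
```

Note that if both machines are down the request still fails, which is why every layer (database, load balancer) needs the same treatment.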
posted 6 years ago
But is there any way to introduce message queues in between, which would absorb failed requests and post them to the web service once the application servers are up?
Well, yes, but you need something in front of the message queue to receive the requests. I am assuming that you still want to provide a web service that the client can call. So, you can have a web service in the front that feeds a message queue. The message queue feeds listeners that do the actual work. So, you have moved the work to a node that is more fault tolerant. There are 2 problems with this:
a) What happens when the web service that feeds the queue goes down? You haven't solved the problem with this architecture. You have just moved it. You still need to make this web service redundant.
b) How do you handle failures on the back end? Let's say the listener cannot process the message. How will it return the error message back?
Or you could ask your client to directly put the messages in the queue. In which case why use web services?
You also need to think about making your message queue fault tolerant. This means you need multiple message queues.
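Here is a rough sketch of the front-end-plus-queue arrangement described above, using Python's in-process `queue.Queue` purely for illustration. That is an assumption on my part: a real setup would use a durable, replicated message broker, and the names `accept_request`, `drain` and `sample_handler` are made up for this example.

```python
import queue


def accept_request(work_queue: queue.Queue, payload: str) -> str:
    """Front-end web service: store the request and acknowledge immediately."""
    work_queue.put(payload)
    return "accepted"


def drain(work_queue: queue.Queue, handler, dead_letters: list) -> int:
    """Listener: process everything currently queued.

    Messages the handler cannot process are parked in dead_letters,
    because the client has already been told "accepted".
    """
    processed = 0
    while True:
        try:
            payload = work_queue.get_nowait()
        except queue.Empty:
            return processed
        try:
            handler(payload)
            processed += 1
        except Exception as exc:
            dead_letters.append((payload, str(exc)))


def sample_handler(payload: str) -> None:
    """Hypothetical back-end work; rejects one specific payload."""
    if payload == "bad":
        raise ValueError("cannot process")


q = queue.Queue()
for item in ("a", "bad", "b"):
    accept_request(q, item)   # requests pile up while the listener is down

dead = []
print(drain(q, sample_handler, dead), dead)
```

The dead-letter list shows problem (b) directly: by the time the listener fails, the original call has already returned, so the error needs some other channel back to the client.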
If you're willing to accept messages when you know that you can't act on them, that means the service is asynchronous in nature. So you can consider asynchronous architectures. The first one that comes to mind is email. Seriously. Have the client send an email in an agreed format to an address of yours. If your service is up, it retrieves those emails and does whatever needs doing. If it's not up, emails simply pile up and wait for servicing. If your email server goes down, no big deal: the sending server will continue to try to deliver the message, and will even keep the sender informed. It usually keeps trying for up to 72 hours (the exact period depends on how the sending server is configured), and will again inform the sender if delivery ultimately fails. Of course, you have to ensure that no spam filtering occurs, but between two sides known to one another that is easily achieved.
If you are considering non-web-service architectures, you might want to look at Storm. It can integrate with different queuing systems. It gives you fault tolerance and scalability right out of the box. Twitter open sourced it after acquiring BackType, the company that built it.
I like Ulf's idea of email as a queue. It's a cheap way of building fault tolerance. But you have to use a very "loose" definition of fault tolerance. You are "outsourcing" the problem to the email infrastructure. However, remember that email doesn't guarantee fault tolerance. Emails go through hops, and if any of the hops fails, the user will get an error message. Usually, email servers are designed for fault tolerance, but they don't guarantee you fault tolerance. If you have an SLA of 99.99%, you shouldn't rely on someone else to provide that SLA for you.
posted 6 years ago
"However, remember that email doesn't guarantee fault tolerance."
Correct, but it does adhere to the TCP contract: Either the message is delivered, or the sender receives an error. That may or may not be acceptable in this situation.
Yes, for a very loose definition of fault tolerance, an email-based system will work. If your system is guaranteeing an 80% SLA, then yeah, you can use email, because email's SLA is generally much higher. On the other hand, if your SLA is much higher than your email provider's, you cannot rely on them. It's a weak link in the chain.
Adherence to the TCP contract has nothing to do with adherence to an SLA. When you say your SLA is 99%, it means the system will function 99% of the time. It doesn't mean it will work 90% of the time and send back error messages 9% of the time.
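The "weak link in the chain" point can be put into numbers. A back-of-the-envelope sketch, assuming component failures are independent (which real systems only approximate) and using illustrative availability figures:

```python
import math


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up."""
    return math.prod(components)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of N redundant copies where one working copy is enough."""
    return 1 - (1 - single) ** copies


# A 99.99% service that depends on a 99% email provider.
print(f"{serial_availability(0.9999, 0.99):.6f}")

# Two redundant 99% machines behind a load balancer.
print(f"{redundant_availability(0.99, 2):.6f}")
```

Under these assumptions the chain can never be more available than its weakest component, while adding a redundant copy of a component raises its availability.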