I have developed a comprehensive Election Manager app that I plan to open source. Substantially everything is working great, except I got kicked in the gonads during final testing, apparently by a bug in the embedded HTTP server I am using.
I fell into using com.sun.net.httpserver since it is (loosely) included in the JDK and it was pretty simple to get working. By design, my Swing clients use only the HTTP POST command to initiate each contact with the server. Clients send XML representations of objects to the server and the server returns XML representations of other object(s). During development, clients were run mostly on the same host as the server; never a problem. Of course, during final testing, clients are run on other hosts connecting over the network; never a problem as long as the strings being posted to the server are less than 1460 characters in length. However, if the client is connecting over the network from a different host AND the XML String is longer than 1460 characters, the HTTP server hiccups every couple of hundred posts or so by truncating the XML to its first 1460 characters. The characters in the String do not seem to matter, only its length. A test posting the same 3200-character (or other length) String repeatedly will work fine hundreds of times, then truncate, then work fine some more before truncating again. It smells like a hard-to-find thread/timing bug.
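For reference, one classic cause of this exact symptom, which I am double-checking on my end, is reading the request body with a single read() call: on the same host the whole body usually arrives at once, while over a real network one read often returns just a single TCP segment. Below is a minimal sketch of a handler that drains the stream to end-of-file instead (the class name, context path, and port are placeholders, not my actual server code):

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class XmlPostHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange exchange) throws IOException {
        // Drain the request body in a loop; a single read() may return
        // only the first TCP segment (often 1460 bytes) on a real network.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        InputStream in = exchange.getRequestBody();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            body.write(buf, 0, n);
        }
        String requestXml = body.toString("UTF-8");
        // ... hand requestXml to the application layer here ...

        byte[] reply = "<response/>".getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, reply.length);
        OutputStream out = exchange.getResponseBody();
        out.write(reply);
        out.close();
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/election", new XmlPostHandler());
        server.start();
    }
}
```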
I am hoping someone will have some good advice as to the best way out of this. I prefer to spend my time developing election software rather than standard parts that are supposed to work. It appears that com.sun.net.httpserver is not widely used. I'm looking at swapping it out for Jetty, but that involves another learning curve at the very least.
1. Any general advice regarding this problem?
2. Is Jetty the best choice to replace com.sun.net.httpserver?
3. Is there any sample code for an embedded Jetty (or some better option) receiving posted strings and returning string responses?
4. Is there client-side sample code for posting strings to Jetty and receiving string responses?
I will hugely appreciate any help that anyone can offer! TIA.
Each client communicates with the server through an instance of the below ServerPortal class. (It was pretty succinct before I added all the try/catch blocks to see the details of what goes on.) It appears to me that the trouble starts when the handler on the server gets a truncated string. In testing, I log the truncation, but do not stop execution of the handler; usually, it returns a response and there is no exception thrown on the client. However, for some truncations, the server apparently kills the handler's thread before it can return a response. In those cases, ServerPortal will get a java.net.SocketException "Unexpected end of file from server" when it attempts to execute line 88 of the below code.
The URL string is (originally) supplied by the user. It probably has always started with "http:" for our testing. If this forces a GET, it appears that the server can handle it with long strings 99.9% of the time, so not being able to handle them once in a while doesn't sound right. It's no trouble to test without the leading "http:", so I will surely do that. I do want to use POST (not GET), however, so I'll have to parse for "http:" and remove it if present. Will that guarantee that POST will be used?
The early "out.close()" seems much more likely to be the problem. As you recommend, I will replace it with a flush and do the close at the end. I'll post the results of these tests ASAP (probably tomorrow).
Flushing the OutputStreamWriter and moving out.close() to the end of the method made no difference. After 578 error-free round trips to the server, the 579th was truncated; then came some more successful round trips, then truncation again on the 698th iteration.
The above code declares connection as URLConnection, so it will need to be cast to HttpURLConnection before setRequestMethod("POST") can be invoked. To eliminate the possibility of throwing an exception on the cast, I should verify the correct protocol when the URL is instantiated. Then we'll see if using POST instead of GET makes any difference...
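For reference, the round trip after those changes should look roughly like this: cast to HttpURLConnection, set POST explicitly, flush after writing, and close only at the end. This is a condensed sketch, not the actual ServerPortal code; the class and method names are placeholders:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public final class PostSketch {
    // Placeholder for the ServerPortal round trip: POST an XML string,
    // return the XML string the server sends back.
    static String post(String urlString, String xml) throws IOException {
        URL url = new URL(urlString);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");   // requires the cast from URLConnection
        connection.setDoOutput(true);          // we intend to send a request body
        connection.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");

        OutputStreamWriter out =
                new OutputStreamWriter(connection.getOutputStream(), StandardCharsets.UTF_8);
        out.write(xml);
        out.flush();                           // flush now; close only at the end

        StringBuilder response = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            response.append(line);
        }
        in.close();
        out.close();                           // moved here from just after write()
        connection.disconnect();
        return response.toString();
    }
}
```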
Sadly, using HTTP POST does not appear to fix the problem. A test run got through 298 error-free round trips to the server before hiccupping with a truncation on the 299th iteration. Posted below is the current ServerPortal class code that was used for this test. No doubt, it is much improved over the Sun sample code I used for the past two years (it now uses POST as I originally intended), but the basic problem seems to remain the same. Any other recommendations regarding the client side?
Since the problem still appears to me to be in the embedded com.sun.net.httpserver, I would greatly appreciate any guidance that someone might have regarding these questions:
1. Any other general advice on this problem?
2. Is Jetty the best choice to replace com.sun.net.httpserver?
3. Is there any sample code for an embedded Jetty (or other better option) receiving posted strings and returning string responses?
In case it may be of help to someone else, here is how I "resolved" this problem:
I ripped out com.sun.net.httpserver and embedded Eclipse Jetty instead. Client-side code remains substantially as posted above. Jetty was about 10 times faster! Unfortunately, the underlying problem remained the same, so it's pretty hard to hang it on the HTTP server. My working hypothesis now is that there is most likely an obscure threads/timing problem somewhere in NIO that occasionally truncates strings longer than 1460 characters down to 1460. Such truncations appear to occur randomly every several hundred posts, and no exceptions are thrown.
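For anyone making the same swap, the embedded setup is roughly the following. This is a minimal sketch against the Jetty AbstractHandler API; the class name, port, and placeholder reply are mine, not my production code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.AbstractHandler;

public class StringPostHandler extends AbstractHandler {
    @Override
    public void handle(String target, Request baseRequest,
                       HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        // Drain the posted string; getReader() honors the request's declared charset.
        StringBuilder body = new StringBuilder();
        BufferedReader reader = request.getReader();
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line);
        }
        // ... hand body.toString() to the application layer here ...

        response.setContentType("text/xml; charset=UTF-8");
        response.setStatus(HttpServletResponse.SC_OK);
        response.getWriter().print("<response/>");   // placeholder reply
        baseRequest.setHandled(true);                // tell Jetty this request is done
    }

    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);            // placeholder port
        server.setHandler(new StringPostHandler());
        server.start();
        server.join();
    }
}
```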
My work-around is to detect the problem on the server and return a response that requests a repost. The above client-side code now has a repost loop that will repost the request up to 3 times before declaring a fatal error and shutting the client down. During test runs of 10,000 posts to the server, typically 15 to 20 truncations occur. Of the 17 or so truncations, all but one are typically corrected with a single repost. All truncations not corrected by a single repost have (so far) always been corrected by a second repost. Thus, the plumbing handles the hiccups and renders them invisible to my application layers.
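In outline, the repost loop looks like this. It is a sketch, not the actual ServerPortal code: "RESEND" stands in for whatever marker the server returns when it detects a truncated body, and PostSketch.post() is the round-trip method sketched earlier:

```java
import java.io.IOException;

final class RepostLoopSketch {
    private static final int MAX_REPOSTS = 3;

    // "RESEND" is a placeholder marker; post() is the round trip sketched earlier.
    static String postWithRepost(String urlString, String xml) throws IOException {
        String response = PostSketch.post(urlString, xml);
        for (int repost = 0; repost < MAX_REPOSTS && "RESEND".equals(response); repost++) {
            // The server saw a truncated body; send the same request again.
            response = PostSketch.post(urlString, xml);
        }
        if ("RESEND".equals(response)) {
            // Fatal: the application layer shuts the client down from here.
            throw new IOException("Post still truncated after " + MAX_REPOSTS + " reposts");
        }
        return response;
    }
}
```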
I feel just a little guilty not pursuing the correct fix for this, but it would likely be very time consuming. I can't afford that as my project has already sustained a pretty big unanticipated delay. If someone else wishes to work on this, I'll be glad to assist to the extent that I can. Otherwise, the moderator can mark this thread resolved.
Nah, it smells like software to me. A packet getting lost now and then would not seem to explain the "magic" 1460-character number; notably, 1460 bytes is the standard TCP maximum segment size over Ethernet (a 1500-byte MTU minus 40 bytes of IP and TCP headers), so the truncated posts are exactly one segment long. POST (or GET) bodies less than that never have a problem and, when the problem occurs, the truncation is always to the first 1460 characters. It would be my guess that, somewhere down in NIO, there is (effectively) a 1460-character buffer. If your baloney fits, you don't have a problem. If you have too much baloney and the client and server are on the same host, then the threading works because transport is very fast and you are still OK. But if you have a lot of baloney and transport is slower over a real network, the timing occasionally gets messed up: your baloney gets sliced down to just the first buffer load and the rest of your baloney gets lost. This may or may not be exactly correct, but it's the only thing I can imagine that matches the symptoms.
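In case it helps anyone reproduce or work around this: the truncation is easy to detect on the server because the Content-Length header still advertises the full body length. Here is a sketch of the check, again using placeholder names ("RESEND" as above, with the servlet-style request object from the Jetty handler):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.servlet.http.HttpServletRequest;

final class TruncationCheck {
    // Returns null when fewer bytes arrived than the Content-Length header
    // promised (i.e. the post was truncated); otherwise returns the full body.
    static byte[] readBodyOrNull(HttpServletRequest request) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        InputStream in = request.getInputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            body.write(buf, 0, n);
        }
        int expected = request.getContentLength();   // -1 when the header is absent
        if (expected >= 0 && body.size() < expected) {
            return null;                             // truncated: caller replies "RESEND"
        }
        return body.toByteArray();
    }
}
```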