Thanks to anyone who can provide me some insight on how to fix this.
When I run the process described below on my dedicated Ubuntu server, everything works as I expect. However, once I moved it to Amazon EC2, the speeds got a lot faster but my code suddenly broke. Files that wrote out fine on the other machines I tested on don't write reliably on the EC2 server.
I created a load-balancing server which queues a client connection, starts a new socket (in a new JVM instance), and waits for that socket to call back to say it is ready. Once it is, the new socket's endpoint is written out to the client; the client disconnects from the load balancer, connects to the new ServerSocket, and starts streaming to it. When it is done, the socket calls back to the load balancer, the load balancer adds that endpoint back to the pool, and the JVM associated with that endpoint closes.
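To make the flow concrete, here is a minimal runnable sketch of the handoff on localhost. In my real setup the per-upload ServerSocket lives in its own JVM; in this sketch a thread stands in for that JVM, and the class and method names are just illustrative, not my actual code:

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.atomic.AtomicReference;

public class HandoffSketch {
    // Runs one balancer -> handoff -> upload cycle on localhost and
    // returns the line the per-upload socket received.
    static String runOnce() throws Exception {
        AtomicReference<String> received = new AtomicReference<>();
        ServerSocket balancer = new ServerSocket(0); // the "load balancer"

        Thread serverSide = new Thread(() -> {
            try (Socket client = balancer.accept();
                 ServerSocket uploadSocket = new ServerSocket(0)) { // stands in for the new JVM
                DataOutputStream out = new DataOutputStream(client.getOutputStream());
                out.writeInt(uploadSocket.getLocalPort()); // endpoint sent back to the client
                out.flush();
                try (Socket upload = uploadSocket.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(upload.getInputStream()))) {
                    received.set(in.readLine());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        serverSide.start();

        // Client: ask the balancer where to go, reconnect there, stream, hang up.
        int uploadPort;
        try (Socket toBalancer = new Socket("127.0.0.1", balancer.getLocalPort())) {
            uploadPort = new DataInputStream(toBalancer.getInputStream()).readInt();
        }
        try (Socket toUpload = new Socket("127.0.0.1", uploadPort);
             PrintWriter w = new PrintWriter(toUpload.getOutputStream(), true)) {
            w.println("artifact-bytes");
        }
        serverSide.join();
        balancer.close();
        return received.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("received: " + runOnce());
    }
}
```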
For each item the client is uploading (and there are a series of artifacts of varying sizes), the above process repeats until I have a number of files in a directory (e.g. fileId.aextension, fileId.anotherextension). Only one Thread (one JVM) can write to a given file. However, several may be open and writing to the same directory at the same time (or very, very close to the same time).
Once all artifacts for a particular topic are uploaded, the client may have more topics to upload and so the process above may be triggered to start anew as soon as the last artifact for the last topic has been received.
Here is the code that does the writing:
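(Reconstructing roughly, since the snippet didn't paste cleanly: the write-out has the shape below. The exact stream types and names here are approximate, not my literal code.)

```java
import java.io.*;

public class ArtifactWriter {
    // Writes one artifact's bytes to <dir>/<fileId>.<extension>.
    // Flush and close happen in finally, so the data should leave the
    // JVM's buffers even if the copy loop throws.
    static File writeArtifact(File dir, String fileId, String extension,
                              InputStream in) throws IOException {
        File out = new File(dir, fileId + "." + extension);
        BufferedOutputStream bos = null;
        try {
            bos = new BufferedOutputStream(new FileOutputStream(out));
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            bos.flush();
        } finally {
            if (bos != null) bos.close();
        }
        return out;
    }
}
```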
When it hits this part of the code, it works fine for a while but then eventually just stops writing files. While the JVMs open and close predictably throughout the uploads, eventually one will just not close (I suspect it has a file in its buffer but can't flush it). However, since it got all the bytes from the client, it tells the client to continue, and the client does. A new ServerSocket then opens; it too won't close, and at the end of that upload it never responds to the client. Instead, it gets to the handler that runs the code above and then just goes silent. Once I kill the Java processes, the files write out.
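To illustrate the buffer suspicion: with a default BufferedOutputStream, a small write sits entirely in the JVM-side buffer (8 KB by default), and nothing reaches the file on disk until flush() or close() runs. So a JVM that hangs before close would leave exactly the zero-length or short files I'm seeing:

```java
import java.io.*;

public class BufferDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("buffer-demo", ".txt");
        BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(f));
        out.write("some artifact bytes".getBytes()); // 19 bytes, well under the 8 KB default buffer
        // Nothing has reached the file yet: the bytes are still in the JVM-side buffer.
        System.out.println("before close: " + f.length()); // prints 0
        out.close(); // close() flushes; only now do the bytes hit the file
        System.out.println("after close:  " + f.length()); // prints 19
        f.delete();
    }
}
```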
No errors are thrown because, I suspect, no errors are generated. It seems this may be due to the speed of writes to the directory. I have tried putting the threads to sleep, to no avail. Does anyone have thoughts or suggestions on how to make this work the same on EC2 as it does on every other computer I run it on?
I am not sure why this belongs in this forum rather than the one I started in. My problem is that files suddenly stop writing, which I don't think has anything to do with the socket. The socket hangs only because the write-outs are not working, and they stop only for the one client having write-out problems. In fact, given the flow I describe, and the fact that I can see the socket keep talking AFTER the first write-out hangs but then stop once it reaches the next write-out of the next socket, it is clearly hanging on file output. If anyone can help me with this, in any forum, I would be delighted. Thanks so much.
I am going to migrate to using an external script to start the server instances. I will follow up here if simply having scripts start the JVMs cures the problem (versus having Process start new JVM instances).
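One thing I will check while migrating: a child process started with Process will block once its stdout/stderr pipe buffers fill if the parent never drains them, and a blocked child JVM would look exactly like a hang with no errors. Redirecting the child's output sidesteps that. A sketch (the "sh -c echo" command is just a stand-in for the real "java -jar ..." launch, and the /tmp log path assumes Linux):

```java
import java.io.File;
import java.io.IOException;

public class LaunchChild {
    // Launches a child process with its output redirected to a log file,
    // so the child can never block on a full stdout/stderr pipe.
    static int launch(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);                  // merge stderr into stdout
        pb.redirectOutput(new File("/tmp/child.log")); // drain to a file, not a pipe
        Process p = pb.start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exit: " + launch("sh", "-c", "echo hello"));
    }
}
```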