Thanks Ulf, Tim,
As for the term "compression level", I was referring to the "DeflateCompressionLevel" directive in Apache's configuration file (see
http://httpd.apache.org/docs/2.2/mod/mod_deflate.html#deflatecompressionlevel). Tim is correct that it is less a tunable parameter of a specific algorithm than a setting for how hard the algorithm searches for good compression.
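For concreteness, the directive is a one-line setting in the Apache configuration; the fragment below only illustrates where it sits (the module path, location, and level value are hypothetical, not my actual configuration):

    # Illustrative httpd.conf fragment, not my production config
    LoadModule deflate_module modules/mod_deflate.so

    <Location "/nds">
        # Compress text responses going back to the client
        SetOutputFilter DEFLATE
        # 1 = fastest/least compression, 9 = slowest/most; zlib's default is 6
        DeflateCompressionLevel 6
    </Location>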
I agree HTTP is not the optimal protocol for large file transfers; however, it does have a few redeeming qualities:
- It is (almost) never blocked by firewalls, and this transfer (if it can be fast enough) would be done from hotels and guest accounts on a University campus (which block way too many ports).
- RESTful queries are well supported in most languages.
- It's the easiest one to use from a servlet.
I did 3 tests that I think show the problem is either in the servlet itself or in the Tomcat to Apache connection and not in redundant compression in the networking hardware:
1. Sending uncompressed data results in transfer times that are very close to file size / network speed, so I doubt there is any hardware compression available.
2. Saving the uncompressed data to a file and posting it on the server as a static file results in transfer times close to the expected compression factor.
3. Using scp with and without compression results in transfer times close to the Apache-only times with and without compression.
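By "transfer times" above I mean plain wall-clock timing of the download on the client side, roughly like the sketch below. The URL is a placeholder and this is only an illustration of the measurement, not the actual test harness:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.zip.GZIPInputStream;

    public class TransferTimer {
        public static void main(String[] args) throws Exception {
            // Placeholder URL -- substitute the real servlet or static-file address.
            URL url = new URL("http://example.org/nds/channel-list");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();

            // Ask the server for gzip; remove this line to time the uncompressed case.
            conn.setRequestProperty("Accept-Encoding", "gzip");

            long start = System.nanoTime();
            InputStream in = conn.getInputStream();
            if ("gzip".equals(conn.getContentEncoding())) {
                in = new GZIPInputStream(in);
            }

            // Drain the response, counting uncompressed bytes.
            byte[] buf = new byte[64 * 1024];
            long bytes = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes += n;
            }
            in.close();

            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d uncompressed bytes in %.2f s (%.2f MB/s)%n",
                    bytes, seconds, bytes / seconds / 1e6);
        }
    }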
To step back further in my problem domain, my application can be viewed as a web front-end to an application called the Network Data Server (NDS). NDS implements a proprietary network protocol and is itself a front-end to a proprietary file format that stores real time data from Gravitational Wave observatories. NB: I'm not going to explain GW observatories, but if you're interested, the public web site is
http://www.ligo.org.
There are currently 6 NDS servers available to my application. The database tables we are discussing here are read-mostly. Currently, once a day in the middle of the night, we check each of the servers to see if the meta-data (called the channel list) has changed; if so, we download the full channel list and update these tables. Here's the rub: the channel list is huge and growing, and NDS does not compress it. People use NDS directly from computing clusters, desktops and laptops. They currently download the full uncompressed list and, as usual, users spend their waiting time complaining to developers about it.
There are other alternatives, including having NDS compress the list itself. I thought a servlet was an easy alternative, but unless I can get better performance out of it, it won't be accepted.
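To be concrete about what I mean by "a servlet was an easy alternative", the idea is nothing more than the sketch below. The class name, the getChannelListStream() stub, and the content type are made up for illustration; the real servlet would pull the list from NDS or the database:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.zip.GZIPOutputStream;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Illustrative sketch: stream the channel list, gzipping it if the client asked.
    public class ChannelListServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("text/plain");

            String accept = req.getHeader("Accept-Encoding");
            boolean gzip = accept != null && accept.contains("gzip");

            OutputStream out = resp.getOutputStream();
            if (gzip) {
                resp.setHeader("Content-Encoding", "gzip");
                out = new GZIPOutputStream(out);
            }

            // Placeholder for however the channel list is actually obtained
            // (from NDS, the database, or a cached file).
            try (InputStream in = getChannelListStream()) {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
            out.close();  // finishes the gzip stream if one was used
        }

        private InputStream getChannelListStream() throws IOException {
            throw new UnsupportedOperationException("illustrative stub");
        }
    }

Usage from the client side is just a GET against the servlet's URL with Accept-Encoding: gzip, as in the timing sketch earlier.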
In my mind the solution is to get people to do their queries against the central database and not maintain local copies of part of it. We've been debating this for a long time and positions have hardened. Some people demand an efficient bulk transfer of all the meta-data from a single server.
Right now my biggest problem is that I don't understand where the bottleneck is. The servlet that provides a RESTful search is working and the bulk transfer is less important.
I will continue to search for the reason this approach doesn't work. If I figure anything out, I'll report back.
If you think of anything please let me know.
Joe