HttpServletResponse.getOutputStream() - streaming or not streaming?

Hi. When I get the ServletOutputStream for the response and try to write a large file to it, I read the file in chunks into a buffer and then send each buffer to the output:

(please ignore closing streams, this code is just for demonstration)
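Something along these lines (a minimal sketch of the idea; the class name, file path and buffer size are made up, and closing and error handling are left out as I said):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DownloadServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        File file = new File("/data/export.xml");   // made-up path, stands in for the real file
        ServletOutputStream out = response.getOutputStream();
        InputStream in = new FileInputStream(file);
        byte[] buffer = new byte[8192];
        int read;
        // read the file in chunks and write each chunk to the response
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }
}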

The ServletOutputStream can at a certain point get 'committed', whatever that means. The question is - when is the data sent to the client? Is it buffered somehow on the server and then sent as a single HTTP response with a huge payload to the client, or something else?
I should probably have mentioned that I meant a response with Content-Type: application/octet-stream and Content-Disposition: attachment; filename=...
Does this support streaming?
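For reference, the relevant bit looks roughly like this (the filename is just an example), set before getting the output stream in the code above:

response.setContentType("application/octet-stream");
response.setHeader("Content-Disposition", "attachment; filename=\"export.xml\"");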

Regards,
Raf
The response will be sent in TCP/IP chunks as organized by the operating system.

The servlet output stream sits several levels above the TCP/IP communication which you don't have to worry about.

See various Wikipedia articles such as this one.

Bill
You could do an experiment where you have your servlet send an infinite amount of information to the client. Just write a loop which never ends and which continually sends some bytes; you don't care what bytes.

Then if you see your server crash with an out-of-memory exception, that tells you one thing. If you see those bytes arriving at your client, that tells you something else.
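Something like this would do (a throwaway sketch, not meant to be good code; the class name and buffer size don't matter):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class EndlessServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        ServletOutputStream out = response.getOutputStream();
        byte[] junk = new byte[8192];   // contents don't matter
        while (true) {                  // deliberately never ends
            out.write(junk);
        }
    }
}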
Hi. Yes, I know about the TCP/IP stack and that HTTP is an application-layer protocol a few floors above that. The question was more: when does the HTTP server start writing to the connection to the client, and when do the TCP packets get sent? Only after the whole response is ready somewhere (buffered on the server), or does it happen on the fly? It must be on the fly, as I don't believe downloads of big files (like whole Ubuntu distributions) are buffered on the server, but I haven't found anything that would support that claim.
Yes, I know I can run a practical experiment, but I was searching more for the theory (articles, tips, maybe some RFCs) behind the practice.
The reason I ask is that we are using a funny UI library that was imposed on my team; it is HTTP based, but it is not a webapp framework. You can store files on the client machine, but the whole data must first be prepared on the server - I debugged it, and it is all buffered there! I asked support about it, and they said that HTTP doesn't support streaming anyway, so they can't do anything about it. I want to answer that, but all I have is common sense and no links to back it up. The RFC where Content-Disposition is described didn't help either.

raf
There's nothing in the RFCs to say how clients or servers should buffer data. There's nothing to force a server to buffer the entire response before it starts sending it, either. (Unless it's trying to set that header, whose name I forget, which says how many bytes are in the response.) So you can't answer your question by reasoning about HTTP. You have to answer it by observing what actually happens. Or by examining your server's source code, if that's a possibility.
The thing is that I _don't_ want the server to buffer ;d But if you say there is nothing I can do to force it, there is probably also nothing that I can do to prevent it.
What does the venerable Tomcat do in such a case?
 

Raf Szczypiorski wrote: But if you say there is nothing I can do to force it, there is probably also nothing that I can do to prevent it.

I didn't say that. Or at least I didn't mean to say that. What I did mean to say was, there aren't any rules in HTTP which say that the server can or may or must buffer data, and I don't believe there are any such rules in the servlet specification either.

In the Tomcat 6 Connector docs here I find:

socketBuffer
The size (in bytes) of the buffer to be provided for socket output buffering. -1 can be specified to disable the use of a buffer. By default, a buffer of 9000 bytes will be used.



There is also

bufferSize
The size (in bytes) of the buffer to be provided for input streams created by this connector. By default, buffers of 2048 bytes will be provided.



Bill
What's your main concern here? Are you concerned that your container is going to take too much memory? Or are you relying on data being sent to the client as soon as you start writing to the stream from the server?

If it's the latter, a word to the wise: do not rely on setting the buffer sizes on your container. If your TCP packets are jumping through gateways, the gateways might decide to repacketize the information in the packets. The TCP standard guarantees that packets will be delivered to the destination and that they will arrive in the same order as sent from the source. It doesn't guarantee that a given piece of information will be sent in the same packet. Also, the buffers that you can control in Tomcat are Tomcat's buffers. The OS has its own buffers that it will use. So, there might be more buffering going on than you can mentally buffer.

It's OK if all you want to do is make sure Tomcat uses the least amount of memory possible. However, if you expect anything beyond that, you are asking to get buffered somewhere along the way.
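On the servlet side, the only knobs you really have are the response buffer size and explicit flushes, roughly like this (a sketch reusing the copy loop from the first post, so 'in' comes from there; even these only control Tomcat's own buffer, not what the OS or the network does):

response.setBufferSize(8 * 1024);           // a hint to the container; must be set before anything is written
ServletOutputStream out = response.getOutputStream();
byte[] buffer = new byte[8192];
int read;
while ((read = in.read(buffer)) != -1) {
    out.write(buffer, 0, read);
    response.flushBuffer();                 // commits the response and pushes whatever Tomcat has buffered so far
}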
My main concern is this: we use this library and we allow the users to export some data as XML. The export data set might be small, or it might be huge - the actual configuration is done per 'export configuration'. So, it was all nice when the data amount was small, but now we have problems as users export more and more data - and all of it is buffered on the server side. So, if the XML takes 100 MB of memory, 10 users exporting at the same time (which is not actually too many, and it is a viable scenario) take up 1 GB, and it all sits on the server side for a while. We do have 1 GB to spare, but then again, we also have times when 100 users export... And we have problems with memory. So, I asked support about it and they say it is impossible for HTTP to stream, which I don't believe, as there is a multitude of sites that allow huge download bundles (Linux distros, RapidShare, whatever) and I don't believe they all buffer those before sending to the client - it somehow just doesn't make sense.
So, I just wanted to ask what actually happens when I do that with Content-Disposition and the rest of the story. I certainly don't care whether data will be sent in the same packet, or if some gateways perform buffering - not my problem. I just find the whole story the support guys tell me really hard to believe.
If a lot of your traffic is large file transfers, then the sysadmins will certainly have to tune the OS and the hardware for it. You cannot really expect a machine that is tuned for web access (which generally means lots of connections for small files, and more CPU and IO* usage than network) to automatically support large files going through the pipe. If your application has gone from being CPU/IO bound to being network bound, you will need to configure the underlying hardware and OS for it. I can certainly see the sysadmins balking at a change like this, especially if it's unexpected and is occurring in a live environment.

If your application is mixed-bound (I invented this term), i.e. some parts of it are CPU/IO bound and some parts are network bound, you might want to think about hosting the parts that are network bound on servers that are optimized for file sharing.


*By IO in this post, I really mean Disk IO.
I don't think you understand what "streaming" means, so that when people tell you that HTTP doesn't support "streaming" -- which it doesn't -- you start assuming that HTTP doesn't allow the receiver to start receiving the response before the sender has finished sending it. Which is an incorrect assumption. So don't ask people about "streaming" if you really want to know about buffering policies.
Hi Paul. Yes, that's what I mean by streaming - the client receives while the server is not ready yet.
Please educate me if what I say is wrong - can you provide links which describe this scenario, and also describe what streaming is and what it is not, because I obviously don't get the difference.

raf
So you are saying that streaming == media streaming? Or rather, that whenever anybody says 'streaming', everybody else understands it as 'media streaming'? That's an interesting point of view, but not the only correct one, I would say. Where I come from, streaming data just means processing data (uploading, transforming, whatever) in a way that does not require buffering all of it in memory first, as there can be a whole lot of it. I guess the inventors of StAX have a similar notion. Media streaming is just a use case.

To get things straight - no, I did not have media streaming in mind, but rather the ability of the server to start transmitting the data of a huge file / dynamically created bytes before it has all been read / generated, thus reducing the memory footprint on the server. Does Content-Disposition allow this? Is there any RFC that defines this?
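To make it concrete, this is the kind of thing I mean by streaming the export (a rough StAX sketch; the filename is just an example and 'records' is a stand-in for however the data set is produced - the point is that the XML goes straight to the response instead of being built up in memory first):

import java.io.IOException;
import javax.servlet.http.HttpServletResponse;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

void writeExport(HttpServletResponse response, Iterable<String> records)
        throws IOException, XMLStreamException {
    response.setContentType("application/octet-stream");
    response.setHeader("Content-Disposition", "attachment; filename=\"export.xml\"");
    XMLStreamWriter xml = XMLOutputFactory.newInstance()
            .createXMLStreamWriter(response.getOutputStream(), "UTF-8");
    xml.writeStartDocument("UTF-8", "1.0");
    xml.writeStartElement("export");
    for (String record : records) {          // records could be a lazy iterator over the result set
        xml.writeStartElement("record");
        xml.writeCharacters(record);
        xml.writeEndElement();
    }
    xml.writeEndElement();
    xml.writeEndDocument();
    xml.flush();                             // nothing is ever held in memory beyond the current record
}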

Paul, I know you are a bartender here, so don't get me wrong, but I consider your post to be a) a little aggressive in tone and content; and b) not really contributing to the topic, as you gave no answers, just some implications of yours. Might be just my point of view, though.

raf
 

Raf Szczypiorski wrote: So you are saying that streaming == media streaming? Or rather, that whenever anybody says 'streaming', everybody else understands it as 'media streaming'?

Well, look. You're the one who used the word. And you're the one who said that various unnamed persons had told you that HTTP doesn't support "streaming". Perhaps you didn't inquire too much about what they understood when you used the word, but to me it looks like they interpreted it as meaning "media streaming". Asking what "anybody" and "everybody" think the word means isn't going to get you anywhere, because as you can see from this discussion, people can mean different things.

So really that question isn't getting you anywhere. And I find it frustrating to observe this thread: you asked how a particular product worked and I said "Try it and see what happens". But you didn't. Instead, unfortunately the thread has gone way off course into an unproductive discussion about the meanings of words (and I'm sorry that I helped it go that way). So to bring it back on course, let me repeat my suggestion to do the experiment. Seems to me that your original question could be answered by running some suitable demonstration code.
I explained what I meant to the unnamed persons, and they got what I meant by it. What I got as an answer was that such data has to be read / generated in full beforehand and sent in a single HTTP response, which doesn't seem right, because...
I did my experiment on Tomcat 7, and I managed to generate a few gigabytes of data and stream / send it to the client, although the server had only a few hundred MB of memory. Also, sending such data as the response payload requires the Content-Length to be set, which pretty much requires the whole data to be generated in order to count the bytes.
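Roughly what my test servlet did (a sketch from memory; the class name and exact sizes don't matter):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BigDownloadServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("application/octet-stream");
        response.setHeader("Content-Disposition", "attachment; filename=\"big.bin\"");
        ServletOutputStream out = response.getOutputStream();
        byte[] chunk = new byte[64 * 1024];
        long total = 4L * 1024 * 1024 * 1024;   // ~4 GB, far more than the few hundred MB of heap
        for (long written = 0; written < total; written += chunk.length) {
            out.write(chunk);
        }
    }
}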
But you are wrong in saying that this simple test can prove anything - it just proves that one of many servers (Tomcat 7) is smart, for example, and it still doesn't tell me what the standard behavior is and what it is not. I was looking for a general answer about how HTTP servers must behave when data is sent with Content-Disposition: attachment. If there is no such rule that servers must obey, then so be it, but I am not able to answer that on my own. Hence the question, to learn from more experienced people who might have the answer.

raf