When a web server and a browser communicate over HTTP, e.g. a client requests index.html from the server - how is the response actually formed?
Does the server blurt out a bunch of text through the TCP/IP pipeline in the form of HTML tags, so that the receiving browser takes that stream of bytes and converts it to HTML?
My question is: is an actual HTML file being transported, FTP-style, or are byte streams being sent back and forth, with the result interpreted and rendered on the fly?
That's the part of the client-server story I never fully understood.
Short answer: byte streams fly back and forth between the browser and the server. However, servers might cache the data as files. Nothing in the spec says that servers (or gateways) cannot save the data in files.
Long answer: I am not sure whether you are familiar with the layered architecture of network protocols. If not, the rest of this might not make much sense.
So, at any given layer within the stack, the layer provides some functionality that is useful to the layer above and is not provided by the layer below. Each layer wraps the data handed down by the layer above in its own format. Usually, what a layer transmits between client and server consists of a header, the data provided by the layer above, and sometimes a footer. Each layer is responsible for marshalling and unmarshalling its own data.
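To make the wrapping idea concrete, here is a toy sketch. The `wrap` and `unwrap` helpers and the bracketed text markers are made up for illustration; real TCP and IP headers are binary structures, and most layers have no footer at all. It only shows that each layer adds its own envelope on the way down and strips only its own envelope on the way up.

```java
class LayerSketch {
    // Hypothetical helper: a layer puts its own header and footer around the payload.
    static String wrap(String layer, String payload) {
        return "[" + layer + "-hdr]" + payload + "[" + layer + "-ftr]";
    }

    // The matching layer on the other machine strips exactly what was added.
    static String unwrap(String layer, String frame) {
        String header = "[" + layer + "-hdr]";
        String footer = "[" + layer + "-ftr]";
        if (!frame.startsWith(header) || !frame.endsWith(footer)) {
            throw new IllegalArgumentException("not a " + layer + " frame");
        }
        return frame.substring(header.length(), frame.length() - footer.length());
    }

    public static void main(String[] args) {
        String http = "GET /index.html HTTP/1.1";

        // Going down the stack on the sender: the innermost wrap happens first.
        String onTheWire = wrap("link", wrap("ip", wrap("tcp", http)));
        System.out.println(onTheWire);

        // Going up the stack on the receiver: the outermost envelope comes off first.
        String received = unwrap("tcp", unwrap("ip", unwrap("link", onTheWire)));
        System.out.println(received.equals(http)); // prints "true"
    }
}
```

Note that no layer ever looks inside the payload it was handed; to the TCP layer, the HTTP request is just opaque bytes.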
Think of HTTP as just another layer. The user hits a button in the browser. The browser opens a TCP/IP connection to the server and writes data to the output stream of the connection in a certain format. The format of the data depends on whether the request is a GET, PUT or POST. The HTTP specification spells out exactly what the format of the data should be. Internally, the OS takes the data from the output stream, marshals it into TCP packets and then gives it to the IP layer. The IP layer marshals the data into another packet and gives it to the next layer, and so it goes down the networking layers. The data is marshalled many times, broken into packets, and sent to the server.
The server receives the packets and starts unmarshalling the data layer by layer. Ultimately, the data reaches the TCP layer on the server. The server's OS knows that the web container is listening on a certain port, and gives the reassembled bytes of the HTTP request to the web container. The request at a minimum contains a) the HTTP method, and b) the URI of the resource. The web container looks through the deployed webapps to see which webapp maps to the URI. Once it finds it, it looks at that webapp's web.xml to find which servlet is mapped to the URI. Once it finds the servlet, it parses the HTTP request and constructs an HTTP request object. It constructs an instance of the servlet (if it hasn't already) and calls the service method.
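As a rough illustration of "writes the data in a certain format", this is approximately what the bytes of a simple GET look like, and how the receiving side might pull out the method and URI. `buildGet` and `parseRequestLine` are hypothetical names for this sketch; a real browser sends more headers, and a real container's parser is far more defensive.

```java
import java.nio.charset.StandardCharsets;

class RequestSketch {
    // The bytes a client might write to the socket's output stream for a GET.
    // Lines end in CRLF, and an empty line marks the end of the headers.
    static byte[] buildGet(String host, String path) {
        String request = "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    // First step on the server side: read the request line and split it into
    // method, URI and protocol version.
    static String[] parseRequestLine(byte[] raw) {
        String text = new String(raw, StandardCharsets.US_ASCII);
        String requestLine = text.substring(0, text.indexOf("\r\n"));
        return requestLine.split(" ", 3);
    }

    public static void main(String[] args) {
        byte[] raw = buildGet("example.com", "/index.html");
        String[] parts = parseRequestLine(raw);
        System.out.println("method=" + parts[0] + " uri=" + parts[1]);
        // prints "method=GET uri=/index.html"
    }
}
```

The URI extracted here is what the container would then match against its deployed webapps and servlet mappings.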
The servlet does its work and writes data to the output stream of the response. The servlet engine just passes that data to the TCP layer. The whole process reverses at this point. The TCP layer wraps the data written to the output stream into TCP packets, then forwards them to the next layer. The data goes down the layers, gets transmitted to the client, and the client's network stack reconstructs it and gives it to the browser. The browser reads the data, looks at the headers and says "Aha! This is an HTML response. So, I shall parse the body of this response as HTML and render it on the UI."
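Here is a small sketch of that last step: given the raw bytes of a response, split the headers from the body and check Content-Type before deciding how to interpret the body. The sample response and the helper names are invented for this example; a real browser also deals with chunked encoding, compression, redirects and much more.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class ResponseSketch {
    // A made-up response, roughly what a server might send for index.html.
    static byte[] sampleResponse() {
        byte[] body = "<html><body>Hi</body></html>".getBytes(StandardCharsets.UTF_8);
        byte[] head = ("HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html; charset=utf-8\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n").getBytes(StandardCharsets.US_ASCII);
        byte[] all = new byte[head.length + body.length];
        System.arraycopy(head, 0, all, 0, head.length);
        System.arraycopy(body, 0, all, head.length, body.length);
        return all;
    }

    // Header names are case-insensitive, so they are stored lower-cased.
    static Map<String, String> headers(byte[] raw) {
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        String[] lines = text.substring(0, text.indexOf("\r\n\r\n")).split("\r\n");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) { // line 0 is the status line
            int colon = lines[i].indexOf(':');
            if (colon < 0) continue;
            result.put(lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT),
                    lines[i].substring(colon + 1).trim());
        }
        return result;
    }

    // Everything after the first blank line is the body.
    static String body(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        int start = text.indexOf("\r\n\r\n") + 4;
        return new String(raw, start, raw.length - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleResponse();
        String type = headers(raw).get("content-type");
        if (type != null && type.startsWith("text/html")) {
            System.out.println("Aha, HTML: " + body(raw));
        }
    }
}
```

Nothing in these bytes is a "file"; the browser only knows to treat the body as HTML because the header says so.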
Hope this makes sense. Think of the browser and the web container as layers on top of the network stack. The browser is responsible for translating user input into an HTTP request; the web container is responsible for converting the request into an HTTP request object and calling the servlet. The servlet sends back a response, and the servlet container is responsible for sending that data back to the browser. The browser is responsible for parsing that output and displaying the data to the user.
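If you want to see the raw stream for yourself, a sketch along these lines should work. To stay self-contained it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in for a servlet container (that is a substitution for this example, not what a servlet engine actually runs), and a plain `Socket` plays the browser. The page content and the `fetchRaw` name are made up. It needs Java 9 or later for `readAllBytes`.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RoundTripSketch {
    // Starts a tiny local server, requests /index.html over a raw socket,
    // and returns every byte the server sent back, decoded as text.
    static String fetchRaw() {
        byte[] page = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);
        try {
            // Port 0 asks the OS for any free port; loopback keeps it on this machine.
            HttpServer server = HttpServer.create(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 0);
            server.createContext("/index.html", exchange -> {
                // The handler plays the servlet's role: set headers, write the body.
                exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
                exchange.sendResponseHeaders(200, page.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(page);
                }
            });
            server.start();
            try (Socket socket = new Socket(InetAddress.getLoopbackAddress(),
                    server.getAddress().getPort())) {
                socket.setSoTimeout(5000); // fail instead of hanging if nothing arrives
                OutputStream out = socket.getOutputStream();
                out.write(("GET /index.html HTTP/1.1\r\n"
                        + "Host: localhost\r\n"
                        + "Connection: close\r\n"
                        + "\r\n").getBytes(StandardCharsets.US_ASCII));
                out.flush();
                // "Connection: close" asks the server to end the stream after the response.
                return new String(socket.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchRaw());
    }
}
```

What gets printed is a status line, some headers, a blank line, and then the HTML text. That is all the browser ever receives.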
Internally, whether the layers store data in files or just keep it in memory is of no concern to the layers above. They might or might not. As far as the topmost layers are concerned, everything is a stream of data.