I have been trying for a week to use a socket to read a server's response stream, parse out the gzipped response body, pass those bytes to a GZIPInputStream, and get the result back as a string. It unzips correctly for some responses, but others fail. I think I may be removing bytes incorrectly when I dechunk the response. The response is sent with chunked transfer encoding, so I need to strip the chunk framing and reassemble the data for the gzip stream to be valid. Here is what I do. First, this is the gzipped test response I created:
Gzip Unit Test
The contents of gzip.php, which sends that content gzipped:
Here is the response I get from the server, followed by the content I parse out of it after removing the chunk framing:
I assume I am stripping out an important byte somewhere when I clean up the bytes. Here is how I get the bytes above:
To separate the headers from the body of the response, I do the following:
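As an illustration only, a minimal sketch of such a split might look like the following. It assumes the whole response has already been read into a byte array, and it works on raw bytes rather than on a String, because passing binary gzip data through a String can alter bytes. The class name `HttpResponseSplitter` and its methods are hypothetical, not taken from the original code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; assumes the full response is in memory.
final class HttpResponseSplitter {

    /** Returns the index of the first body byte (just after CRLF CRLF), or -1 if not found. */
    static int bodyStart(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    /** Returns the header section as text, without the terminating blank line. */
    static String headers(byte[] response) {
        int start = bodyStart(response);
        if (start < 0) {
            throw new IllegalArgumentException("no header terminator found");
        }
        // ISO-8859-1 maps every byte to exactly one char, so header bytes decode losslessly.
        return new String(response, 0, start - 4, StandardCharsets.ISO_8859_1);
    }

    /** Returns the body as raw bytes; it is never converted to a String. */
    static byte[] body(byte[] response) {
        int start = bodyStart(response);
        if (start < 0) {
            throw new IllegalArgumentException("no header terminator found");
        }
        return Arrays.copyOfRange(response, start, response.length);
    }
}
```

The body returned here is still chunk-encoded when the response carries `Transfer-Encoding: chunked`.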
To me, that process appears to extract the correct content: judging by the bytes above, the chunked response body looks like the content of the response. Now, because this particular response is chunked, I need to remove the chunk headers inside the response body:
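For comparison, here is a hedged sketch of one way to dechunk a body. It assumes the complete chunked body is in a byte array and follows the HTTP/1.1 chunked framing: a hexadecimal size line ending in CRLF, exactly that many data bytes, then a CRLF, repeated until a chunk of size 0. The class name `ChunkedDecoder` is hypothetical, and trailer fields after the last chunk are ignored.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; assumes the whole chunked body is in memory.
final class ChunkedDecoder {

    /** Removes chunk framing and returns the concatenated chunk data. */
    static byte[] decode(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            // The chunk-size line is plain ASCII hex and ends with CRLF.
            int lineEnd = indexOfCrlf(body, pos);
            if (lineEnd < 0) {
                throw new IllegalArgumentException("missing chunk-size line");
            }
            String sizeLine = new String(body, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';'); // drop optional chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16);
            pos = lineEnd + 2;
            if (size == 0) {
                break; // last chunk; any trailer fields are ignored in this sketch
            }
            if (pos + size > body.length) {
                throw new IllegalArgumentException("truncated chunk");
            }
            // Copy exactly 'size' bytes; the data is never scanned for CRLF.
            out.write(body, pos, size);
            pos += size + 2; // skip the data and the CRLF that follows it
        }
        return out.toByteArray();
    }

    /** Returns the index of the next CRLF at or after 'from', or -1 if there is none. */
    static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }
}
```

The sketch deliberately relies on the declared chunk size instead of searching for line breaks inside the data, since compressed bytes can legitimately contain the values 13 and 10.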
I have a sneaking suspicion that this is where I am either stripping out a byte or leaving in a byte that should not be part of the gzipped data. After this, I pass those bytes to the GZIP decompression code below:
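A minimal sketch of this decompression step, for reference, could look like the following. It assumes the dechunked gzip data is in a byte array and that the decompressed text is UTF-8, which is an assumption and not something the response above confirms. The class name `GzipText` is hypothetical, and the `gzip` method exists only so the round trip can be checked locally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; the UTF-8 charset is an assumption.
final class GzipText {

    /** Decompresses gzip bytes and decodes the result as UTF-8 text. */
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            // read() may return fewer bytes than requested, so loop until end of stream.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses text to gzip bytes; included only to test the round trip. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

If a helper like this decompresses locally produced gzip bytes but fails on the bytes taken from the socket, that would suggest the problem is in an earlier step rather than in the decompression itself.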
What happens is that for a response like the one below, I get the following after gzip decompression:
String after decompressing, and the exception thrown:
But after simply changing the s in Lights to an a (Lighta), as in the following response:
I get it back correctly unzipped with no exceptions. Can anyone give me ideas about what may be wrong? I tried placing that same response content in a file and gzipping it, and it unzips just fine. I even tried comparing the gzip bytes from the server with those from the file, and they are completely different for some reason, even when the server one unzips correctly, so I can't use that comparison to find which byte from the server may be missing.
This is what the response looks like in bytes from the file:
From the file, which decompresses correctly:
[31, -117, 8, 0, 0, 0, 0, 0, 0, 0, -77, -55, 40, -55, -51, -79, -29, -27, -78, -55, 72, 77, 76, 1, -47, -71, -87, 37, -119, 10, 25, 37, 37, 5, -70, -87, -123, -91, -103, 101, -74, 74, -50, -7, 121, 37, -87, 121, 37, -70, 33, -107, 5, -87, 74, 10, -55, 16, -98, -83, 82, 73, 106, 69, -119, 62, 72, -77, -75, -126, -77, -121, 99, 80, -80, 107, -120, 109, 105, 73, -102, -82, -123, 18, -56, -112, -110, -52, -110, -100, 84, 59, -49, -68, -92, -4, 10, 5, 93, 5, -1, -46, -110, -100, -4, -4, 108, -123, -16, -44, 36, 5, -57, -28, -28, -44, -30, 98, 5, -97, -52, -12, -116, -110, 98, 0, -127, 62, 118, 36, 125, 0, 0, 0]
As you can see, the bytes from the server and from the file look very different, so I can't use them for comparison. I'm open to any and all ideas on what to try next.