Seems to me you have two decisions to make:
1. How to tell when the text info stops and the image info starts.
2. How to transmit the image info.
For 1 - you could use a marker string, the way multipart mail does, or you could start your text with a character count.
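Here is a rough sketch of the count approach, assuming you are writing bytes rather than characters. The class name FramedMessage and the layout (a 4-byte length in front of each part) are made up for illustration, and the count is a byte count of the UTF-8 text, not a character count.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: [4-byte text length][text bytes][4-byte image length][image bytes]
public class FramedMessage {

    // Build one message holding the text and then the image, each with a length prefix.
    public static byte[] pack(String text, byte[] image) {
        byte[] textBytes = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + textBytes.length + 4 + image.length);
        buf.putInt(textBytes.length).put(textBytes);
        buf.putInt(image.length).put(image);
        return buf.array();
    }

    // Read the text length, then exactly that many bytes of text.
    public static String unpackText(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        byte[] textBytes = new byte[buf.getInt()];
        buf.get(textBytes);
        return new String(textBytes, StandardCharsets.UTF_8);
    }

    // Skip over the text part, then read the image length and the image bytes.
    public static byte[] unpackImage(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        int textLength = buf.getInt();
        buf.position(4 + textLength);
        byte[] image = new byte[buf.getInt()];
        buf.get(image);
        return image;
    }
}
```

The receiver never has to scan for a delimiter, so the image can contain any byte values at all.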
For 2 - the big question is binary versus encoded binary. If your server side is sending a character stream, which involves character translation by a writer, you will have to encode the image as text. Base64 is the usual choice and is more compact than hex, though it still adds roughly a third to the size. If your server side is sending a byte stream, you should be able to send the binary image directly.
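For the character stream case, something like the following might do, combining a marker string with base64. This is only a sketch: the class name TextWithImage and the marker value are invented, and it uses java.util.Base64, which needs Java 8 or later.

```java
import java.util.Base64;

// Hypothetical helper: text, then a marker line, then the image as base64 text.
public class TextWithImage {

    // Made-up marker. Base64 output never contains '-' or a newline,
    // so the last occurrence in a message is always the real separator.
    static final String MARKER = "\n--IMAGE--\n";

    // Join the text and the base64-encoded image into one string that is safe for a Writer.
    public static String encode(String text, byte[] image) {
        return text + MARKER + Base64.getEncoder().encodeToString(image);
    }

    // Everything before the last marker is the text.
    public static String textPart(String message) {
        return message.substring(0, message.lastIndexOf(MARKER));
    }

    // Everything after the last marker is base64, decoded back to the image bytes.
    public static byte[] imagePart(String message) {
        int start = message.lastIndexOf(MARKER) + MARKER.length();
        return Base64.getDecoder().decode(message.substring(start));
    }
}
```

With a byte stream none of this is needed, since the image bytes can be written to the output stream as they are.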
Bill